Wei Xu (徐葳), Assistant Professor and Assistant Dean, Institute for Interdisciplinary Information Sciences, Tsinghua University
SOSP 2009 Log Dataset
This page describes the dataset and demo used in the following paper.
Large-scale system problem detection by mining console logs
Wei Xu, Ling Huang, Armando Fox, David Patterson, and Michael Jordan
In Proc. of the 22nd ACM Symposium on Operating Systems Principles (SOSP ’09), Big Sky, MT, October 2009 [pdf]
We generated the dataset in a private cloud environment using a benchmark workload, so the dataset contains no private information. The dataset is provided as-is under the standard BSD license. If you use the dataset in academic publications, please cite the paper above.
Dataset and Demo
The demo file is available here:
http://iiis.tsinghua.edu.cn/~weixu/demobuild.zip
Inside this zip file, there is a file data/online1/lg/sorted.log.gz. It isn't the original log, but it is almost the same (I just sorted it by time stamp; no other changes, I believe).
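To take a quick look at the sorted log without decompressing it by hand, a small Python sketch like the one below may help. It assumes the zip file has been extracted into the current directory and that the log is line-oriented text; the log's exact line format is not described on this page, so the sketch only prints raw lines. The function name head_gzip_log is made up for illustration.

```python
import gzip
import os
from itertools import islice


def head_gzip_log(path, n=5):
    """Return the first n lines of a gzip-compressed text log (hypothetical helper)."""
    # "rt" opens the archive in text mode; errors="replace" avoids crashing
    # on any bytes that are not valid UTF-8 (the encoding is an assumption).
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as f:
        return [line.rstrip("\n") for line in islice(f, n)]


if __name__ == "__main__":
    # Path as given on this page, relative to the extracted zip (assumed location).
    log_path = "data/online1/lg/sorted.log.gz"
    if os.path.exists(log_path):
        for line in head_gzip_log(log_path):
            print(line)
```

Reading through islice keeps memory use small, since only the requested lines are decompressed rather than the whole file.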
You can run the demo UI directly. It is the demo.jar in the zip file (on Windows machines with a JRE correctly installed, you should be able to run it just by double-clicking the jar file). The demo is intended to give some sense of how the online detection algorithm works (described in the following ICDM paper).
Online system problem detection by mining patterns of console logs
Wei Xu, Ling Huang, Armando Fox, David Patterson, and Michael Jordan
In Proc. of the IEEE International Conference on Data Mining (ICDM ’09), Miami, FL, December 2009 [pdf]
I also have some labeled data that might help. It is at
http://iiis.tsinghua.edu.cn/~weixu/200nodes.rar
Code
We have all our code at https://github.com/xuw/logm. However, as the code is very old, we cannot provide any additional technical support.