Wei Xu ( 徐葳 )

Assistant Professor and Assistant Dean

Institute for Interdisciplinary Information Sciences

Tsinghua University



SOSP 2009 Log Dataset

This page describes the dataset and demo used in the following paper.

Large-scale system problem detection by mining console logs
Wei Xu, Ling Huang, Armando Fox, David Patterson, and Michael Jordan
In Proc. of the 22nd ACM Symposium on Operating Systems Principles (SOSP ’09), Big Sky, MT, October 2009 [pdf]

We generated the dataset in a private cloud environment using a benchmark workload, so it contains no private information. The dataset is provided as-is under the standard BSD license. If you use the dataset in academic publications, please cite the paper above.

Dataset and Demo

The demo file is available here:

http://iiis.tsinghua.edu.cn/~weixu/demobuild.zip

Inside this zip file, there is a data/online1/lg/sorted.log.gz. It isn't the original log, but it is almost the same (I just sorted it by timestamp; no other changes, I believe).
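To take a first look at the sorted log without unpacking it, something like the following Python sketch may help. It only assumes that the file is a gzip-compressed text file with one log message per line; the helper name head_of_log is made up for illustration and is not part of the demo.

```python
import gzip
from itertools import islice

def head_of_log(path, n=5):
    """Return the first n lines of a gzip-compressed text log."""
    # Assumption: the log is plain text; errors="replace" tolerates
    # any stray bytes that are not valid UTF-8.
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as f:
        return [line.rstrip("\n") for line in islice(f, n)]

# Example call, using the path inside the unpacked demobuild.zip:
#   for line in head_of_log("data/online1/lg/sorted.log.gz"):
#       print(line)
```

Reading through islice keeps memory use small, since only the requested lines are decompressed.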

You can run the demo UI directly: it is the demo.jar in the zip file. On Windows machines with a JRE correctly installed, you should be able to run it just by double-clicking the jar file. The demo is intended to give a sense of how the online detection algorithm works (described in the following ICDM paper).

Online system problem detection by mining patterns of console logs
Wei Xu, Ling Huang, Armando Fox, David Patterson, and Michael Jordan
In Proc. of the IEEE International Conference on Data Mining (ICDM ’09), Miami, FL, December 2009 [pdf]

I also have some labeled data that might help. It is available at:

http://iiis.tsinghua.edu.cn/~weixu/200nodes.rar

  • rawTFVector.txt is the message count vector matrix
  • col_header.txt lists which log message each column represents
  • nameIndex.txt lists which block each line represents
  • mlabel.txt has the manual labels. Use the first column only; the second column is an attempt to explain what might be wrong, but I cannot find the actual text descriptions now. Each line corresponds to a line in rawTFVector.txt.
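As a rough illustration of how the files line up, the Python sketch below pairs each row of rawTFVector.txt with its block name and manual label. The exact file layout is not documented here, so the sketch makes several assumptions: values are separated by whitespace or commas, the files contain no blank lines inside the data, nameIndex.txt has the block name as the first token on each line, and the three files have the same number of lines. The function names are hypothetical.

```python
import re
from pathlib import Path

def read_rows(path):
    """Split each non-blank line on whitespace or commas (assumed delimiters)."""
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    rows = []
    for line in text.splitlines():
        if line.strip():
            rows.append([tok for tok in re.split(r"[,\s]+", line.strip()) if tok])
    return rows

def load_labeled_vectors(data_dir):
    """Return (block_name, label, count_vector) tuples, one per matrix row."""
    data_dir = Path(data_dir)
    vectors = [[float(x) for x in row]
               for row in read_rows(data_dir / "rawTFVector.txt")]
    names = [row[0] for row in read_rows(data_dir / "nameIndex.txt")]
    # Only the first column of mlabel.txt is used, as the description advises.
    labels = [row[0] for row in read_rows(data_dir / "mlabel.txt")]
    if not (len(vectors) == len(names) == len(labels)):
        raise ValueError("row counts differ; the assumed layout is probably wrong")
    return list(zip(names, labels, vectors))
```

col_header.txt is not read here; it would be needed only to map each vector position back to a log message type. If the row-count check fails, inspect the files by hand before trusting any alignment.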

Code

All our code is available at https://github.com/xuw/logm. However, as the code is very old, we cannot provide any additional technical support.