Mining console logs for problem detection

When a datacenter-scale service consisting of hundreds of software components running on thousands of computers misbehaves, developer-operators need every tool at their disposal to troubleshoot and diagnose operational problems. Ironically, one source of information is built into almost every piece of software and reflects the original developers’ ideas about noteworthy or unusual events, yet it is typically ignored: the humble console log.

We propose a general, fully automatic methodology for mining console logs using a combination of program analysis, information retrieval, data mining and machine learning techniques. We use source code analysis to recover the structure of console log messages. We then extract features, such as execution traces, from the logs and use machine learning methods to detect problems. We also use a decision tree to distill the detection results into a format readily understandable by domain experts (e.g., developers, integrators and operators), who need not be familiar with the anomaly detection algorithms. The whole process requires no human intervention.
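The steps above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the format strings stand in for what source code analysis might recover from logging calls, the sample log lines are made up, and the detector is a deliberately simple stand-in (it flags any identifier whose message-count vector differs from the most common one) rather than the machine learning methods used in the actual work.

```python
import re
from collections import Counter, defaultdict

# Hypothetical printf-style format strings, as might be recovered from
# logging calls in the source code. The first %s is treated as an identifier.
FORMAT_STRINGS = [
    "Receiving block %s src: %s dest: %s",
    "Received block %s of size %s from %s",
    "Deleting block %s file %s",
]

def template_to_regex(fmt):
    """Turn a format string into a regex with one capture group per %s."""
    parts = [re.escape(p) for p in fmt.split("%s")]
    return re.compile("^" + r"(\S+)".join(parts) + "$")

TEMPLATES = [(i, template_to_regex(f)) for i, f in enumerate(FORMAT_STRINGS)]

def parse_line(line):
    """Return (template_id, variables), or None if no template matches."""
    for tid, rx in TEMPLATES:
        m = rx.match(line)
        if m:
            return tid, m.groups()
    return None

def count_vectors(lines):
    """Group messages by identifier and count how often each type occurs."""
    counts = defaultdict(Counter)
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # unstructured line: ignored in this sketch
        tid, variables = parsed
        counts[variables[0]][tid] += 1
    n = len(FORMAT_STRINGS)
    return {ident: [c[t] for t in range(n)] for ident, c in counts.items()}

def flag_anomalies(vectors):
    """Stand-in detector: flag vectors that differ from the most common one."""
    most_common, _ = Counter(tuple(v) for v in vectors.values()).most_common(1)[0]
    return sorted(i for i, v in vectors.items() if tuple(v) != most_common)

# Made-up sample log, for illustration only.
SAMPLE_LOG = [
    "Receiving block blk_1 src: /10.0.0.1 dest: /10.0.0.2",
    "Received block blk_1 of size 1024 from /10.0.0.1",
    "Receiving block blk_2 src: /10.0.0.1 dest: /10.0.0.3",
    "Received block blk_2 of size 2048 from /10.0.0.1",
    "Receiving block blk_3 src: /10.0.0.4 dest: /10.0.0.2",
    "Deleting block blk_3 file /data/blk_3",
]

if __name__ == "__main__":
    print(flag_anomalies(count_vectors(SAMPLE_LOG)))
```

In this sample, blk_3 is received but never acknowledged and is then deleted, so its count vector differs from the other two and it is the one flagged. A real detector would need to tolerate normal variation across execution traces, which this equality check does not.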

We demonstrated our methodology on two real-world systems, Hadoop and Darkstar Online Game Server, and discovered interesting problems in each system.

This technique is especially useful in cloud computing environments, where innovative developers can quickly build an application from open source components and even VM appliances. As console logs are built into almost every piece of software today, our technique can help the developer achieve fine-grained monitoring of these building blocks without the overhead of maintaining custom instrumentation.

Methodology Overview


Publications

Online system problem detection by mining patterns of console logs
Wei Xu, Ling Huang, Armando Fox, David Patterson, and Michael Jordan
To appear in the IEEE International Conference on Data Mining (ICDM ’09), Miami, FL, December 2009 [pdf]

Large-scale system problem detection by mining console logs
Wei Xu, Ling Huang, Armando Fox, David Patterson, and Michael Jordan
In Proceedings of the 22nd ACM Symposium on Operating Systems Principles (SOSP ’09), Big Sky, MT, October 2009 [pdf] [slides] [video]

Mining console logs for large-scale system problem detection
Wei Xu, Ling Huang, Armando Fox, David Patterson, and Michael Jordan
In Proceedings of the 3rd Workshop on Tackling Computer Systems Problems with Machine Learning Techniques (SysML ’08), San Diego, CA, December 2008 [pdf]

Students