<p align="center"> <a href="https://github.com/logpai"> <img src="https://cdn.jsdelivr.net/gh/logpai/logpai.github.io@master/img/logpai_logo.jpg" width="480"></a></p>

Loghub

<div> <a href="https://github.com/logpai/loghub/stargazers"><img src="http://bytecrank.com/nastyox/reporoster/php/stargazersSVG.php?user=logpai&repo=loghub" width="600"/></a> </div>

Loghub maintains a collection of system logs that are freely accessible for AI-driven log analytics research. Some of the logs are production data released by previous studies, while others were collected from real systems in our lab environment. Wherever possible, the logs are NOT sanitized, anonymized, or modified in any way. These log datasets are freely available for research or academic work.

🤗 We proudly announce that the loghub datasets have attained a total of <a href="https://doi.org/10.5281/zenodo.1144100"><img src="https://img.shields.io/endpoint?&url=https://cdn.jsdelivr.net/gh/logpai/loghub@zenodo/downloads.json&labelColor=1AE&color=DDEEFF&style=flat&label=Downloads"></a> from more than 450 organizations in both industry and academia.

Logs currently available

🔗 Get raw logs via hyperlinks in the Download column.

| Dataset | Description | Labeled | Time Span | #Lines | Raw Size | Download |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| <tr><th colspan=7 align="center">:open_file_folder: Distributed systems</th></tr> | | | | | | |
| HDFS_v1 | Hadoop distributed file system log | :heavy_check_mark: | 38.7 hours | 11,175,629 | 1.47GB | :link: |
| HDFS_v2 | Hadoop distributed file system log | | N.A. | 71,118,073 | 16.06GB | :link: |
| HDFS_v3 | Instrumented HDFS trace log (TraceBench) | :heavy_check_mark: | N.A. | 14,778,079 | 2.96GB | :link: |
| Hadoop | Hadoop mapreduce job log | :heavy_check_mark: | N.A. | 394,308 | 48.61MB | :link: |
| Spark | Spark job log | | N.A. | 33,236,604 | 2.75GB | :link: |
| Zookeeper | ZooKeeper service log | | 26.7 days | 74,380 | 9.95MB | :link: |
| OpenStack | OpenStack infrastructure log | :heavy_check_mark: | N.A. | 207,820 | 58.61MB | :link: |
| <tr><th colspan=7 align="center">:open_file_folder: Super computers</th></tr> | | | | | | |
| BGL | Blue Gene/L supercomputer log | :heavy_check_mark: | 214.7 days | 4,747,963 | 708.76MB | :link: |
| HPC | High performance cluster log | | N.A. | 433,489 | 32.00MB | :link: |
| Thunderbird | Thunderbird supercomputer log | :heavy_check_mark: | 244 days | 211,212,192 | 29.60GB | :link: |
| <tr><th colspan=7 align="center">:open_file_folder: Operating systems</th></tr> | | | | | | |
| Windows | Windows event log | | 226.7 days | 114,608,388 | 26.09GB | :link: |
| Linux | Linux system log | | 263.9 days | 25,567 | 2.25MB | :link: |
| Mac | Mac OS log | | 7.0 days | 117,283 | 16.09MB | :link: |
| <tr><th colspan=7 align="center">:open_file_folder: Mobile systems</th></tr> | | | | | | |
| Android_v1 | Android framework log | | N.A. | 1,555,005 | 183.37MB | :link: |
| Android_v2 | Android framework log | | N.A. | 30,348,042 | 3.38GB | :link: |
| HealthApp | Health app log | | 10.5 days | 253,395 | 22.44MB | :link: |
| <tr><th colspan=7 align="center">:open_file_folder: Server applications</th></tr> | | | | | | |
| Apache | Apache web server error log | | 263.9 days | 56,481 | 4.90MB | :link: |
| OpenSSH | OpenSSH server log | | 28.4 days | 655,146 | 70.02MB | :link: |
| <tr><th colspan=7 align="center">:open_file_folder: Standalone software</th></tr> | | | | | | |
| Proxifier | Proxifier software log | | N.A. | 21,329 | 2.42MB | :link: |
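
The #Lines figures listed for each dataset can double as a quick sanity check after a download. The following is a minimal sketch, not part of loghub itself: `count_log_lines` is an illustrative helper, and the file name `HDFS.log` in the usage comment is a hypothetical placeholder for wherever you extracted a raw log. It streams the file in binary mode so that multi-gigabyte logs do not need to fit in memory, and it assumes that a `.gz` suffix means the file is gzip-compressed.

```python
import gzip
from pathlib import Path


def count_log_lines(path):
    """Count the lines in a raw log file by streaming it.

    Files whose name ends in ".gz" are read through gzip; anything
    else is read as a plain file. Binary mode avoids decoding errors
    on logs that contain non-UTF-8 bytes.
    """
    path = Path(path)
    opener = gzip.open if path.suffix == ".gz" else open
    total = 0
    with opener(path, "rb") as handle:
        for _ in handle:  # iterates line by line, never loading the whole file
            total += 1
    return total


# Hypothetical usage (the path is an assumption, not a loghub file name):
#   n = count_log_lines("HDFS.log")
#   print(f"{n:,} lines")
```

If the count differs from the listed figure, the archive may have been truncated during download, or the dataset may be split across several files that need to be summed.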

🔥 Citation

Please cite the following paper if you use the loghub datasets in your research.

Publications using loghub datasets

| Publication | Paper Title |
| :--- | :--- |
| DSN'07 | Adam J. Oliner, Jon Stearley. What Supercomputers Say: A Study of Five System Logs. IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), 2007. |
| SOSP'09 | Wei Xu, Ling Huang, Armando Fox, David A. Patterson, Michael I. Jordan. Detecting Large-Scale System Problems by Mining Console Logs. ACM Symposium on Operating Systems Principles (SOSP), 2009. |
| KDD'09 | Adetokunbo Makanju, A. Nur Zincir-Heywood, Evangelos E. Milios. Clustering Event Logs Using Iterative Partitioning. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2009. |
| ISSRE'16 | Shilin He, Jieming Zhu, Pinjia He, Michael R. Lyu. Experience Report: System Log Analysis for Anomaly Detection. IEEE International Symposium on Software Reliability Engineering (ISSRE), 2016. |
| DSN'16 | Pinjia He, Jieming Zhu, Shilin He, Jian Li, Michael R. Lyu. An Evaluation Study on Log Parsing and Its Use in Log Mining. IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), 2016. |
| ICSE'16 | Qingwei Lin, Hongyu Zhang, Jian-Guang Lou, Yu Zhang, Xuewei Chen. Log Clustering Based Problem Identification for Online Service Systems. International Conference on Software Engineering (ICSE), 2016. |
| ICWS'17 | Pinjia He, Jieming Zhu, Zibin Zheng, Michael R. Lyu. Drain: An Online Log Parsing Approach with Fixed Depth Tree. IEEE International Conference on Web Services (ICWS), 2017. |
| CCS'17 | Min Du, Feifei Li, Guineng Zheng, Vivek Srikumar. DeepLog: Anomaly Detection and Diagnosis from System Logs through Deep Learning. ACM Conference on Computer and Communications Security (CCS), 2017. |
| TDSC'18 | Pinjia He, Jieming Zhu, Shilin He, Jian Li, Michael R. Lyu. Towards Automated Log Parsing for Large-Scale Log Data Analysis. IEEE Transactions on Dependable and Secure Computing (TDSC), 2018. |
| TKDE'18 | Min Du, Feifei Li. Spell: Online Streaming Parsing of Large Unstructured System Logs. IEEE Transactions on Knowledge and Data Engineering (TKDE), 2018. |
| ASE'19 | Jinyang Liu, Jieming Zhu, Shilin He, Pinjia He, Zibin Zheng, Michael R. Lyu. Logzip: Extracting Hidden Structures via Iterative Clustering for Log Compression. IEEE/ACM International Conference on Automated Software Engineering (ASE), 2019. |
| ICSE'19 | Jieming Zhu, Shilin He, Jinyang Liu, Pinjia He, Qi Xie, Zibin Zheng, Michael R. Lyu. Tools and Benchmarks for Automated Log Parsing. International Conference on Software Engineering (ICSE), 2019. |
| ICSE'22 | Zanis Ali Khan, Donghwan Shin, Domenico Bianculli, Lionel Briand. Guidelines for Assessing the Accuracy of Log Message Template Identification Techniques. International Conference on Software Engineering (ICSE), 2022. |
| ICSE'23 | Van-Hoang Le, Hongyu Zhang. Log Parsing with Prompt-based Few-shot Learning. International Conference on Software Engineering (ICSE), 2023. |
| ICSE'23 | Zhenhao Li, Chuan Luo, Tse-Hsun Chen, Weiyi Shang, Shilin He, Qingwei Lin, Dongmei Zhang. Did We Miss Something Important? Studying and Exploring Variable-Aware Log Abstraction. International Conference on Software Engineering (ICSE), 2023. |
| ICSE'23 | Yintong Huo, Yuxin Su, Cheryl Lee, Michael R. Lyu. SemParser: A Semantic Parser for Log Analysis. International Conference on Software Engineering (ICSE), 2023. |
| WWW'23 | Liming Wang, Hong Xie, Ye Li, Jian Tan, John C.S. Lui. Interactive Log Parsing via Light-weight User Feedback. ACM Web Conference (WWW), 2023. |
| TSC'23 | Siyu Yu, Pinjia He, Ningjiang Chen, Yifan Wu. Brain: Log Parsing with Bidirectional Parallel Tree. IEEE Transactions on Services Computing (TSC), 2023. |

:bulb: If you use loghub datasets in your paper, please feel free to make a PR to add your paper to the table.

Discussion

You are welcome to join our WeChat group for questions and discussion. Alternatively, you can open a discussion here.

Scan QR code

🌈 License

The datasets are freely available for research or academic work. For any usage or distribution of the datasets, please refer to the loghub repository URL https://github.com/logpai/loghub and cite the loghub paper where applicable.