logpai/loghub

Loghub

Loghub maintains a collection of system logs that are freely accessible for AI-driven log analytics research. Some of the logs are production data released from previous studies, while others were collected from real systems in our lab environment. Wherever possible, the logs are NOT sanitized, anonymized, or modified in any way.

🤗 We proudly announce that the loghub datasets have been downloaded by more than 450 organizations from both industry and academia.

Logs currently available

🔗 Get raw logs via hyperlinks in the Download column.

| Dataset | Description | Labeled | Time Span | #Lines | Raw Size | Download |
| :--- | :--- | :---: | :--- | ---: | ---: | :---: |
| 📂 Distributed systems | | | | | | |
| HDFS_v1 | Hadoop distributed file system log | ✔️ | 38.7 hours | 11,175,629 | 1.47GiB | 🔗 |
| HDFS_v2 | Hadoop distributed file system log | | N.A. | 71,118,073 | 16.06GiB | 🔗 |
| HDFS_v3 | Instrumented HDFS trace log (TraceBench) | ✔️ | N.A. | 14,778,079 | 2.96GiB | 🔗 |
| Hadoop | Hadoop mapreduce job log | ✔️ (Check #56) | N.A. | 394,308 | 48.61MiB | 🔗 |
| Spark | Spark job log | | N.A. | 33,236,604 | 2.75GiB | 🔗 |
| Zookeeper | ZooKeeper service log | | 26.7 days | 74,380 | 9.95MiB | 🔗 |
| OpenStack | OpenStack infrastructure log | ✔️ | N.A. | 207,820 | 58.61MiB | 🔗 |
| 📂 Supercomputers | | | | | | |
| BGL | Blue Gene/L supercomputer log | ✔️ | 214.7 days | 4,747,963 | 708.76MiB | 🔗 |
| HPC | High performance cluster log | | N.A. | 433,489 | 32.00MiB | 🔗 |
| Thunderbird | Thunderbird supercomputer log | ✔️ | 244 days | 211,212,192 | 29.60GiB | 🔗 |
| 📂 Operating systems | | | | | | |
| Windows | Windows event log | | 226.7 days | 114,608,388 | 26.09GiB | 🔗 |
| Linux | Linux system log | | 263.9 days | 25,567 | 2.25MiB | 🔗 |
| Mac | Mac OS log | | 7.0 days | 117,283 | 16.09MiB | 🔗 |
| 📂 Mobile systems | | | | | | |
| Android_v1 | Android framework log | | N.A. | 1,555,005 | 183.37MiB | 🔗 |
| Android_v2 | Android framework log | | N.A. | 30,348,042 | 3.38GiB | 🔗 |
| HealthApp | Health app log | | 10.5 days | 253,395 | 22.44MiB | 🔗 |
| 📂 Server applications | | | | | | |
| Apache | Apache web server error log | | 263.9 days | 56,481 | 4.90MiB | 🔗 |
| OpenSSH | OpenSSH server log | | 28.4 days | 655,146 | 70.02MiB | 🔗 |
| 📂 Standalone software | | | | | | |
| Proxifier | Proxifier software log | | N.A. | 21,329 | 2.42MiB | 🔗 |
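
Several of the raw logs are tens of GiB in size, so it can help to stream a downloaded file rather than load it into memory. The following is a minimal sketch, not part of loghub itself: the helper name `count_lines` and the chunk size are assumptions chosen for illustration. It counts the lines in a local log file so the result can be compared against the #Lines column.

```python
def count_lines(path, chunk_size=1 << 20):
    """Count lines in a file by streaming fixed-size binary chunks.

    Hypothetical helper for illustration; `path` is wherever you saved
    a raw log after downloading it.
    """
    total = 0
    last_byte = b""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += chunk.count(b"\n")
            last_byte = chunk[-1:]
    # A final line without a trailing newline still counts as a line.
    if last_byte and last_byte != b"\n":
        total += 1
    return total
```

For example, `count_lines("HDFS.log")` would return the line count of a file saved under that (assumed) name. Reading in binary mode avoids decoding errors on logs that contain non-UTF-8 bytes.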

🔥 Citation

Please cite the following two papers if you use the loghub datasets in your research.

🌈 License

The datasets are freely available for research or academic work. For any usage or distribution of the datasets, please refer to the loghub repository URL https://github.com/logpai/loghub and cite the loghub paper where applicable.

🙋 Discussion

Feel free to open a discussion here if you have any questions.

About

A large collection of system log datasets for AI-driven log analytics [ISSRE'23]
