NUS SMS Corpus

Due to technical problems, the NUS SMS Corpus website http://wing.comp.nus.edu.sg/SMSCorpus is temporarily unavailable. For your convenience, we have uploaded the most recent release of the corpus (Mar 9, 2015) here.

Please cite the following paper if you use our corpus. Thanks!

Tao Chen and Min-Yen Kan (2013). Creating a Live, Public Short Message Service Corpus: The NUS SMS Corpus. Language Resources and Evaluation, 47(2), pages 299-355.

Please do us a favor and send a quick message to Tao Chen (chentaokite @ gmail dot com) if you download this corpus and plan to use it. It will only take a minute of your time and will help us get a better idea of what such a corpus might be used for.

Language   File Format   Size     Number of Messages
English    SQL           2,045K   55,835
English    XML           2,359K   55,835
English    JSON          2,740K   55,835
Chinese    SQL           979K     31,465
Chinese    XML           1,182K   31,465
Chinese    JSON          1,700K   31,465

Our dataset has been added to Kaggle! Please consider participating in a competition!

Group Members