Dataset in LibCity

This repository introduces the datasets used in LibCity.

Dataset Conversion Tools

The datasets used in LibCity are stored in a unified data storage format called atomic files. So that the datasets we collected can be used directly in LibCity, we have converted all of them into atomic files and provide the conversion tools in this repository.

Each conversion tool reads the original dataset from the ./input/ directory and writes the converted atomic files to the ./output/ directory. The first line of each conversion tool gives a link to the original dataset; download the dataset from that link and place it in the ./input/ directory. By following the pattern of our conversion tools, you can convert your own traffic dataset for use with LibCity.
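As an illustration of this workflow, here is a minimal sketch of what a conversion script might look like. It is not one of the tools in this repository. It assumes a hypothetical input CSV with one column per sensor and one row per time step, and the function name `convert`, the feature column `traffic_speed`, the column layout of the `.geo` and `.dyna` files, and the timestamp format are all assumptions that should be checked against the LibCity documentation on atomic files before use.

```python
import csv
from datetime import datetime, timedelta
from pathlib import Path


def convert(input_csv: Path, output_dir: Path, name: str,
            start: datetime, step_minutes: int) -> None:
    """Write <name>.geo and <name>.dyna from a wide CSV (hypothetical layout)."""
    output_dir.mkdir(parents=True, exist_ok=True)

    with input_csv.open(newline="") as f:
        reader = csv.reader(f)
        sensor_ids = next(reader)              # header row: one sensor id per column
        rows = [row for row in reader if row]  # one row per time step

    # Assumed .geo layout: one row per sensor; coordinates left empty in this sketch.
    with (output_dir / f"{name}.geo").open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["geo_id", "type", "coordinates"])
        for sensor in sensor_ids:
            writer.writerow([sensor, "Point", "[]"])

    # Assumed .dyna layout: one row per (sensor, time step) pair.
    with (output_dir / f"{name}.dyna").open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["dyna_id", "type", "time", "entity_id", "traffic_speed"])
        dyna_id = 0
        for col, sensor in enumerate(sensor_ids):
            for step, row in enumerate(rows):
                stamp = start + timedelta(minutes=step_minutes * step)
                writer.writerow([dyna_id, "state",
                                 stamp.strftime("%Y-%m-%dT%H:%M:%SZ"),
                                 sensor, row[col]])
                dyna_id += 1
```

For example, calling `convert(Path("./input/my_speed.csv"), Path("./output/MY_DATA"), "MY_DATA", datetime(2012, 3, 1), 5)` would read the hypothetical file `my_speed.csv` and write `MY_DATA.geo` and `MY_DATA.dyna` under `./output/MY_DATA/`. A real conversion would typically also produce the other files LibCity expects for the dataset, which this sketch leaves out.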

Alternatively, you can download the datasets we have already processed from BaiduDisk (code 1231) or Google Drive.

Dataset Statistics Information

Here we present the statistics of the datasets we have processed.

Traffic State Datasets-Point-based Flow or Speed or Occupancy

Collected from sensors or pre-processed from trajectory data.

| DATASET | #GEO | #REL | #USR | #DYNA | PLACE | DURATION | INTERVAL |
| --- | --- | --- | --- | --- | --- | --- | --- |
| METR_LA | 207 | 11,753 | | 7,094,304 | Los Angeles, USA | Mar. 1, 2012 - Jun. 27, 2012 | 5min |
| LOS_LOOP | 207 | 42,849 | | 7,094,304 | Los Angeles, USA | Mar. 1, 2012 - Jun. 27, 2012 | 5min |
| LOS_LOOP_SMALL | 207 | 42,849 | | 417,312 | Los Angeles, USA | May 1, 2012 - May 5, 2012 | 5min |
| SZ_TAXI | 156 | 24,336 | | 464,256 | Shenzhen, China | Jan. 1, 2015 - Jan. 31, 2015 | 15min |
| LOOP_SEATTLE | 323 | 104,329 | | 33,953,760 | Greater Seattle Area, USA | the entirety of 2015 | 5min |
| Q_TRAFFIC | 45,148 | 63,422 | | 264,386,688 | Beijing, China | Apr. 1, 2017 - May 31, 2017 | 15min |
| PEMSD3 | 358 | 547 | | 9,382,464 | California, USA | Sept. 1, 2018 - Nov. 30, 2018 | 5min |
| PEMSD4 | 307 | 340 | | 5,216,544 | San Francisco Bay Area, USA | Jan. 1, 2018 - Feb. 28, 2018 | 5min |
| PEMSD7 | 883 | 866 | | 24,921,792 | California, USA | May 1, 2017 - Aug. 31, 2017 | 5min |
| PEMSD8 | 170 | 277 | | 3,035,520 | San Bernardino Area, USA | Jul. 1, 2016 - Aug. 31, 2016 | 5min |
| PEMSD7(M) | 228 | 51,984 | | 2,889,216 | California, USA | weekdays of May and June, 2012 | 5min |
| PEMS_BAY | 325 | 8,358 | | 16,937,700 | San Francisco Bay Area, USA | Jan. 1, 2017 - Jun. 30, 2017 | 5min |
| BEIJING_SUBWAY | 276 | 76,176 | | 248,400 | Beijing, China | Feb. 29, 2016 - Apr. 3, 2016 | 30min |
| M_DENSE | 30 | | | 525,600 | Madrid, Spain | Jan. 1, 2018 - Dec. 21, 2019 | 60min |
| ROTTERDAM | 208 | | | 4,813,536 | Rotterdam, Holland | 135 days of 2018 | 2min |
| SHMETRO | 288 | 82,944 | | 1,934,208 | Shanghai, China | Jul. 1, 2016 - Sept. 30, 2016 | 15min |
| HZMETRO | 80 | 6,400 | | 146,000 | Hangzhou, China | Jan. 1, 2019 - Jan. 25, 2019 | 15min |
| NYCTAXI202001-202003_DYNA | 263 | 69,169 | | 574,392 | New York, USA | Jan. 1, 2020 - Mar. 30, 2020 | 60min |

Traffic State Datasets-Grid-based In-Flow and Out-Flow

Pre-processed from trajectory data.

| DATASET | #GEO | #REL | #USR | #DYNA | PLACE | DURATION | INTERVAL |
| --- | --- | --- | --- | --- | --- | --- | --- |
| TAXIBJ | 32*32 | | | 5,652,480 | Beijing, China | Mar. 1, 2015 - Jun. 30, 2015 et al. | 30min |
| T_DRIVE20150206 | 32*32 | 1,048,576 | | 3,686,400 | Beijing, China | Feb. 1, 2015 - Jun. 30, 2015 | 60min |
| T_DRIVE_SMALL | 32*32 | | | 172,032 | Beijing, China | Feb. 2, 2008 - Feb. 8, 2008 | 60min |
| NYCTAXI201401-201403_GRID | 10*20 | | | 432,000 | New York, USA | Jan. 1, 2014 - Mar. 31, 2014 | 60min |
| NYCBIKE202007-202009 | 10*20 | | | 441,600 | New York, USA | Jul. 1, 2020 - Sept. 30, 2020 | 60min |
| PORTO201307-201309 | 20*10 | | | 441,600 | Porto, Portugal | Jul. 1, 2013 - Sept. 30, 2013 | 60min |
| AUSTINRIDE20160701-20160930 | 16*8 | | | 282,624 | Austin, USA | Jul. 1, 2016 - Sept. 30, 2016 | 60min |
| BIKEDC202007-202009 | 16*8 | | | 282,624 | Washington, USA | Jul. 1, 2020 - Sept. 30, 2020 | 60min |
| BIKECHI202007-202009-3600 | 15*18 | | | 596,160 | Chicago, USA | Jul. 1, 2020 - Sept. 30, 2020 | 60min |
| BIKECHI202007-202009 | 15*18 | | | 1,192,320 | Chicago, USA | Jul. 1, 2020 - Sept. 30, 2020 | 30min |
| NYCTaxi20140112 | 15*5 | | | 1,314,000 | New York, USA | Jan. 1, 2014 - Dec. 31, 2014 | 30min |
| NYCTaxi20150103 | 10*20 | | | 576,000 | New York, USA | Jan. 1, 2015 - Mar. 1, 2015 | 30min |
| NYCTaxi20160102 | 16*12 | | | 552,960 | New York, USA | Jan. 1, 2016 - Feb. 29, 2016 | 30min |
| NYCBike20140409 | 16*8 | | | 562,176 | New York, USA | Apr. 1, 2014 - Sept. 30, 2014 | 60min |
| NYCBike20160708 | 10*20 | | | 576,000 | New York, USA | Jul. 1, 2016 - Aug. 29, 2016 | 30min |
| NYCBike20160809 | 14*8 | | | 322,560 | New York, USA | Aug. 1, 2016 - Sept. 29, 2016 | 30min |

Traffic State Datasets-OD-based Flow

| DATASET | #GEO | #REL | #USR | #DYNA | PLACE | DURATION | INTERVAL |
| --- | --- | --- | --- | --- | --- | --- | --- |
| NYCTAXI202004-202006_OD | 263 | 69,169 | | 150,995,927 | New York, USA | Apr. 1, 2020 - Jun. 30, 2020 | 60min |

Traffic State Datasets-Grid-OD-based Flow

| DATASET | #GEO | #REL | #USR | #DYNA | PLACE | DURATION | INTERVAL |
| --- | --- | --- | --- | --- | --- | --- | --- |
| NYC_TOD | 15*5 | | | 98,550,000 | New York, USA | | |

Traffic State Datasets-Risk

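| DATASET | #GEO | #REL | #USR | #DYNA | PLACE | DURATION | INTERVAL |
| --- | --- | --- | --- | --- | --- | --- | --- |
| NYC_RISK | 243 | 59,049 | | 3,504,000 | New York, USA | Jan. 01, 2013 - Dec. 31, 2013 | 60min |
| CHICAGO_RISK | 197 | 38,809 | | 2,332,800 | Chicago, USA | Feb. 01, 2016 - Sep. 30, 2016 | 60min |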

GPS Point Trajectory Datasets

| DATASET | #GEO | #REL | #USR | #DYNA | PLACE | DURATION | INTERVAL |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Chengdu_Taxi_Sample1 | | | 4565 | 712360 | Chengdu, China | Aug. 03, 2014 - Aug. 30, 2014 | |
| Beijing_Taxi_Sample | 16384 | | 76 | 518424 | Beijing, China | Oct. 01, 2013 - Oct. 31, 2013 | |
| Seattle | 613645 | 857406 | 1 | 7531 | Seattle WA, USA | Jan. 17, 2009 20:27:37 - 22:34:28 | 1s |
| Global | 11045 | 18196 | 1 | 2502 | Neftekamsk, Republic of Bashkortostan, Russian Federation | | 1s |

Road Segment-based Trajectory Datasets

| DATASET | #GEO | #REL | #USR | #DYNA | PLACE | DURATION | INTERVAL |
| --- | --- | --- | --- | --- | --- | --- | --- |

POI-based Trajectory Datasets

| DATASET | #GEO | #REL | #USR | #DYNA | PLACE | DURATION | INTERVAL |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Foursquare_TKY | 61,858 | | 2,293 | 573,703 | Tokyo, Japan | Apr. 4, 2012 - Feb. 16, 2013 | |
| Foursquare_NYC | 38,333 | | 1,083 | 227,428 | New York, USA | Apr. 3, 2012 - Feb. 15, 2013 | |
| Gowalla | 1,280,969 | 913,660 | 107,092 | 6,442,892 | Global | Feb. 4, 2009 - Oct. 23, 2010 | |
| BrightKite | 772,966 | 394,334 | 51,406 | 4,747,287 | Global | Mar. 21, 2008 - Oct. 18, 2010 | |
| Instagram | 13,187 | | 78,233 | 2,205,794 | New York, USA | Jun. 15, 2011 - Nov. 8, 2016 | |

Road Network Datasets

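| DATASET | #GEO | #REL | #USR | #DYNA | PLACE | DURATION | INTERVAL |
| --- | --- | --- | --- | --- | --- | --- | --- |
| bj_roadmap_edge | 38027 | 95660 | | | Beijing, China | | |
| bj_roadmap_node | 16927 | 38027 | | | Beijing, China | | |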

Cite

Our paper was accepted by ACM SIGSPATIAL 2021. If you find LibCity useful for your research or development, please cite our paper.

@inproceedings{10.1145/3474717.3483923,
  author = {Wang, Jingyuan and Jiang, Jiawei and Jiang, Wenjun and Li, Chao and Zhao, Wayne Xin},
  title = {LibCity: An Open Library for Traffic Prediction},
  year = {2021},
  isbn = {9781450386647},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  url = {https://doi.org/10.1145/3474717.3483923},
  doi = {10.1145/3474717.3483923},
  booktitle = {Proceedings of the 29th International Conference on Advances in Geographic Information Systems},
  pages = {145–148},
  numpages = {4},
  keywords = {Spatial-temporal System, Reproducibility, Traffic Prediction},
  location = {Beijing, China},
  series = {SIGSPATIAL '21}
}
Jingyuan Wang, Jiawei Jiang, Wenjun Jiang, Chao Li, and Wayne Xin Zhao. 2021. LibCity: An Open Library for Traffic Prediction. In Proceedings of the 29th International Conference on Advances in Geographic Information Systems (SIGSPATIAL '21). Association for Computing Machinery, New York, NY, USA, 145–148. DOI:https://doi.org/10.1145/3474717.3483923

04/27/2023 Update: We published a long paper on LibCity that (1) classifies urban spatial-temporal data, defines its base units, and proposes a unified storage format, i.e., atomic files; (2) gives a detailed review of the urban spatial-temporal prediction field (including macro-group prediction, micro-individual prediction, and fundamental tasks); (3) presents LibCity, an open source library for urban spatial-temporal prediction, detailing each module and its use cases and providing a web-based experiment management and visualization platform; and (4) selects more than 20 models and datasets for comparison experiments based on LibCity, reports the resulting model performance rankings, and summarizes promising future research directions. Please check this link for more details.

For the long paper, please cite it as follows:

@article{libcitylong,
  title={Towards Efficient and Comprehensive Urban Spatial-Temporal Prediction: A Unified Library and Performance Benchmark}, 
  author={Jingyuan Wang and Jiawei Jiang and Wenjun Jiang and Chengkai Han and Wayne Xin Zhao},
  journal={arXiv preprint arXiv:2304.14343},
  year={2023}
}