COVID-19 Database for Research and Analysis

This repository contains tools to generate a COVID-19 database for research and analysis, and links to a pre-generated database. The database is a self-contained SQLite database which can be used on any platform.

You can run the program in this repository on your own machine to download data from the Internet and assemble your own database. The process takes approximately two minutes, and you can run it as often as you like to obtain the latest data. Alternatively, you can download a database that is regenerated daily.

Download the database

You can download a compressed database for yourself here: covid19db.zip.

This file is automatically regenerated daily.

Example uses

This data is used in the COVID-19 in Kansas project. Its graphs are updated automatically every day and offer a unique perspective on the data.

Using the data

Besides the SQLite command-line tools, here are some other tips on using the data:

Please note that some of the included data requests or requires attribution. Please credit the original sources of the data (e.g., The New York Times) and the aggregators in your work.

Included data and sources

You can find a complete database schema in dbschema.rs. The views defined there are intended to be the primary way to access the database. A Rust API for sqlx is also provided for select tables. Direct source data download URLs are in loader.rs.
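Since the views are the intended entry point, it can help to list them before writing queries. The following is a minimal sketch in Python using only the standard library's sqlite3 module. The helper name list_views is hypothetical and not part of this project, and the path you pass in is whatever you named your copy of the database. The file is opened read-only, so a mistyped path raises an error instead of silently creating an empty database.

```python
import sqlite3
from pathlib import Path


def list_views(db_path):
    """Return the names of all views in the SQLite database at db_path.

    Hypothetical helper for illustration; not part of the covid19db library.
    """
    # Build a file: URI so SQLite opens the database in read-only mode.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'view' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

For example, `list_views("covid19.db")` would return the view names for a database stored under that name in the current directory.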

Here are the sources:

Additional Resources

These are potential future integrations:

Building your own database

Commands like these should do it:

git clone https://github.com/jgoerzen/covid19db
cd covid19db
cargo run --release

You will then get a file named covid19.db in the working directory. You can use this file with SQLite directly.

With these commands, you can verify these results for yourself. If you don't already have Rust installed, see the Rust installation page.
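Once the build finishes, one quick sanity check is to count the rows in each table of the resulting file. This is a sketch under assumptions rather than anything the project ships: table_row_counts is a hypothetical helper, and it relies only on Python's standard sqlite3 module and on SQLite's built-in sqlite_master catalog, so it does not need to know any table names in advance.

```python
import sqlite3
from pathlib import Path


def table_row_counts(db_path):
    """Map each user table in the database at db_path to its row count.

    Hypothetical helper for illustration; not part of the covid19db library.
    """
    # Open read-only so a wrong path fails loudly instead of creating a file.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        names = [
            name
            for (name,) in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
            # Skip SQLite's own internal bookkeeping tables.
            if not name.startswith("sqlite_")
        ]
        counts = {}
        for name in names:
            # Quote the identifier, since table names cannot be bound as parameters.
            quoted = '"' + name.replace('"', '""') + '"'
            counts[name] = conn.execute(
                f"SELECT COUNT(*) FROM {quoted}"
            ).fetchone()[0]
        return counts
    finally:
        conn.close()
```

Calling `table_row_counts("covid19.db")` after a build returns a dictionary of table names and row counts. A table with zero rows may point to a data source that failed to download.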

The Rust library

The library is pretty skeletal at the moment, but you can browse the docs.

Database and API stability

This is a rapidly changing field, and the data providers change their schemas fairly frequently. I attempt to mitigate the impact. If you avoid things like SELECT * and instead name your columns explicitly, you will minimize the impact on yourself when the schema or API changes.
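To illustrate why explicit column names help, here is a small self-contained sketch using an in-memory SQLite database. The table demo, its columns, and the sample row are all made up for illustration and do not reflect the real schema. The sketch simulates an upstream schema change by adding a column, then compares what each style of query returns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table and made-up sample row; NOT the real covid19db schema.
conn.execute("CREATE TABLE demo (date TEXT, county TEXT, cases INTEGER)")
conn.execute("INSERT INTO demo VALUES ('2020-07-01', 'Example County', 10)")


def read_explicit(connection):
    """Query that names its columns: the row shape is fixed by the query."""
    return connection.execute("SELECT date, cases FROM demo").fetchall()


def read_star(connection):
    """Query using SELECT *: the row shape follows the table definition."""
    return connection.execute("SELECT * FROM demo").fetchall()


before_explicit = read_explicit(conn)
before_star_width = len(read_star(conn)[0])

# Simulate a schema change of the kind a data provider might introduce.
conn.execute("ALTER TABLE demo ADD COLUMN deaths INTEGER")

after_explicit = read_explicit(conn)
after_star_width = len(read_star(conn)[0])

# The explicit query returns the same rows before and after the change,
# while the SELECT * rows grew by one column.
print(before_explicit == after_explicit)
print(before_star_width, after_star_width)
```

Code that unpacks SELECT * rows by position would need changes after such a schema update, while the explicit query keeps working as long as the columns it names still exist.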

Users

This data is used by the Kansas COVID-19 Charts project and perhaps others.

Copyright and Acknowledgments

This code is Copyright (c) 2019-2020 John Goerzen

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program.  If not, see <http://www.gnu.org/licenses/>.

This repository contains only tools for obtaining data and no data itself, though the data itself may be available elsewhere on GitHub. If you use the data accumulated by this program, or download it, you may be required to acknowledge the source. Here are some details:

cdataset - New York Times

In general, we are making this data publicly available for broad, noncommercial public use including by medical and public health researchers, policymakers, analysts and local news media.

If you use this data, you must attribute it to “The New York Times” in any publication. If you would like a more expanded description of the data, you could say “Data from The New York Times, based on reports from state and local health agencies.”

If you use it in an online presentation, we would appreciate it if you would link to our U.S. tracking page at https://www.nytimes.com/interactive/2020/us/coronavirus-us-cases.html.

If you use this data, please let us know at covid-data@nytimes.com.

See our LICENSE for the full terms of use for this data.

This license is co-extensive with the Creative Commons Attribution-NonCommercial 4.0 International license, and licensees should refer to that license (CC BY-NC) if they have questions about the scope of the license.

source

cdataset and loc_lookup - Johns Hopkins

  1. This data set is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) by the Johns Hopkins University on behalf of its Center for Systems Science in Engineering. Copyright Johns Hopkins University 2020.
  2. Attribute the data as the "COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University" or "JHU CSSE COVID-19 Data" for short, and the url: https://github.com/CSSEGISandData/COVID-19.
  3. For publications that use the data, please cite the following publication: "Dong E, Du H, Gardner L. An interactive web-based dashboard to track COVID-19 in real time. Lancet Inf Dis. 20(5):533-534. doi: 10.1016/S1473-3099(20)30120-1"

source

rtlive - rt.live

We just ask that you cite Rt.live as the source and link where appropriate.

source

covidtracking - COVID-19 Tracking Project

You are welcome to copy, distribute, and develop data and website content from The COVID Tracking Project at The Atlantic for all healthcare, medical, journalistic and non-commercial uses, including any personal, editorial, academic, or research purposes.

The COVID Tracking Project at The Atlantic data and website content is published under a Creative Commons CC BY-NC-4.0 license, which requires users to attribute the source and license type (CC BY-NC-4.0) when sharing our data or website content. The COVID Tracking Project at The Atlantic also grants permission for any derivative use of this data and website content that supports healthcare or medical research (including institutional use by public health and for-profit organizations), or journalistic usage (by nonprofit or for-profit organizations). All other commercial uses are not permitted under the Creative Commons license, and will require permission from The COVID Tracking Project at The Atlantic.

source

owid - Our World In Data

"All our research and visualizations are free to use by everyone for all purposes." source

Visualizations and text: All our charts, maps, and text is licensed under a very permissive ‘Creative Commons’ (CC) license: The CC-BY license. The BY stands for ‘by attribution’ and this means you are free to take whatever is useful for your work. You just need to provide credit to Our World in Data and our underlying sources (see below).

source

Harvey County Testing Data

This data is a manual import from the Kansas Department of Health and Environment and the Harvey County Health Department.