covid19-eu-data
covid19-eu-data is a dataset repository for COVID-19/SARS-CoV-2 cases in Europe. We regularly pull data from official government websites using the open-source scripts in this repository.
Changelog
On 2022-02-29, IE stopped updating the detailed covid infection data for weekends.
On 2021-11-01, we stopped collecting PL data as there are some data quality issues. But we update the original data here.
Breaking Change:
On 2021-01-03, we dropped the whole commit history and removed the cache files. This was done because the repo was growing into a behemoth.
On 2020-12-31, we stopped caching most of the webpages because the repo had grown too large.
On 2020-05-22, we removed documents/be and documents/dk. These two folders were bloating the repo, which had reached the GitHub storage hard limit (2GB). The files have been moved to covid19-eu-zh/covid19-eu-data-20200522 as a snapshot.
Full changelog: CHANGELOG.md
Update Status
Commit Status:
Workflow status by countries:
Country | Status | Data Source |
---|---|---|
AT | ||
BE | ||
CH | ||
CZ | ||
DE | ||
DK | ||
ES | ||
FR | ||
GR | ||
HU | ||
IE | ||
IT | ||
NL | ||
NO | ||
PL | ||
PT | ||
SE | ||
FI | ||
SI | ||
UK | ||
EU(ECDC) | ||
Dataset
Tabular Data
The tabular data files are located in the dataset folder. The folder dataset/daily holds the daily updates for each country.
The metadata for the tabular data is found in .dataherb/metadata.yml.
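As a rough illustration of how the daily files could be read, here is a minimal sketch. It assumes the files directly under dataset/daily are CSV files with a header row; the function name `load_daily_rows` and the extra `source_file` key are hypothetical and not part of the repository.

```python
import csv
from pathlib import Path


def load_daily_rows(daily_dir):
    """Read every CSV file in `daily_dir` and return all rows as dicts.

    Assumption: each file is a CSV with a header row. Column names are
    taken from the files themselves, so none are hard-coded here.
    """
    rows = []
    # Sort the paths so the result order is deterministic.
    for csv_path in sorted(Path(daily_dir).glob("*.csv")):
        with csv_path.open(newline="", encoding="utf-8") as handle:
            for row in csv.DictReader(handle):
                # Record which file each row came from (hypothetical extra key).
                row["source_file"] = csv_path.name
                rows.append(row)
    return rows
```

Calling `load_daily_rows("dataset/daily")` from the repository root would then return one dictionary per data row, with all values left as strings.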
Other Data
Some of the countries publish more than simple tabular data. We cache the files in the documents folder.
Scrapers
The scripts used to update the data are located in the scripts folder. Most of the scripts require the utils.py module to run. Create a new environment and run pip install -r requirements.txt to install the requirements.
Workflows
The workflows that update the dataset are defined in .github/workflows. The Python scripts are scheduled to run on GitHub Actions.
Notes
AT
Caveats:
- We started tracking the recovered population and the deaths on 2020-03-13.
BE
- Only PDF files of the records are downloaded.
CH
- CH hospitalized indicates the current hospitalized patients.
DE
- For technical reasons, no data was transmitted from Hamburg on March 25th, 2020.
There is a repo that cleans up the raw data on ArcGIS.
FR
- France stopped updating the case tables on the webpage on 2020-03-26. We went back to the PDF files.
NL
Caveats:
- NL doesn't publish the time of the data release. We use 00:00 of the release day as the release time, although this does not reflect the actual update time.
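The 00:00 convention described in the caveat can be sketched as follows. This is only an illustration of the idea, not the code used by the scrapers; the function name `release_timestamp` is hypothetical.

```python
from datetime import date, datetime, time


def release_timestamp(release_day):
    """Return 00:00 of `release_day` as a placeholder release time."""
    # time.min is 00:00:00, so the result marks the start of the day.
    return datetime.combine(release_day, time.min)


print(release_timestamp(date(2020, 4, 1)).isoformat())  # 2020-04-01T00:00:00
```

When working with NL rows, the time component should therefore be read as "sometime on this day" rather than as an exact publication time.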
UK
We stopped tracking UK data.
- UK is already publishing data in an easy-to-use format. Click here for the full data
- There is already a very good github repo cleaning up the data. Click here for the repo.
Scotland
- Starting from 2020-04-08, Scotland doesn't report numbers less than 5. A missing value in the Scotland dataset from 2020-04-08 onward therefore indicates a number less than 5.
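One way to handle the suppressed values when analysing the Scotland data is to turn each cell into a lower and upper bound. The sketch below assumes a missing value appears as an empty string or `None`, and that a suppressed count lies between 0 and 4; the note only says "less than 5", so the lower bound of 0 is an assumption. The function name `scotland_case_bounds` is hypothetical.

```python
from datetime import date

# Date from which Scotland stopped reporting numbers below the threshold.
SUPPRESSION_START = date(2020, 4, 8)
SUPPRESSION_THRESHOLD = 5


def scotland_case_bounds(raw_value, report_day):
    """Return (lower, upper) bounds for a Scotland case count.

    Assumptions: a missing value is an empty string or None, and a
    suppressed count is somewhere in the range 0..4.
    """
    text = (raw_value or "").strip()
    if text:
        # A reported number is known exactly.
        count = int(text)
        return count, count
    if report_day >= SUPPRESSION_START:
        # Missing on or after the cutoff: the count is below the threshold.
        return 0, SUPPRESSION_THRESHOLD - 1
    # Missing before the cutoff: nothing can be inferred.
    return None, None
```

For example, `scotland_case_bounds("", date(2020, 4, 10))` gives the range `(0, 4)`, while the same empty cell dated before 2020-04-08 gives `(None, None)`.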
England
- In the first few days of reporting (before 2020-03-11), the data for England is not always a number. To solve this problem, we added two columns, cases_lower and cases_upper, to reflect the range of the number of cases.
- England switched to ArcGIS later. We are downloading the CSV file directly.
Wales
- Wales stopped publishing detailed data on 2020-03-17.
- Wales switched to Tableau on 2020-04-08. https://public.tableau.com/profile/public.health.wales.health.protection#!/vizhome/RapidCOVID-19virology-Public/Headlinesummary
Northern Ireland
Northern Ireland does not publish detailed data.
IT
- The data source also provides the whole time-series data. Set the -f flag to true for scripts/download_it.py to redownload all dates.
Community
Bugs and requests: PRs are welcome.
Telegram Channel (in Chinese): 新冠肺炎欧洲中文臺