Covid19CanadaData: Download Canadian COVID-19 Data
The goal of Covid19CanadaData is to facilitate the acquisition of Canadian COVID-19 data from the following sources:
- Live versions of Canadian COVID-19 datasets available on the Internet
- The Canadian COVID-19 Data Archive, which provides daily snapshots of COVID-19 data from various Canadian government sources (and select non-governmental sources), via live URLs (for current versions) and Amazon S3 (for archived versions). All datasets are catalogued in datasets.json.
Covid19CanadaData is part of Covid19CanadaETL, which is used to assemble the Covid19Canada dataset from the COVID-19 Canada Open Data Working Group. It is also used in the Timeline of COVID-19 in Canada, one component of the What Happened? COVID-19 in Canada project.
Installation
You can install the development version of Covid19CanadaData from GitHub with:
# install.packages("devtools")
devtools::install_github("ccodwg/Covid19CanadaData")
Note that for webpages that require JavaScript to render their contents, Docker must be installed and the Docker daemon must be running and available. See the installation instructions for Docker Desktop on Windows and Mac. On Linux, install rootless Docker by running the command below and following the instructions:
curl -sSL https://get.docker.com/rootless | sh
On Windows, a Python installation with the packages docker and pypiwin32, as well as the R package reticulate, is also required.
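Before downloading a JavaScript-rendered page, it can help to confirm that the prerequisites above appear to be in place. The sketch below is a hypothetical helper, check_js_prereqs(), which is not part of Covid19CanadaData. It uses only base R (Sys.which() and system2()) plus, on Windows, reticulate::py_module_available(). Two assumptions are worth flagging: a zero exit status from "docker info" is treated as "daemon reachable", and the presence of the win32api module is used as a proxy for the pypiwin32 package.

```r
# Hypothetical helper (NOT part of Covid19CanadaData): reports whether the
# prerequisites for downloading JavaScript-rendered pages appear to be met.
check_js_prereqs <- function() {
  # Is a docker executable on the PATH?
  docker_found <- nzchar(Sys.which("docker"))

  # Is the Docker daemon reachable? Assumption: "docker info" exits with
  # status 0 only when the client can talk to a running daemon.
  daemon_ok <- FALSE
  if (docker_found) {
    status <- suppressWarnings(
      system2("docker", "info", stdout = FALSE, stderr = FALSE)
    )
    daemon_ok <- identical(as.integer(status), 0L)
  }

  result <- c(docker_found = docker_found, daemon_running = daemon_ok)

  # On Windows, also check for reticulate and the required Python modules.
  if (.Platform$OS.type == "windows") {
    has_reticulate <- requireNamespace("reticulate", quietly = TRUE)
    result["reticulate"] <- has_reticulate
    # && short-circuits, so Python is only queried if reticulate is installed
    result["py_docker"] <- has_reticulate &&
      reticulate::py_module_available("docker")
    # Assumption: pypiwin32 provides the win32api module
    result["py_win32"] <- has_reticulate &&
      reticulate::py_module_available("win32api")
  }
  result
}

check_js_prereqs()
```

Every element of the returned named logical vector should be TRUE before calling dl_dataset() on a page such as the BC situation report example below.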
Examples
Live Canadian COVID-19 datasets
Below are some example commands for downloading the live versions of datasets catalogued in the Canadian COVID-19 Data Archive. Datasets are referenced by their UUIDs, as listed in datasets.json in Covid19CanadaArchive.
# download live versions of datasets catalogued in the Canadian COVID-19 Data Archive
## get PHAC epidemiology update CSV
d1 <- Covid19CanadaData::dl_dataset("314c507d-7e48-476e-937b-965499f51e8e")
## get Ontario hospitalizations CSV
d2 <- Covid19CanadaData::dl_dataset("4b214c24-8542-4d26-a850-b58fc4ef6a30")
## get summary page of Alberta respiratory virus dashboard
d3 <- Covid19CanadaData::dl_dataset("2a11bbcc-7b43-47d1-952d-437cdc9b2ffb")
rvest::html_table(d3) # extract tables
## get BC COVID-19 situation report (requires Docker)
d4 <- Covid19CanadaData::dl_dataset("b85ca9d5-3a88-403d-9444-cac73ffb2d3f")
rvest::html_table(d4) # extract tables
Archived Canadian COVID-19 datasets
# load most recent archived PHAC epidemiology update CSV
# and current live version into R
# returns a list of data frames named according to date
Covid19CanadaData::dl_archive(
uuid = "314c507d-7e48-476e-937b-965499f51e8e",
date = "latest" # latest archived version
)
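Because dl_archive() returns a list of data frames named according to date when loading into R, it is sometimes convenient to stack the snapshots into one data frame so they can be compared. The sketch below uses a hypothetical helper, stack_snapshots(), which is not part of Covid19CanadaData. It assumes every snapshot has the same columns (rbind() will stop otherwise), and it keeps each list name as a character column called snapshot rather than parsing it, since the exact name format is not guaranteed here. Mock data frames stand in for real dl_archive() output.

```r
# Hypothetical helper (NOT part of Covid19CanadaData): stacks a named list of
# data frames (one per archived snapshot) into a single data frame, adding a
# "snapshot" column that holds each list element's name.
# Assumption: all snapshots share identical column names.
stack_snapshots <- function(snapshots) {
  stopifnot(is.list(snapshots), !is.null(names(snapshots)))
  tagged <- Map(function(df, nm) {
    df$snapshot <- rep(nm, nrow(df))  # rep() also handles zero-row snapshots
    df
  }, snapshots, names(snapshots))
  out <- do.call(rbind, unname(tagged))
  rownames(out) <- NULL
  out
}

# Mock snapshots standing in for the list returned by dl_archive()
mock <- list(
  "2021-12-01" = data.frame(region = c("AB", "BC"), cases = c(10, 20)),
  "2021-12-02" = data.frame(region = c("AB", "BC"), cases = c(12, 25))
)
combined <- stack_snapshots(mock)
combined
```

If the names turn out to be ISO dates (YYYY-MM-DD), as.Date(combined$snapshot) converts the column for date arithmetic.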
# download BC Regional Health Authority cumulative summary JSON files
# from December 2021 to a temporary directory
# saves files to local drive rather than loading into R
temp_dir <- tempdir() # define temporary directory
Covid19CanadaData::dl_archive(
uuid = "91367e1d-8b79-422c-b314-9b3441ba4f42",
after = "2021-12-01",
before = "2021-12-31",
path = temp_dir,
remove_duplicates = TRUE # don't download duplicate files (default = TRUE)
)
list.files(temp_dir) # list files
Citing this package
A citation for Covid19CanadaData may be generated by running citation("Covid19CanadaData").