RDatasets.jl

The RDatasets package provides an easy way for Julia users to experiment with most of the standard data sets available in the core of R, as well as data sets included with many of R's most popular packages. This package is essentially a simple port of the Rdatasets repo created by Vincent Arelbundock, who gathered data sets from many of the standard R packages in one convenient location on GitHub at https://github.com/vincentarelbundock/Rdatasets

Data sets are loaded as DataFrames, so you need the DataFrames package to work with them. DataFrames is installed automatically as a dependency when you install RDatasets as follows:

using Pkg
Pkg.add("RDatasets")

After installing the RDatasets package, you can load data sets with the dataset() function, which takes the name of an R package and the name of a data set as arguments and returns a DataFrame:

using RDatasets
iris = dataset("datasets", "iris")
neuro = dataset("boot", "neuro")
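Because dataset() returns an ordinary DataFrame, the usual DataFrames.jl functions work directly on the result. The following is a minimal sketch, assuming DataFrames has also been added to your environment (for example with Pkg.add("DataFrames")) and that the iris columns are named SepalLength and Species, as RDatasets prints them. It checks the table's shape and averages one column per species:

```julia
using RDatasets, DataFrames, Statistics

# Load the classic iris data; the result is a DataFrame.
iris = dataset("datasets", "iris")

# Inspect the shape (rows, columns) and the column names.
println(size(iris))
println(names(iris))

# Split by species and compute the mean sepal length of each group.
# Assumes the columns are named :Species and :SepalLength.
means = combine(groupby(iris, :Species), :SepalLength => mean => :MeanSepalLength)
println(means)
```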

Data Sets

The RDatasets.packages() function returns a table of the R packages whose data sets are included:

Package       Title
COUNT         Functions, data and code for count data.
Ecdat         Data sets for econometrics
HSAUR         A Handbook of Statistical Analyses Using R (1st Edition)
HistData      Data sets from the history of statistics and data visualization
ISLR          Data for An Introduction to Statistical Learning with Applications in R
KMsurv        Data sets from Klein and Moeschberger (1997), Survival Analysis
MASS          Support Functions and Datasets for Venables and Ripley's MASS
SASmixed      Data sets from "SAS System for Mixed Models"
Zelig         Everyone's Statistical Software
adehabitatLT  Analysis of Animal Movements
boot          Bootstrap Functions (Originally by Angelo Canty for S)
car           Companion to Applied Regression
cluster       Cluster Analysis Extended Rousseeuw et al.
datasets      The R Datasets Package
gamair        Datasets used in the book Generalized Additive Models: An Introduction with R
gap           Genetic analysis package
ggplot2       An Implementation of the Grammar of Graphics
lattice       Lattice Graphics
lme4          Linear mixed-effects models using Eigen and S4
mgcv          Mixed GAM Computation Vehicle with GCV/AIC/REML smoothness estimation
mlmRev        Examples from Multilevel Modelling Software Review
nlreg         Higher Order Inference for Nonlinear Heteroscedastic Models
plm           Linear Models for Panel Data
plyr          Tools for splitting, applying and combining data
pscl          Political Science Computational Laboratory, Stanford University
psych         Procedures for Psychological, Psychometric, and Personality Research
quantreg      Quantile Regression
reshape2      Flexibly Reshape Data: A Reboot of the Reshape Package.
robustbase    Basic Robust Statistics
rpart         Recursive Partitioning and Regression Trees
sandwich      Robust Covariance Matrix Estimators
sem           Structural Equation Models
survival      Survival Analysis
vcd           Visualizing Categorical Data
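RDatasets.packages() returns its table as a DataFrame with the Package and Title columns listed above, so it can be searched like any other table. A minimal sketch, again assuming DataFrames is available in your environment, that keeps the packages whose title mentions regression (with the list above, that would be car, quantreg and rpart):

```julia
using RDatasets, DataFrames

# One row per R package, with Package and Title columns.
pkgs = RDatasets.packages()

# Keep the packages whose title mentions "Regression", skipping any missing titles.
regression_pkgs = filter(:Title => t -> !ismissing(t) && occursin("Regression", t), pkgs)
println(regression_pkgs)
```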

The RDatasets.datasets() function returns a table describing the 700+ included data sets. Pass a package name, e.g. RDatasets.datasets("mlmRev"), to get a table restricted to that package:

Package  Dataset        Title                                                   Rows  Columns
mlmRev   Chem97         Scores on A-level Chemistry in 1997                    31022        8
mlmRev   Contraception  Contraceptive use in Bangladesh                         1934        6
mlmRev   Early          Early childhood intervention study                       309        4
mlmRev   Exam           Exam scores from inner London                           4059       10
mlmRev   Gcsemv         GCSE exam score                                         1905        5
mlmRev   Hsb82          High School and Beyond - 1982                           7185        8
mlmRev   Mmmec          Malignant melanoma deaths in Europe                      354        6
mlmRev   Oxboys         Heights of Boys in Oxford                                234        4
mlmRev   ScotsSec       Scottish secondary school scores                        3435        6
mlmRev   bdf            Language Scores of 8-Graders in The Netherlands         2287       28
mlmRev   egsingle       US Sustaining Effects study                             7230       12
mlmRev   guImmun        Immunization in Guatemala                               2159       13
mlmRev   guPrenat       Prenatal care in Guatemala                              2449       15
mlmRev   star           Student Teacher Achievement Ratio (STAR) project data  26796       18
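Because RDatasets.datasets() also returns a DataFrame, its Rows and Columns fields can be used to pick data sets before loading them, and each row's Package and Dataset fields can be passed straight back to dataset(). A sketch, assuming DataFrames is available in your environment; the 10,000-row threshold is an arbitrary illustration, and with the counts listed above it would keep Chem97 and star:

```julia
using RDatasets, DataFrames

# Describe every data set provided by the mlmRev package.
mlm = RDatasets.datasets("mlmRev")

# Keep only the entries with more than 10,000 rows (illustrative threshold).
large = filter(:Rows => >(10_000), mlm)
println(large[:, [:Dataset, :Rows, :Columns]])

# Load each selected data set by passing its Package and Dataset fields to dataset().
tables = Dict(row.Dataset => dataset(row.Package, row.Dataset) for row in eachrow(large))
```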

Licensing and Intellectual Property

Following Vincent's lead, we have assumed that all of the data sets in this repository can be made available under the GPL-3 license. If you know that one of the datasets released here should not be released publicly or if you know that a data set can only be released under a different license, please contact me so that I can remove the data set from this repository.