<!-- README.md is generated from README.Rmd. Please edit that file -->

atable

The atable package supports the analysis and reporting of controlled clinical trials. Reporting of clinical trials is such a frequent task that guidelines have been written which recommend certain properties of clinical trial reports (Moher et al. 2010). In particular, Item 17a of CONSORT states that “Trial results are often more clearly displayed in a table rather than in the text”, and Item 15 suggests “a table showing baseline demographic and clinical characteristics for each group”. The atable package is specifically designed to comply with these two items.

Using atable

Load the package

library(atable)
# remotes::install_github("arminstroebel/atable") # development version

We use the arthritis data set from the multgee package to demonstrate the features of atable. Because all of its variables are numeric, we first add some other variable types.

data(arthritis, package = "multgee")
arthritis <- within(arthritis, {
  score <- ordered(y)
  baselinescore <- ordered(baseline)
  time <- paste0("Month ", time)
  sex <- factor(sex, levels = c(1,2), labels = c("female", "male"))
  trt <- factor(trt, levels = c(1,2), labels = c("placebo", "drug"))
  date <- as.Date("2016-03-09") + runif(nrow(arthritis), -300, 300)
  })

To create a summary table of sex and age:

atable_options(format_to = "Console") # more on this in a moment
atable(arthritis, target_cols = c("sex", "age"))
##   Group                value    
## 1 Observations                  
## 2                      906      
## 3 sex                           
## 4      female          27% (249)
## 5      male            73% (657)
## 6      missing         0% (0)   
## 7 age                           
## 8      Mean (SD)       50 (11)  
## 9      valid (missing) 906 (0)

We can also get statistics grouped by some variable (e.g. a treatment indicator):

atable(arthritis, target_cols = c("sex", "age"), group_col = "trt")
## Warning in stats::ks.test(x, y, alternative = c("two.sided"), ...): p-value will be approximate in
## the presence of ties
##   Group                placebo   drug      p     stat  Effect Size (CI)    
## 1 Observations                                                             
## 2                      447       459                                       
## 3 sex                                                                      
## 4      female          29% (129) 26% (120) 0.4   0.71  1.1 (0.85; 1.6)     
## 5      male            71% (318) 74% (339)                                 
## 6      missing         0% (0)    0% (0)                                    
## 7 age                                                                      
## 8      Mean (SD)       51 (11)   50 (11)   0.043 0.092 0.058 (-0.072; 0.19)
## 9      valid (missing) 447 (0)   459 (0)

Furthermore, we can split by another variable (e.g. timepoints):

atable(arthritis, target_cols = c("sex", "age"), group_col = "trt", split_cols = "time")
## Warning in stats::ks.test(x, y, alternative = c("two.sided"), ...): p-value will be approximate in
## the presence of ties

## Warning in stats::ks.test(x, y, alternative = c("two.sided"), ...): p-value will be approximate in
## the presence of ties

## Warning in stats::ks.test(x, y, alternative = c("two.sided"), ...): p-value will be approximate in
## the presence of ties
##    Group                     placebo   drug      p    stat  Effect Size (CI)   
## 1  Month 1                                                                     
## 2       Observations                                                           
## 3                            149       153                                     
## 4       sex                                                                    
## 5            female          29% (43)  26% (40)  0.69 0.16  1.1 (0.67; 2)      
## 6            male            71% (106) 74% (113)                               
## 7            missing         0% (0)    0% (0)                                  
## 8       age                                                                    
## 9            Mean (SD)       51 (11)   50 (11)   0.55 0.092 0.058 (-0.17; 0.28)
## 10           valid (missing) 149 (0)   153 (0)                                 
## 11 Month 3                                                                     
## 12      Observations                                                           
## 13                           149       153                                     
## 14      sex                                                                    
## 15           female          29% (43)  26% (40)  0.69 0.16  1.1 (0.67; 2)      
## 16           male            71% (106) 74% (113)                               
## 17           missing         0% (0)    0% (0)                                  
## 18      age                                                                    
## 19           Mean (SD)       51 (11)   50 (11)   0.55 0.092 0.058 (-0.17; 0.28)
## 20           valid (missing) 149 (0)   153 (0)                                 
## 21 Month 5                                                                     
## 22      Observations                                                           
## 23                           149       153                                     
## 24      sex                                                                    
## 25           female          29% (43)  26% (40)  0.69 0.16  1.1 (0.67; 2)      
## 26           male            71% (106) 74% (113)                               
## 27           missing         0% (0)    0% (0)                                  
## 28      age                                                                    
## 29           Mean (SD)       51 (11)   50 (11)   0.55 0.092 0.058 (-0.17; 0.28)
## 30           valid (missing) 149 (0)   153 (0)

The same can be achieved via the formula interface:

atable(sex + age ~ trt | time, arthritis)

Here, the left-hand side lists the variables to be summarized. The right-hand side gives the variables used to group and split them. The split variables come after a | character.
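To make the roles of the formula parts concrete, the following base R sketch decomposes such a formula into target, group and split variables. It is an illustration only: `split_formula` is a hypothetical helper and not how atable parses formulas internally.

```r
# Illustration only: decompose a formula of the form targets ~ group | split.
# split_formula is a hypothetical helper, not part of atable.
split_formula <- function(f) {
  lhs <- f[[2]]  # left-hand side: variables to summarize
  rhs <- f[[3]]  # right-hand side: grouping, optionally followed by | split
  has_split <- is.call(rhs) && identical(rhs[[1]], as.name("|"))
  list(
    target_cols = all.vars(lhs),
    group_col   = if (has_split) all.vars(rhs[[2]]) else all.vars(rhs),
    split_cols  = if (has_split) all.vars(rhs[[3]]) else character(0)
  )
}

parts <- split_formula(sex + age ~ trt | time)
str(parts)
```

The three elements correspond to the target_cols, group_col and split_cols arguments used in the calls earlier in this section.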

Output format

In an earlier code chunk, we set atable_options(format_to = "Console"). atable is designed to return output in a format optimized for LaTeX, the console, MS Word (via flextable and officer), HTML or raw (i.e. no formatting). This is for easy use in manuscripts.

form <- sex + age ~ trt
atable(form, arthritis, format_to = "Latex")   # format to LaTeX
atable(form, arthritis, format_to = "Console") # format to console
atable(form, arthritis, format_to = "HTML")    # format to HTML
atable(form, arthritis, format_to = "Raw")     # no formatting
atable(form, arthritis, format_to = "Word")    # format to MS Word

Modifying atable

If statistics other than the default ones are required, it is possible to return others by changing the functions atable uses to create summary statistics, tests, effect measures and formatting.

Here is an example that calculates the median, MAD, mean and SD. The function requires an argument x and returns a named list with class statistics_'class':

new_statistics_numeric <- function(x, ...){
  statistics_out <- list(Median = median(x, na.rm = TRUE), 
                         MAD = mad(x, na.rm = TRUE),
                         Mean = mean(x, na.rm = TRUE),
                         SD = sd(x, na.rm = TRUE))
  class(statistics_out) <- c("statistics_numeric", class(statistics_out))
  # We will need this new class later to specify the format
  return(statistics_out)
}

The suitable formatting function:

new_format_statistics_numeric <- function(x, ...){
  Median_MAD <- paste(round(c(x$Median, x$MAD), digits = 1), collapse = "; ")
  Mean_SD <- paste(round(c(x$Mean, x$SD), digits = 1), collapse = "; ")
  out <- data.frame(tag = factor(c("Median; MAD", "Mean; SD"), 
                                 levels = c("Median; MAD", "Mean; SD")),
                    # use levels to retain the order of the rows 
                    value = c(Median_MAD, Mean_SD),
                    stringsAsFactors = FALSE)
  return(out)
}
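The same pattern works for other summaries. As a further sketch, the pair of functions below reports the median with quartiles instead. The names `iqr_statistics_numeric` and `format_iqr_statistics_numeric` are hypothetical; only the contract (argument x, a named list carrying the class statistics_numeric, and a format function returning a data frame with columns tag and value) is taken from the examples above.

```r
# Hypothetical alternative: median with first and third quartile.
iqr_statistics_numeric <- function(x, ...) {
  q <- stats::quantile(x, probs = c(0.25, 0.5, 0.75), na.rm = TRUE, names = FALSE)
  statistics_out <- list(Q1 = q[1], Median = q[2], Q3 = q[3])
  # Same class as above, so the formatting function is found in the same way
  class(statistics_out) <- c("statistics_numeric", class(statistics_out))
  statistics_out
}

# Matching format function: one row, tag plus a formatted value.
format_iqr_statistics_numeric <- function(x, ...) {
  value <- paste0(round(x$Median, 1), " (", round(x$Q1, 1), "; ", round(x$Q3, 1), ")")
  data.frame(tag = factor("Median (Q1; Q3)"), value = value, stringsAsFactors = FALSE)
}

# Quick check on a small vector with one missing value
s <- iqr_statistics_numeric(c(1, 2, 3, 4, 5, NA))
out <- format_iqr_statistics_numeric(s)
print(out)
```

Checking the two functions on a small vector like this, before handing them to atable, makes it easier to spot mistakes in the formatting.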

And a test function to compute both t-tests and Kolmogorov-Smirnov tests:

new_two_sample_htest_numeric <- function(value, group, ...){
  d <- data.frame(value = value, group = group)
  group_levels <- levels(group)
  x <- subset(d, group %in% group_levels[1], select = "value", drop = TRUE)
  y <- subset(d, group %in% group_levels[2], select = "value", drop = TRUE)
  ks_test_out <- stats::ks.test(x, y)
  t_test_out <- stats::t.test(x, y)
  out <- list(p_ks = ks_test_out$p.value,
              p_t = t_test_out$p.value )
  return(out)
}
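A test function can be tried on its own before it is used in a table. The sketch below follows the same signature (value, group) but runs a Wilcoxon rank-sum test; `wilcox_two_sample_htest_numeric` is a hypothetical name, and the element names p and stat are chosen to match the column names in the tables above.

```r
# Hypothetical variant: Wilcoxon rank-sum test with the same (value, group) signature.
wilcox_two_sample_htest_numeric <- function(value, group, ...) {
  group_levels <- levels(group)
  x <- value[group %in% group_levels[1]]
  y <- value[group %in% group_levels[2]]
  wilcox_out <- stats::wilcox.test(x, y)
  list(p = wilcox_out$p.value, stat = unname(wilcox_out$statistic))
}

# Toy data: two clearly separated groups without ties
value <- c(1:10, 11:20)
group <- factor(rep(c("placebo", "drug"), each = 10), levels = c("placebo", "drug"))
res <- wilcox_two_sample_htest_numeric(value, group)
str(res)
```

Note that group must be a factor, because the function relies on levels(group) to identify the two samples.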

These can then be passed to atable for use in the table:

atable(sex + age ~ trt | time, arthritis,
       statistics.numeric = new_statistics_numeric,
       format_statistics.statistics_numeric = new_format_statistics_numeric,
       two_sample_htest.numeric = new_two_sample_htest_numeric)
## Warning in stats::ks.test(x, y): p-value will be approximate in the presence of ties

## Warning in stats::ks.test(x, y): p-value will be approximate in the presence of ties

## Warning in stats::ks.test(x, y): p-value will be approximate in the presence of ties
##    Group                  placebo    drug      p    stat Effect Size (CI) p_ks p_t 
## 1  Month 1                                                                         
## 2       Observations                                                               
## 3                         149        153                                           
## 4       sex                                                                        
## 5            female       29% (43)   26% (40)  0.69 0.16 1.1 (0.67; 2)             
## 6            male         71% (106)  74% (113)                                     
## 7            missing      0% (0)     0% (0)                                        
## 8       age                                                                        
## 9            Median; MAD  55; 10.4   53; 10.4                             0.55 0.61
## 10           Mean; SD     50.7; 11.2 50.1; 11                                      
## 11 Month 3                                                                         
## 12      Observations                                                               
## 13                        149        153                                           
## 14      sex                                                                        
## 15           female       29% (43)   26% (40)  0.69 0.16 1.1 (0.67; 2)             
## 16           male         71% (106)  74% (113)                                     
## 17           missing      0% (0)     0% (0)                                        
## 18      age                                                                        
## 19           Median; MAD  55; 10.4   53; 10.4                             0.55 0.61
## 20           Mean; SD     50.7; 11.2 50.1; 11                                      
## 21 Month 5                                                                         
## 22      Observations                                                               
## 23                        149        153                                           
## 24      sex                                                                        
## 25           female       29% (43)   26% (40)  0.69 0.16 1.1 (0.67; 2)             
## 26           male         71% (106)  74% (113)                                     
## 27           missing      0% (0)     0% (0)                                        
## 28      age                                                                        
## 29           Median; MAD  55; 10.4   53; 10.4                             0.55 0.61
## 30           Mean; SD     50.7; 11.2 50.1; 11

(See Ströbel (2019) for passing methods via atable_options and changing atable’s namespace.)

Extending atable

atable only has methods for numeric, factor and ordered variables, but it is possible to extend atable's functionality by defining methods for other classes (e.g. Date or Surv).

Here is an example for Surv objects, while vignette("atable_usage", package = "atable") contains an example with Dates.

Define the statistics and testing functions:

statistics.Surv <- function(x, ...){
  survfit_object <- survival::survfit(x ~ 1)
  # copy from survival:::print.survfit:
  out <- survival:::survmean(survfit_object, rmean = "common")
  return(list(mean_survival_time = out$matrix["*rmean"],
              SE = out$matrix["*se(rmean)"]))
}

two_sample_htest.Surv <- function(value, group, ...){
  survdiff_result <- survival::survdiff(value ~ group, rho = 0)
  # copy from survival:::print.survdiff:
  etmp <- survdiff_result$exp
  df <- (sum(1 * (etmp > 0))) - 1
  p <- 1 - stats::pchisq(survdiff_result$chisq, df)
  return(list(p = p, stat = survdiff_result$chisq))
}

The ovarian dataset in the survival package provides a suitable example:

library(survival)
## Warning: package 'survival' was built under R version 3.4.4
# set classes
ovarian <- within(survival::ovarian, 
                  {time_to_event = survival::Surv(futime, fustat)})
# create the table
atable(ovarian, target_cols = c("time_to_event"), group_col = "rx")
##   Group                   1   2   p   stat
## 1 Observations                            
## 2                         13  13          
## 3 time to event                           
## 4      mean_survival_time 650 889 0.3 1.1 
## 5      SE                 120 115

It is also possible to have different statistics for different variables of the same class, albeit with a little work. The approach is similar to that used for Surv objects above, but also involves defining a subset function (most existing classes already have one; your new one probably doesn’t).

First we create a new variable by copying the age variable and adding some noise:

arthritis$noisy_age <- arthritis$age + rnorm(nrow(arthritis), 2)
class(arthritis$noisy_age) <- c("numeric2", class(arthritis$noisy_age))

Now we need to define the appropriate functions…

# statistics function
statistics.numeric2 <- function(x, ...){
  statistics_out <- list(Median = median(as.numeric(x), na.rm = TRUE), 
                         MAD = mad(as.numeric(x), na.rm = TRUE),
                         Mean = mean(as.numeric(x), na.rm = TRUE),
                         SD = sd(as.numeric(x), na.rm = TRUE))
  class(statistics_out) <- c("statistics_numeric2", class(statistics_out))
  # We will need this new class later to specify the format
  return(statistics_out)
}
# testing function
two_sample_htest.numeric2 <- function(value, group, ...){
  d <- data.frame(value = as.numeric(value), group = group)
  group_levels <- levels(group)
  x <- subset(d, group %in% group_levels[1], select = "value", drop = TRUE)
  y <- subset(d, group %in% group_levels[2], select = "value", drop = TRUE)
  ks_test_out <- stats::ks.test(x, y)
  t_test_out <- stats::t.test(x, y)
  out <- list(p_ks = ks_test_out$p.value,
              p_t = t_test_out$p.value )
  return(out)
}
# subsetting function
'[.numeric2' <- function(x, i, j, ...){
  y <- unclass(x)[i, ...]
  class(y) <- c("numeric2", class(y))
  y
}

atable(age + noisy_age ~ trt, arthritis)
## Warning in stats::ks.test(x, y, alternative = c("two.sided"), ...): p-value will be approximate in
## the presence of ties
##    Group                placebo drug    p     stat  Effect Size (CI)     p_ks  p_t 
## 1  Observations                                                                    
## 2                       447     459                                                
## 3  age                                                                             
## 4       Mean (SD)       51 (11) 50 (11) 0.043 0.092 0.058 (-0.072; 0.19)           
## 5       valid (missing) 447 (0) 459 (0)                                            
## 6  noisy age                                                                       
## 7       Median          57      55                                       0.011 0.36
## 8       MAD             10      12                                                 
## 9       Mean            53      52                                                 
## 10      SD              11      11
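The subsetting method defined above is needed because default subsetting in base R drops the class attribute, so a subset of a numeric2 vector would no longer reach the numeric2 methods. The following self-contained sketch shows the effect on a toy vector; the values are made up for illustration.

```r
# Toy vector carrying the extra class, built like noisy_age above
x <- structure(c(50.2, 61.7, 43.9), class = c("numeric2", "numeric"))

# Without a `[` method, subsetting falls back to the default and loses the class
class_without_method <- class(x[1:2])

# Subsetting method that restores the class after subsetting
`[.numeric2` <- function(x, i, ...) {
  y <- unclass(x)[i]
  class(y) <- c("numeric2", class(y))
  y
}

# With the method in place, the class survives subsetting
class_with_method <- class(x[1:2])

print(class_without_method)
print(class_with_method)
```

Any new class introduced only to select different statistics will therefore need a similar method.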

References

<div id="refs" class="references"> <div id="ref-Moher2010">

Moher, D., S. Hopewell, K. F Schulz, V. Montori, P. C Gotzsche, P J Devereaux, D. Elbourne, M. Egger, and D. G Altman. 2010. “CONSORT 2010 Explanation and Elaboration: Updated Guidelines for Reporting Parallel Group Randomised Trials.” BMJ 340 (mar23 1): c869–c869. https://doi.org/10.1136/bmj.c869.

</div> <div id="ref-stroebel2019">

Ströbel, Armin. 2019. “atable: Create Tables for Clinical Trial Reports.” The R Journal 11 (1): 137–48. https://journal.r-project.org/archive/2019/RJ-2019-001/RJ-2019-001.pdf.

</div> </div>