About

This prototype app demonstrates how to create powerful data analysis tools with Shiny and targets. If deployed to appropriate infrastructure, it ensures that user storage and background processes persist after logout. The app recovers running jobs and saved data when the user logs back in. Because of targets, subsequent runs skip computationally expensive steps that are already up to date.

The case study

Bayesian joint models of survival and longitudinal non-survival outcomes reduce bias and describe relationships among endpoints (Gould et al. 2015). Statisticians routinely refine and explore such complicated models (Gelman et al. 2020), but the computation is so slow that routine changes are tedious to refresh. This app shows how targets can speed up iteration and Shiny can ease the burden of code development for established use cases.

Usage

When you first open the app, create a new project to establish a data analysis pipeline. You can create, switch, and delete projects at any time. Next, select the biomarkers and number of Markov chain Monte Carlo iterations. The pipeline will run one univariate joint model on each biomarker for the number of iterations you select. Each model analyzes rstanarm datasets pbcLong and pbcSurv to jointly model survival (time to event) and the biomarker (longitudinally).

Click the "Run pipeline" button to run the correct models in the correct order. While the current project's pipeline is running in the background, the app replaces the "Run pipeline" button with a "Cancel pipeline" button. The pipeline will run to completion even if you switch projects, log out, or get disconnected for idleness.

While the pipeline is running, the Progress and Logs tabs continuously refresh to monitor progress. The Progress tab uses the tar_watch() Shiny module, available through the functions tar_watch_ui() and tar_watch_server().

The Results tab refreshes the final plot every time the pipeline stops. The plot shows the marginal posterior distribution of the association parameter between mortality and the longitudinal biomarker.

Administration

  1. Optional: to customize the location of persistent storage, create a .Renviron file at the app root and set the TARGETS_SHINY_HOME environment variable. If you do, the app will store projects within file.path(Sys.getenv("TARGETS_SHINY_HOME"), Sys.getenv("USER"), ".targets-shiny"). Otherwise, storage will default to tools::R_user_dir("targets-shiny", which = "cache").
  2. To support persistent pipelines, deploy the app to Shiny Server, RStudio Connect, or another service that supports persistent server-side storage. Alternatively, if you just want to demo the app on a limited service such as shinyapps.io, set the TARGETS_SHINY_TRANSIENT environment variable to "true" in the .Renviron file in the app root directory. That way, the UI alerts the users that their projects are transient, the app writes to temporary storage (overriding TARGETS_SHINY_HOME), and background processes terminate when the app exits.
  3. Require a login so the app knows the user name.
  4. Run the app as the logged-in user, not the system administrator or default user.
  5. If applicable, raise automatic timeout thresholds in RStudio Connect so the background processes running pipelines remain alive long enough to finish.

Development

Shiny apps with targets require specialized techniques such as user storage and persistent background processes.

User storage

targets writes to storage to ensure the pipeline stays up to date after R exits. This storage must be persistent and user-specific. This particular app defaults to tools::R_user_dir("targets-shiny", which = "cache") but uses file.path(Sys.getenv("TARGETS_SHINY_HOME"), Sys.getenv("USER"), ".targets-shiny") if TARGETS_SHINY_HOME is defined in the .Renviron file at the app root directory. In addition, it is best to deploy to a service like Shiny Server or RStudio Connect and provision enough space for the expected number of users.
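The storage-root logic above can be sketched as follows. The helper name user_dir() is hypothetical, for illustration only; the app's actual helper may differ.

```r
# Resolve the persistent storage root for the logged-in user.
# user_dir() is a hypothetical helper for illustration.
user_dir <- function() {
  home <- Sys.getenv("TARGETS_SHINY_HOME")
  if (nzchar(home)) {
    # Custom location from .Renviron: one subdirectory per user.
    file.path(home, Sys.getenv("USER"), ".targets-shiny")
  } else {
    # Default: a user-specific cache directory.
    tools::R_user_dir("targets-shiny", which = "cache")
  }
}
```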

Multiple projects

Projects manage multiple versions of the pipeline. In this app, each project is a directory inside user storage with app input settings, pipeline configuration, and results. A top-level _project file identifies the current active project. Functions in R/project.R configure, load, create, and destroy projects. The update*() functions in Shiny and shinyWidgets, such as updateSliderInput(), are particularly handy for restoring the input settings of a saved project. That is why this app does not need a single renderUI() or uiOutput().
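Restoring a saved project's inputs might look like the following sketch. The settings list, its fields, and the input IDs are hypothetical, not taken from the app's actual code.

```r
library(shiny)

# Restore saved inputs when the user switches projects.
# settings is assumed to be a named list read from the project
# directory, e.g. parsed from a saved settings file.
restore_inputs <- function(session, settings) {
  # Hypothetical input IDs for the biomarker picker and MCMC slider.
  updateSelectInput(session, "biomarkers", selected = settings$biomarkers)
  updateSliderInput(session, "iterations", value = settings$iterations)
}
```

Because the update*() functions modify inputs in place, the UI can stay static, with no renderUI() or uiOutput() required.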

Pipeline setup

Every targets pipeline requires a _targets.R configuration file and R scripts with supporting functions if applicable. The tar_helper() function writes arbitrary R scripts to the location of your choice, and tidy evaluation with !! is a convenient templating mechanism that translates Shiny UI inputs into target definitions. In this app, the functions in R/pipeline.R demonstrate the technique.
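As a sketch of the templating technique, tar_helper() can splice Shiny inputs into a generated _targets.R. The target name and the run_joint_model() function below are hypothetical placeholders, not the app's actual pipeline.

```r
library(targets)

# Write a _targets.R script templated from Shiny inputs.
# !!biomarker and !!iterations are spliced in via tidy evaluation,
# so the generated script contains the literal values.
write_pipeline <- function(path, biomarker, iterations) {
  tar_helper(path, {
    library(targets)
    list(
      tar_target(
        model,
        run_joint_model(biomarker = !!biomarker, iter = !!iterations)
      )
    )
  })
}
```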

Persistent background processes

This particular app runs pipelines as background processes that persist after the user logs out. Before you launch a new pipeline, first check whether one is already running. tar_pid() retrieves the ID of the most recent process to run the pipeline, and ps::ps_pids() lists the IDs of all processes currently running. If no process is already running, start the targets pipeline in a persistent background process:

# Launch the pipeline in a persistent background process.
processx_handle <- tar_make(
  callr_function = callr::r_bg,
  callr_arguments = list(
    cleanup = FALSE,    # Keep the process alive after garbage collection.
    supervise = FALSE,  # Keep the process alive after the app exits.
    stdout = "/PATH/TO/USER/PROJECT/stdout.txt",
    stderr = "/PATH/TO/USER/PROJECT/stderr.txt"
  )
)

cleanup = FALSE keeps the process alive after the processx handle is garbage collected, and supervise = FALSE keeps the process alive after the app itself exits. As long as the server keeps running, the pipeline will keep running. To help manage resources, the UI should have an action button to cancel the current process, and the server should automatically cancel it when the user deletes the project.
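Cancellation can be sketched with the ps package, assuming the pipeline's process ID is retrievable with tar_pid(). The cancel_pipeline() helper is hypothetical.

```r
library(targets)
library(ps)

# Cancel the background process running the current project's pipeline.
cancel_pipeline <- function() {
  pid <- tar_pid()            # Most recent process to run the pipeline.
  if (pid %in% ps_pids()) {   # Only act if that process is still alive.
    ps_kill(ps_handle(pid))   # Terminate the process.
  }
}
```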

Monitor the background process

The app should continuously check whether the process is running at any given moment:

  1. Check if a process ID is available using targets::tar_exist_process().
  2. If possible, get the process ID of the most recent pipeline using targets::tar_pid().
  3. Check if the process ID is in ps::ps_pids() to see if the pipeline is running.

This particular app implements a process_status() function to do this.

process_status()
#> $pid
#> [1] 19442
#> 
#> $running
#> [1] FALSE
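A minimal sketch of such a helper, following the three steps above (the app's actual implementation in R/process.R may differ):

```r
library(targets)

# Report the pipeline's process ID and whether it is still running.
process_status <- function() {
  pid <- NULL
  running <- FALSE
  if (tar_exist_process()) {           # Step 1: is a process ID recorded?
    pid <- tar_pid()                   # Step 2: read the most recent PID.
    running <- pid %in% ps::ps_pids()  # Step 3: is that PID alive?
  }
  list(pid = pid, running = running)
}
```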

Inside the Shiny server function, we continuously refresh the status in a reactive value. If polling is expensive (as on an SGE cluster, see below) then please be generous with millis in invalidateLater().

process <- reactiveValues(status = process_status())
observe({
  invalidateLater(millis = 5000)
  process$status <- process_status()
})

This reactive value helps us:

  1. Only show certain UI elements if the pipeline is running. Use process$status$running to show activity or disable inputs when the pipeline is busy. Useful tools include show_spinner() from shinybusy and show(), hide(), enable(), and disable() from shinyjs.
  2. Refresh output and logs when the pipeline starts or stops. Simply write process$status inside a reactive context such as observe() or renderPlot().
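For example, the running flag can drive shinyjs inside an observer. The "run" and "spinner" element IDs below are hypothetical.

```r
library(shiny)
library(shinyjs)

# Disable pipeline controls while the pipeline is busy.
observe({
  if (process$status$running) {
    disable("run")   # Hypothetical "Run pipeline" button ID.
    show("spinner")  # Hypothetical busy indicator ID.
  } else {
    enable("run")
    hide("spinner")
  }
})
```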

Scaling out to many users

Serious scalable apps in production should launch long-running background processes as jobs on a cluster like SLURM or a cloud computing platform like Amazon Web Services. The existing high-performance computing capabilities in targets alleviate some of this, but the main process of each pipeline still runs locally. If this becomes too burdensome for the server, consider distributing these main processes as well.

In this app, the file R/process_sge.R has analogous functions to R/process.R for a Sun Grid Engine (SGE) cluster. The principles are similar to the ones described above for local processes. To configure the app for SGE, set the TARGETS_SHINY_BACKEND environment variable equal to "sge" in an app-level .Renviron file. You may also need to define an app-level .Rprofile file to load environment modules into R, e.g. if your cluster serves the qsub and qstat command line tools as environment modules.

Transient mode

For demonstration purposes, you may wish to deploy your app to a more limited service like shinyapps.io. For these situations, consider implementing a transient mode to alert users and clean up resources. If this particular app is deployed with the TARGETS_SHINY_TRANSIENT environment variable equal to "true", then:

  1. tar_make() runs with supervise = TRUE in callr_arguments so that all pipelines terminate when the R session exits.
  2. All user storage lives in a subdirectory of tempdir() so project files are automatically cleaned up.
  3. When the app starts, the UI shows a shinyalert to warn users about the above.
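A sketch of how the app might branch on transient mode (the transient() helper is hypothetical):

```r
# TRUE when the app is deployed in transient mode.
transient <- function() {
  identical(tolower(Sys.getenv("TARGETS_SHINY_TRANSIENT")), "true")
}

# Supervised processes die with the R session that spawned them.
callr_arguments <- list(supervise = transient())

# Transient storage lives under tempdir() and is cleaned up automatically.
storage_root <- if (transient()) {
  file.path(tempdir(), "targets-shiny")
} else {
  tools::R_user_dir("targets-shiny", which = "cache")
}
```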

Progress

The tar_watch() Shiny module is available through the functions tar_watch_ui() and tar_watch_server(). This module continuously refreshes the tar_visnetwork() graph and the tar_progress_branches() table to communicate the current status of the pipeline. Visit this article for more information on Shiny modules.
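Embedding the module might look like the following sketch. Only the id argument is shown; both functions accept further arguments for refresh intervals and display options, so consult the targets documentation for the full signatures.

```r
library(shiny)
library(targets)

# UI: one call inserts the full progress dashboard.
ui <- fluidPage(
  tar_watch_ui(id = "watch")
)

# Server: the matching module server refreshes the graph and table.
server <- function(input, output, session) {
  tar_watch_server(id = "watch")
}

shinyApp(ui, server)
```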

Logs

The stdout and stderr log files provide cruder but more immediate information on the progress of the pipeline. To generate logs, set the stdout and stderr callr arguments as described previously. In the app server function, define text outputs that continuously refresh: every few milliseconds when the pipeline is running, once when the pipeline starts or stops, and once when the user switches projects. Below, you may wish to return just the last few lines instead of the full result of readLines(). And again, please be generous with millis in invalidateLater() to avoid overburdening the server.

output$stdout <- renderText({
  req(input$project)
  process$status
  if (process$status$running) invalidateLater(millis = 1000)
  readLines("/PATH/TO/USER/PROJECT/stdout.txt")
})
output$stderr <- renderText({
  req(input$project)
  process$status
  if (process$status$running) invalidateLater(millis = 1000)
  readLines("/PATH/TO/USER/PROJECT/stderr.txt")
})

In the UI, define text outputs that display proper line breaks and enable scrolling:

fluidRow(
  textOutput("stdout"),
  textOutput("stderr"),
  tags$head(tags$style("#stdout {white-space: pre-wrap; overflow-y:scroll; max-height: 600px;}")),
  tags$head(tags$style("#stderr {white-space: pre-wrap; overflow-y:scroll; max-height: 600px;}"))
)

Results

targets stores the output of the pipeline in a _targets/ folder at the project root. Use tar_read() to return a result. We use the process$status reactive value to refresh the data sparingly: once when the user switches projects and once when the pipeline starts or stops.

output$plot <- renderPlot({
  req(input$project)
  process$status # Refresh when the pipeline starts or stops.
  tar_read(final_plot)
})

Thanks

For years, Eric Nantz has advanced the space of enterprise Shiny in the life sciences. The motivation for this app comes from his work, and it borrows many of his techniques.

Code of Conduct

Please note that the targets-shiny project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

References

  1. Brilleman S. "Estimating Joint Models for Longitudinal and Time-to-Event Data with rstanarm." rstanarm, Stan Development Team, 2020. https://cran.r-project.org/web/packages/rstanarm/vignettes/jm.html
  2. Gelman A, Vehtari A, Simpson D, Margossian CC, Carpenter B, Yao Y, Kennedy L, Gabry J, Bürkner PC, Modrák M. "Bayesian Workflow." arXiv 2020, arXiv:2011.01808, https://arxiv.org/abs/2011.01808.
  3. Gould AL, Boye ME, Crowther MJ, Ibrahim JG, Quartey G, Micallef S, et al. "Joint modeling of survival and longitudinal non-survival data: current methods and issues. Report of the DIA Bayesian joint modeling working group." Stat Med. 2015; 34(14): 2181-95.