Access
- https://wlandau.shinyapps.io/targets-shiny/: app running in transient mode
- https://github.com/wlandau/targets-shiny: source code
About
This prototype app demonstrates how to create powerful data analysis tools with Shiny and targets. If deployed to appropriate infrastructure, it ensures that user storage and background processes persist after logout. The app recovers running jobs and saved data when the user logs back in. Because of targets, subsequent runs skip computationally expensive steps that are already up to date.
The case study
Bayesian joint models of survival and longitudinal non-survival outcomes reduce bias and describe relationships among endpoints (Gould et al. 2015). Statisticians routinely refine and explore such complicated models (Gelman et al. 2020), but the computation is so slow that results are tedious to refresh after routine changes. This app shows how targets can speed up iteration and how Shiny can ease the burden of code development for established use cases.
Usage
When you first open the app, create a new project to establish a data analysis pipeline. You can create, switch, and delete projects at any time. Next, select the biomarkers and number of Markov chain Monte Carlo iterations. The pipeline will run one univariate joint model on each biomarker for the number of iterations you select. Each model analyzes the rstanarm datasets pbcLong and pbcSurv to jointly model survival (time to event) and the biomarker (longitudinally).
Click the "Run pipeline" button to run the correct models in the correct order. While the pipeline of the current project is running in the background, the app replaces the "Run pipeline" button with the "Cancel pipeline" button. The pipeline will run to completion even if you switch projects, log out, or get disconnected for idleness.
While the pipeline is running, the Progress and Logs tabs continuously refresh to monitor progress. The Progress tab uses the tar_watch() Shiny module, available through the functions tar_watch_ui() and tar_watch_server().
The Results tab refreshes the final plot every time the pipeline stops. The plot shows the marginal posterior distribution of the association parameter between mortality and the longitudinal biomarker.
Administration
- Optional: to customize the location of persistent storage, create an .Renviron file at the app root and set the TARGETS_SHINY_HOME environment variable. If you do, the app will store projects within file.path(Sys.getenv("TARGETS_SHINY_HOME"), Sys.getenv("USER"), ".targets-shiny"). Otherwise, storage will default to tools::R_user_dir("targets-shiny", which = "cache").
- To support persistent pipelines, deploy the app to Shiny Server, RStudio Connect, or another service that supports persistent server-side storage. Alternatively, if you just want to demo the app on a limited service such as shinyapps.io, set the TARGETS_SHINY_TRANSIENT environment variable to "true" in the .Renviron file in the app root directory. That way, the UI alerts users that their projects are transient, the app writes to temporary storage (overriding TARGETS_SHINY_HOME), and background processes terminate when the app exits.
- Require a login so the app knows the user name.
- Run the app as the logged-in user, not the system administrator or default user.
- If applicable, raise automatic timeout thresholds in RStudio Connect so the background processes running pipelines remain alive long enough to finish.
Development
Shiny apps with targets require specialized techniques such as user storage and persistent background processes.
User storage
targets writes to storage to ensure the pipeline stays up to date after R exits. This storage must be persistent and user-specific. This particular app defaults to tools::R_user_dir("app_name", which = "cache") but uses file.path(Sys.getenv("TARGETS_SHINY_HOME"), Sys.getenv("USER")) if TARGETS_SHINY_HOME is defined in the .Renviron file at the app root directory. In addition, it is best to deploy to a service like Shiny Server or RStudio Connect and provision enough space for the expected number of users.
Multiple projects
Projects manage multiple versions of the pipeline. In this app, each project is a directory inside user storage with app input settings, pipeline configuration, and results. A top-level _project file identifies the current active project. Functions in R/project.R configure, load, create, and destroy projects. The update*() functions in Shiny and shinyWidgets, such as updateSliderInput(), are particularly handy for restoring the input settings of a saved project. That is why this app does not need a single renderUI() or uiOutput().
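The directory layout described above can be sketched with a few base R helpers. This is a minimal illustration, not the contents of R/project.R: the function names and the storage argument (the path to the user storage directory) are assumptions made for the example.

```r
# Hypothetical helpers; `storage` is the path to the user storage directory.

# Create a project directory and make it the active project.
project_create <- function(storage, name) {
  dir.create(file.path(storage, name), recursive = TRUE, showWarnings = FALSE)
  project_select(storage, name)
}

# Record the active project in the top-level _project file.
project_select <- function(storage, name) {
  stopifnot(dir.exists(file.path(storage, name)))
  writeLines(name, file.path(storage, "_project"))
  invisible(name)
}

# Read the active project, or NULL if none has been selected yet.
project_get <- function(storage) {
  path <- file.path(storage, "_project")
  if (file.exists(path)) readLines(path, n = 1L) else NULL
}

# Delete a project and clear the marker if it pointed to that project.
project_delete <- function(storage, name) {
  unlink(file.path(storage, name), recursive = TRUE)
  if (identical(project_get(storage), name)) {
    unlink(file.path(storage, "_project"))
  }
  invisible(NULL)
}
```

After switching projects with a helper like project_select(), the server can read the saved settings of that project and push them to the UI with functions such as updateSliderInput().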
Pipeline setup
Every targets pipeline requires a _targets.R configuration file and R scripts with supporting functions if applicable. The tar_helper() function writes arbitrary R scripts to the location of your choice, and tidy evaluation with !! is a convenient templating mechanism that translates Shiny UI inputs into target definitions. In this app, the functions in R/pipeline.R demonstrate the technique.
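The following sketch shows roughly how tar_helper() and !! can turn input values into a _targets.R file. It is not the code in R/pipeline.R: the function name write_pipeline(), its arguments, and the two placeholder targets are assumptions for illustration. The sketch passes envir = environment() on the assumption that tar_helper() would otherwise look up the !! values in its default environment rather than in the calling function.

```r
# Hypothetical sketch: write a _targets.R file from Shiny input values.
# `biomarkers` and `iterations` stand in for values such as
# input$biomarkers and input$iterations.
write_pipeline <- function(path, biomarkers, iterations) {
  targets::tar_helper(
    path,
    {
      library(targets)
      list(
        tar_target(biomarker, !!biomarkers),
        tar_target(iterations, !!iterations)
      )
    },
    tidy_eval = TRUE,
    envir = environment() # resolve the !! values inside this function call
  )
}
```

A call such as write_pipeline(file.path(project_dir, "_targets.R"), input$biomarkers, input$iterations), where project_dir is a hypothetical project path, would write a script in which the !! expressions are replaced by the literal input values.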
Persistent background processes
This particular app runs pipelines as background processes that persist after the user logs out. Before you launch a new pipeline, first check whether one is already running. tar_pid() retrieves the ID of the most recent process to run the pipeline, and ps::ps_pids() lists the IDs of all processes currently running. If no process is already running, start the targets pipeline in a persistent background process:
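library(targets)

# Launch the pipeline in a background R process that outlives the app session.
processx_handle <- tar_make(
  callr_function = callr::r_bg,
  callr_arguments = list(
    cleanup = FALSE,   # keep running after the handle is garbage collected
    supervise = FALSE, # keep running after the parent R session exits
    stdout = "/PATH/TO/USER/PROJECT/stdout.txt",
    stderr = "/PATH/TO/USER/PROJECT/stderr.txt"
  )
)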
cleanup = FALSE keeps the process alive after the processx handle is garbage collected, and supervise = FALSE keeps the process alive after the app itself exits. As long as the server keeps running, the pipeline will keep running. To help manage resources, the UI should have an action button to cancel the current process, and the server should automatically cancel it when the user deletes the project.
Monitor the background process
The app should continuously check whether the process is running at any given moment:
- Check if a process ID is available using targets::tar_exist_process().
- If possible, get the process ID of the most recent pipeline using targets::tar_pid().
- Check if the process ID is in ps::ps_pids() to see if the pipeline is running.
This particular app implements a process_status() function to do this.
process_status()
#> $pid
#> [1] 19442
#>
#> $running
#> [1] FALSE
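The three checks listed above could be combined roughly as follows. This is a minimal sketch and not necessarily how the app implements process_status(); it assumes the working directory is the project root so that targets finds the right data store.

```r
# Sketch of a status function built from the three checks above.
process_status <- function() {
  status <- list(pid = NULL, running = FALSE)
  # Step 1: stop early if targets has no record of a pipeline process.
  if (!targets::tar_exist_process()) {
    return(status)
  }
  # Step 2: get the ID of the most recent process to run the pipeline.
  status$pid <- targets::tar_pid()
  # Step 3: the pipeline is running if that ID belongs to a live process.
  status$running <- status$pid %in% ps::ps_pids()
  status
}
```

Returning a list with both fields lets the UI report the process ID even after the pipeline has stopped.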
Inside the Shiny server function, we continuously refresh the status in a reactive value. If polling is expensive (as on an SGE cluster, see below) then please be generous with millis in invalidateLater().
# Poll the process status every 5 seconds and store it in a reactive value.
process <- reactiveValues(status = process_status())
observe({
  invalidateLater(millis = 5000)
  process$status <- process_status()
})
This reactive value helps us:
- Only show certain UI elements if the pipeline is running. Use process$status$running to show activity or disable inputs when the pipeline is busy. Useful tools include show_spinner() from shinybusy and show(), hide(), enable(), and disable() from shinyjs.
- Refresh output and logs when the pipeline starts or stops. Simply write process$status inside a reactive context such as observe() or renderPlot().
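The first use case might look like the sketch below. The function name and the input IDs "run", "cancel", and "biomarkers" are hypothetical, and the sketch assumes that the UI calls shinyjs::useShinyjs() and that the function is called once inside the Shiny server function with the process reactive values object.

```r
# Hypothetical wiring between the process status and the UI controls.
# Call once inside the Shiny server function.
observe_pipeline_controls <- function(process) {
  shiny::observe({
    if (isTRUE(process$status$running)) {
      # Busy: offer the cancel button and lock the inputs.
      shinyjs::hide("run")
      shinyjs::show("cancel")
      shinyjs::disable("biomarkers")
    } else {
      # Idle: offer the run button and unlock the inputs.
      shinyjs::show("run")
      shinyjs::hide("cancel")
      shinyjs::enable("biomarkers")
    }
  })
}
```

Because the observer reads process$status, it reruns only when the polled status changes, so the controls stay in sync without extra polling.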
Scaling out to many users
Serious scalable apps in production should run long background processes as jobs on a cluster like SLURM or a cloud computing platform like Amazon Web Services. The existing high-performance computing capabilities in targets take some of the load off the server, but the main process of each pipeline still runs locally. If this becomes too burdensome for the server, consider distributing these main processes as well.
In this app, the file R/process_sge.R has analogous functions to R/process.R for a Sun Grid Engine (SGE) cluster. The principles are similar to the ones described above for local processes. To configure the app for SGE, set the TARGETS_SHINY_BACKEND environment variable equal to "sge" in an app-level .Renviron file. You may also need to define an app-level .Rprofile file to load environment modules into R, e.g. if your cluster serves the qsub and qstat command line tools as environment modules.
Transient mode
For demonstration purposes, you may wish to deploy your app to a more limited service like shinyapps.io. For these situations, consider implementing a transient mode to alert users and clean up resources. If this particular app is deployed with the TARGETS_SHINY_TRANSIENT environment variable equal to "true", then:
- tar_make() runs with supervise = TRUE in callr_arguments so that all pipelines terminate when the R session exits.
- All user storage lives in a subdirectory of tempdir() so project files are automatically cleaned up.
- When the app starts, the UI shows a shinyalert to warn users about the above.
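A transient mode switch along these lines could be sketched with base R as follows. All three function names are hypothetical, as is the "targets-shiny" subdirectory name. The source only states that supervise becomes TRUE in transient mode, so keeping cleanup = FALSE in both modes is an assumption of this sketch.

```r
# Hypothetical helpers for a transient mode switch.

# TRUE if the app was deployed with TARGETS_SHINY_TRANSIENT set to "true".
is_transient <- function() {
  identical(Sys.getenv("TARGETS_SHINY_TRANSIENT"), "true")
}

# Root of user storage: a subdirectory of tempdir() in transient mode.
storage_root <- function(persistent_root) {
  if (is_transient()) {
    file.path(tempdir(), "targets-shiny")
  } else {
    persistent_root
  }
}

# Arguments for callr::r_bg(), passed to tar_make() via callr_arguments.
pipeline_callr_arguments <- function(log_dir) {
  list(
    cleanup = FALSE,            # assumption: unchanged in both modes
    supervise = is_transient(), # TRUE: terminate when the R session exits
    stdout = file.path(log_dir, "stdout.txt"),
    stderr = file.path(log_dir, "stderr.txt")
  )
}
```

Deriving both the storage root and the process arguments from a single is_transient() check keeps the two behaviors from drifting apart.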
Progress
The tar_watch() Shiny module is available through the functions tar_watch_ui() and tar_watch_server(). This module continuously refreshes the tar_visnetwork() graph and the tar_progress_branches() table to communicate the current status of the pipeline. Visit this article for more information on Shiny modules.
Logs
The stdout and stderr log files provide cruder but more immediate information on the progress of the pipeline. To generate logs, set the stdout and stderr callr arguments as described previously. In the app server function, define text outputs that refresh periodically while the pipeline is running, once when the pipeline starts or stops, and once when the user switches projects. Below, you may wish to return just the last few lines instead of the full result of readLines(). And again, please be generous with millis in invalidateLater() to avoid overburdening the server.
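output$stdout <- renderText({
  req(input$project)
  process$status # refresh when the pipeline starts or stops
  if (process$status$running) invalidateLater(millis = 1000)
  path <- "/PATH/TO/USER/PROJECT/stdout.txt"
  req(file.exists(path)) # the log may not exist before the first run
  # Collapse with newlines so each log line appears on its own line.
  paste(readLines(path), collapse = "\n")
})
output$stderr <- renderText({
  req(input$project)
  process$status # refresh when the pipeline starts or stops
  if (process$status$running) invalidateLater(millis = 1000)
  path <- "/PATH/TO/USER/PROJECT/stderr.txt"
  req(file.exists(path)) # the log may not exist before the first run
  paste(readLines(path), collapse = "\n")
})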
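To return only the last few lines of a log, as suggested above, a small base R helper is enough. The name tail_log() and the default of 50 lines are arbitrary choices for this sketch. It collapses the lines with newline characters so that a white-space: pre-wrap style can display them one per line.

```r
# Hypothetical helper: return the last `n` lines of a log file as one string.
tail_log <- function(path, n = 50L) {
  # A pipeline that has never run has no log file yet.
  if (!file.exists(path)) {
    return("")
  }
  lines <- readLines(path, warn = FALSE)
  paste(utils::tail(lines, n), collapse = "\n")
}
```

Inside renderText(), a call such as tail_log("/PATH/TO/USER/PROJECT/stdout.txt") could then stand in for the direct call to readLines().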
In the UI, define text outputs that display proper line breaks and enable scrolling:
fluidRow(
  textOutput("stdout"),
  textOutput("stderr"),
  tags$head(tags$style("#stdout {white-space: pre-wrap; overflow-y:scroll; max-height: 600px;}")),
  tags$head(tags$style("#stderr {white-space: pre-wrap; overflow-y:scroll; max-height: 600px;}"))
)
Results
targets stores the output of the pipeline in a _targets/ folder at the project root. Use tar_read() to return a result. We use the process$status reactive value to refresh the data sparingly: once when the user switches projects and once when the pipeline starts or stops.
output$plot <- renderPlot({
  req(input$project)
  process$status # refresh when the pipeline starts or stops
  tar_read(final_plot)
})
Thanks
For years, Eric Nantz has advanced the space of enterprise Shiny in the life sciences. The motivation for this app comes from his work, and it borrows many of his techniques.
Code of Conduct
Please note that the targets-shiny project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.
References
- Brilleman S. "Estimating Joint Models for Longitudinal and Time-to-Event Data with rstanarm." rstanarm, Stan Development Team, 2020. https://cran.r-project.org/web/packages/rstanarm/vignettes/jm.html
- Gelman A, Vehtari A, Simpson D, Margossian CC, Carpenter B, Yao Y, Kennedy L, Gabry J, Burkner PC, Modrak M. "Bayesian Workflow." arXiv 2020, arXiv:2011.01808, https://arxiv.org/abs/2011.01808.
- Gould AL, Boye ME, Crowther MJ, Ibrahim JG, Quartey G, Micallef S, et al. "Joint modeling of survival and longitudinal non-survival data: current methods and issues. Report of the DIA Bayesian joint modeling working group." Stat Med. 2015; 34(14): 2181-95.