Title: | Pipeline Tools Inspired by 'GNU Make' |
---|---|
Description: | A suite of tools for transforming an existing workflow into a self-documenting pipeline with very minimal upfront costs. Segments of the pipeline are specified in much the same way a 'Make' rule is, by declaring an executable recipe (which might be an R script), along with the corresponding targets and dependencies. When the entire pipeline is run through, only those recipes that need to be executed will be. Meanwhile, execution metadata is captured behind the scenes for later inspection. |
Authors: | Kinto Behr [aut, cre, cph] |
Maintainer: | Kinto Behr <[email protected]> |
License: | GPL (>= 3) |
Version: | 0.2.1 |
Built: | 2024-10-29 03:46:43 UTC |
Source: | https://github.com/kinto-b/makepipe |
make_with_source
It is sometimes useful to have access to certain objects which are generated as side-products in a source script which yields as a main-product one or more targets. Typically these objects are used for checking that the targets were produced as expected.
make_register(value, name, quiet = FALSE)
make_register(value, name, quiet = FALSE)
value |
A value to be registered in a source script and returned as part
of the |
name |
A variable name, given as a character string. No coercion is done, and the first element of a character vector of length greater than one will be used, with a warning. |
quiet |
A logical determining whether or not warnings are signaled when
|
value
invisibly
## Not run: # Imagine this is part of your source script: x <- readRDS("input.Rds") x <- do_stuff(x) chk <- do_check(x) make_register(chk, "x_check") saveRDS(x, "output.Rds") # You will have access to `chk` in your pipeline script: step_one <- make_with_source( "source.R", "output.Rds", "input.Rds", ) step_one$result$chk ## End(Not run)
## Not run: # Imagine this is part of your source script: x <- readRDS("input.Rds") x <- do_stuff(x) chk <- do_check(x) make_register(chk, "x_check") saveRDS(x, "output.Rds") # You will have access to `chk` in your pipeline script: step_one <- make_with_source( "source.R", "output.Rds", "input.Rds", ) step_one$result$chk ## End(Not run)
Instead of maintaining a separate pipeline script containing calls to
make_with_source()
, you can add roxygen-like headers to the .R files in
your pipeline containing the @makepipe
tag along with @targets
,
@dependencies
, and so on. These tags will be parsed by make_with_dir()
and used to construct a pipeline. You can call a specific part of the
pipeline that has been documented in this way using make_with_roxy()
.
make_with_dir( dir = ".", recursive = FALSE, build = TRUE, envir = new.env(parent = parent.frame()), quiet = getOption("makepipe.quiet") ) make_with_roxy( source, envir = new.env(parent = parent.frame()), quiet = getOption("makepipe.quiet") )
make_with_dir( dir = ".", recursive = FALSE, build = TRUE, envir = new.env(parent = parent.frame()), quiet = getOption("makepipe.quiet") ) make_with_roxy( source, envir = new.env(parent = parent.frame()), quiet = getOption("makepipe.quiet") )
dir |
A character vector of full path names; the default corresponds to the working directory |
recursive |
A logical determining whether or not to recurse into subdirectories |
build |
A logical determining whether or not the pipeline will be built immediately or simply returned to the user |
envir |
The environment in which to execute the |
quiet |
A logical determining whether or not messages are signaled |
source |
The path to an R script which makes the |
Other than @makepipe
, which is used to tell whether a given script should
be included in the pipeline, the tags recognised mirror the arguments to
make_with_source()
. In particular,
@targets
and @dependencies
are for declaring inputs and outputs, the
expected format is a comma separated list of strings like
@targets "out1.Rds", "out2.Rds"
but R code like @targets file.path(DIR, "out.Rds")
(evaluated in envir
) works too
@packages
is for declaring the packages that the targets depend on, the
expected format is @packages pkg1 pkg2 etc
@force
is for declaring whether or not execution should be forced, the
expected format is a logical like TRUE
or FALSE
See the getting started vignette for more information.
A Pipeline
object
Other make:
make_with_recipe()
,
make_with_source()
## Not run: # Create a pipeline from scripts in the working dir without executing it p <- make_with_dir(build = FALSE) p$build() # Then execute it yourself ## End(Not run)
## Not run: # Create a pipeline from scripts in the working dir without executing it p <- make_with_dir(build = FALSE) p$build() # Then execute it yourself ## End(Not run)
Make targets out of dependencies using a recipe
make_with_recipe( recipe, targets, dependencies = NULL, packages = NULL, envir = new.env(parent = parent.frame()), quiet = getOption("makepipe.quiet"), force = FALSE, label = NULL, note = NULL, ... )
make_with_recipe( recipe, targets, dependencies = NULL, packages = NULL, envir = new.env(parent = parent.frame()), quiet = getOption("makepipe.quiet"), force = FALSE, label = NULL, note = NULL, ... )
recipe |
A chunk of R code which makes the |
targets |
A character vector of paths to files |
dependencies |
A character vector of paths to files which the |
packages |
A character vector of names of packages which |
envir |
The environment in which to execute the |
quiet |
A logical determining whether or not messages are signaled |
force |
A logical determining whether or not execution of the |
label |
A short label for the |
note |
A description of what the |
... |
Additional parameters to pass to |
A Segment
object containing execution metadata.
Other make:
make_with_dir()
,
make_with_source()
## Not run: # Merge files in fresh environment if raw data has been updated since last # merged make_with_recipe( recipe = { dat <- readRDS("data/raw_data.Rds") pop <- readRDS("data/pop_data.Rds") merged_dat <- merge(dat, pop, by = "id") saveRDS(merged_dat, "data/merged_data.Rds") }, targets = "data/merged_data.Rds", dependencies = c("data/raw_data.Rds", "data/raw_pop.Rds") ) # Merge files in current environment if raw data has been updated since last # merged. (If recipe executed, all objects bound in source will be available # in current env). make_with_recipe( recipe = { dat <- readRDS("data/raw_data.Rds") pop <- readRDS("data/pop_data.Rds") merged_dat <- merge(dat, pop, by = "id") saveRDS(merged_dat, "data/merged_data.Rds") }, targets = "data/merged_data.Rds", dependencies = c("data/raw_data.Rds", "data/raw_pop.Rds"), envir = environment() ) # Merge files in global environment if raw data has been updated since last # merged. (If source executed, all objects bound in source will be available # in global env). make_with_recipe( recipe = { dat <- readRDS("data/raw_data.Rds") pop <- readRDS("data/pop_data.Rds") merged_dat <- merge(dat, pop, by = "id") saveRDS(merged_dat, "data/merged_data.Rds") }, targets = "data/merged_data.Rds", dependencies = c("data/raw_data.Rds", "data/raw_pop.Rds"), envir = globalenv() ) ## End(Not run)
## Not run: # Merge files in fresh environment if raw data has been updated since last # merged make_with_recipe( recipe = { dat <- readRDS("data/raw_data.Rds") pop <- readRDS("data/pop_data.Rds") merged_dat <- merge(dat, pop, by = "id") saveRDS(merged_dat, "data/merged_data.Rds") }, targets = "data/merged_data.Rds", dependencies = c("data/raw_data.Rds", "data/raw_pop.Rds") ) # Merge files in current environment if raw data has been updated since last # merged. (If recipe executed, all objects bound in source will be available # in current env). make_with_recipe( recipe = { dat <- readRDS("data/raw_data.Rds") pop <- readRDS("data/pop_data.Rds") merged_dat <- merge(dat, pop, by = "id") saveRDS(merged_dat, "data/merged_data.Rds") }, targets = "data/merged_data.Rds", dependencies = c("data/raw_data.Rds", "data/raw_pop.Rds"), envir = environment() ) # Merge files in global environment if raw data has been updated since last # merged. (If source executed, all objects bound in source will be available # in global env). make_with_recipe( recipe = { dat <- readRDS("data/raw_data.Rds") pop <- readRDS("data/pop_data.Rds") merged_dat <- merge(dat, pop, by = "id") saveRDS(merged_dat, "data/merged_data.Rds") }, targets = "data/merged_data.Rds", dependencies = c("data/raw_data.Rds", "data/raw_pop.Rds"), envir = globalenv() ) ## End(Not run)
Make targets out of dependencies using a source file
make_with_source( source, targets, dependencies = NULL, packages = NULL, envir = new.env(parent = parent.frame()), quiet = getOption("makepipe.quiet"), force = FALSE, label = NULL, note = NULL, ... )
make_with_source( source, targets, dependencies = NULL, packages = NULL, envir = new.env(parent = parent.frame()), quiet = getOption("makepipe.quiet"), force = FALSE, label = NULL, note = NULL, ... )
source |
The path to an R script which makes the |
targets |
A character vector of paths to files |
dependencies |
A character vector of paths to files which the |
packages |
A character vector of names of packages which |
envir |
The environment in which to execute the |
quiet |
A logical determining whether or not messages are signaled |
force |
A logical determining whether or not execution of the |
label |
A short label for the |
note |
A description of what the |
... |
Additional parameters to pass to |
A Segment
object containing execution metadata.
Other make:
make_with_dir()
,
make_with_recipe()
## Not run: # Merge files in fresh environment if raw data has been updated since last # merged make_with_source( source = "merge_data.R", targets = "data/merged_data.Rds", dependencies = c("data/raw_data.Rds", "data/raw_pop.Rds") ) # Merge files in current environment if raw data has been updated since last # merged. (If source executed, all objects bound in source will be available # in current env). make_with_source( source = "merge_data.R", targets = "data/merged_data.Rds", dependencies = c("data/raw_data.Rds", "data/raw_pop.Rds"), envir = environment() ) # Merge files in global environment if raw data has been updated since last # merged. (If source executed, all objects bound in source will be available # in global env). make_with_source( source = "merge_data.R", targets = "data/merged_data.Rds", dependencies = c("data/raw_data.Rds", "data/raw_pop.Rds"), envir = globalenv() ) ## End(Not run)
## Not run: # Merge files in fresh environment if raw data has been updated since last # merged make_with_source( source = "merge_data.R", targets = "data/merged_data.Rds", dependencies = c("data/raw_data.Rds", "data/raw_pop.Rds") ) # Merge files in current environment if raw data has been updated since last # merged. (If source executed, all objects bound in source will be available # in current env). make_with_source( source = "merge_data.R", targets = "data/merged_data.Rds", dependencies = c("data/raw_data.Rds", "data/raw_pop.Rds"), envir = environment() ) # Merge files in global environment if raw data has been updated since last # merged. (If source executed, all objects bound in source will be available # in global env). make_with_source( source = "merge_data.R", targets = "data/merged_data.Rds", dependencies = c("data/raw_data.Rds", "data/raw_pop.Rds"), envir = globalenv() ) ## End(Not run)
Check if targets are out-of-date vis-a-vis their dependencies
out_of_date(targets, dependencies, packages = NULL)
out_of_date(targets, dependencies, packages = NULL)
targets |
A character vector of paths to files |
dependencies |
A character vector of paths to files which the |
packages |
A character vector of names of packages which |
TRUE
if any of targets
are older than any of dependencies
or if
any of targets
do not exist; FALSE
otherwise
## Not run: out_of_date("data/processed_data.Rds", "data/raw_data.Rds") ## End(Not run)
## Not run: out_of_date("data/processed_data.Rds", "data/raw_data.Rds") ## End(Not run)
get_pipeline()
, set_pipeline()
and reset_pipeline()
access and modify
the current active pipeline, while all other helper functions do not affect
the active pipeline
is_pipeline(pipeline) set_pipeline(pipeline) get_pipeline() reset_pipeline()
is_pipeline(pipeline) set_pipeline(pipeline) get_pipeline() reset_pipeline()
pipeline |
A pipeline. See Pipeline for more details. |
Other pipeline:
Pipeline
,
pipeline-vis
## Not run: # Build up a pipeline from scratch and save it out reset_pipeline() # A series of `make_with_*()` blocks go here... saveRDS(get_pipeline(), "data/my_pipeline.Rds") # ... Later on we can read in and set the pipeline p <- readRDS("data/my_pipeline.Rds") set_pipeline(p) ## End(Not run)
## Not run: # Build up a pipeline from scratch and save it out reset_pipeline() # A series of `make_with_*()` blocks go here... saveRDS(get_pipeline(), "data/my_pipeline.Rds") # ... Later on we can read in and set the pipeline p <- readRDS("data/my_pipeline.Rds") set_pipeline(p) ## End(Not run)
Produce a flowchart visualisation of the pipeline. Out-of-date targets will be coloured red, up-to-date targets will be coloured green, and everything else will be blue.
show_pipeline( pipeline = get_pipeline(), as = c("nomnoml", "visnetwork", "text"), labels = NULL, notes = NULL, ... ) save_pipeline( file, pipeline = get_pipeline(), as = c("nomnoml", "visnetwork", "text"), labels = NULL, notes = NULL, ... )
show_pipeline( pipeline = get_pipeline(), as = c("nomnoml", "visnetwork", "text"), labels = NULL, notes = NULL, ... ) save_pipeline( file, pipeline = get_pipeline(), as = c("nomnoml", "visnetwork", "text"), labels = NULL, notes = NULL, ... )
pipeline |
A pipeline. See Pipeline for more details. |
as |
A string determining whether to use |
labels |
A named character vector mapping nodes in the |
notes |
A named character vector mapping nodes in the |
... |
Arguments passed onto |
file |
File to save png (nomnoml) or html (visnetwork) into |
Labels and notes must be supplied as named character vector where the
names correspond to the filepaths of nodes (i.e. targets
, dependencies
,
or source
scripts)
Other pipeline:
Pipeline
,
pipeline-accessors
## Not run: # Run pipeline make_with_source( "recode.R", "data/0 raw_data.R", "data/1 data.R" ) make_with_source( "merge.R", c("data/1 data.R", "data/0 raw_pop.R"), "data/2 data.R" ) # Visualise pipeline with custom notes show_pipeline(notes = c( "data/0 raw_data.R" = "Raw survey data", "data/0 raw_pop.R" = "Raw population data", "data/1 data.R" = "Survey data with recodes applied", "data/2 data.R" = "Survey data with demographic variables merged in" )) ## End(Not run)
## Not run: # Run pipeline make_with_source( "recode.R", "data/0 raw_data.R", "data/1 data.R" ) make_with_source( "merge.R", c("data/1 data.R", "data/0 raw_pop.R"), "data/2 data.R" ) # Visualise pipeline with custom notes show_pipeline(notes = c( "data/0 raw_data.R" = "Raw survey data", "data/0 raw_pop.R" = "Raw population data", "data/1 data.R" = "Survey data with recodes applied", "data/2 data.R" = "Survey data with demographic variables merged in" )) ## End(Not run)