workflowr vs targets
Metric | workflowr | targets
---|---|---
Mentions | 2 | 10
Stars | 807 | 871
Growth | 0.9% | 1.8%
Activity | 5.4 | 9.6
Latest commit | 2 months ago | 7 days ago
Language | R | R
License | GNU General Public License v3.0 or later | GNU General Public License v3.0 or later
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
workflowr
-
How do you manage results, plots, etc.?
I would start by saying "bioinformatics world" is a very broad term. Most of it does not involve managing multiple models, which is what MLflow seems to be for. Most work is cleaning the data and interpreting the results, which is highly project-specific. Something like workflowr is generally more appropriate, but even that is overkill for most people.
-
Restructuring a large R project. Need advice on how to wire up file paths and associated objects.
targets stores all your data serialised on disk and only loads an object when a dependent target needs it. For a standardised folder structure, you can take inspiration from workflowr.
targets
-
Advice on Best Practices
Is this it? https://github.com/ropensci/targets
-
Does anyone else feel in a tricky spot about their use of R?
I'll chime in with others to say that using targets can help with the memory load as well. If you partition your data adequately (e.g. grouping by subjects), you can take advantage of the way targets maps over data, so each branch only loads what it needs. Moreover, if you use the memory = "transient" option, it will unload objects between steps, adding a little time overhead but saving memory. targets and tidytable together have enabled me to work on pretty sizeable datasets while rarely running into memory issues. In fact, the only time I ran into a memory hog was when I hadn't adequately partitioned the data across worker nodes.
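The partition-then-map pattern described above can be illustrated outside R. The Python snippet below is a loose analogy under stated assumptions, not the targets API or implementation: it writes each subject's rows to its own pickle file, then processes one partition at a time so that only that group is held in memory. The helper names `partition_to_disk` and `map_over_partitions` are hypothetical.

```python
import os
import pickle
import tempfile
from collections import defaultdict


def partition_to_disk(rows, key, directory):
    """Hypothetical helper: write one pickle file per group so each can be loaded on its own."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    paths = {}
    for group, members in groups.items():
        path = os.path.join(directory, f"{group}.pkl")
        with open(path, "wb") as fh:
            pickle.dump(members, fh)
        paths[group] = path
    return paths


def map_over_partitions(paths, func):
    """Hypothetical helper: apply func to one partition at a time, loading only that file."""
    results = {}
    for group, path in sorted(paths.items()):
        with open(path, "rb") as fh:
            members = pickle.load(fh)  # load just this branch's data
        results[group] = func(members)
        del members  # drop our reference before moving on to the next branch
    return results


if __name__ == "__main__":
    rows = [
        {"subject": "a", "value": 1},
        {"subject": "b", "value": 2},
        {"subject": "a", "value": 3},
    ]
    with tempfile.TemporaryDirectory() as tmp:
        paths = partition_to_disk(rows, "subject", tmp)
        means = map_over_partitions(paths, lambda ms: sum(m["value"] for m in ms) / len(ms))
    print(means)  # {'a': 2.0, 'b': 2.0}
```

The point of the design is that peak memory scales with the largest partition rather than the whole dataset, which is why poor partitioning (as in the last sentence of the quote) brings the memory problem back.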
-
What are your favorite R Libraries?
targets
-
Is there a better way to update an entire series of scripts?
I highly recommend the holy grail of workflow orchestrators / executors in the R ecosystem: targets.
-
The new Drake: ropensci targets, Function-oriented Make-like declarative workflows for R {R}
-
How do you manage, distribute and schedule jobs written in R?
That said, you might want to check out the ‘targets’ package, which provides a DSL for specifying complex workflow descriptions in R. When repeatedly running the same jobs on changing data, this package helps ensure that only necessary work is performed (suitable intermediate results are reused) and that scripts are run reproducibly. This might help with scheduling.
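The "only necessary work is performed" behaviour can be illustrated with a minimal Python sketch of the general Make-like idea. This is an assumption-labelled toy, not how targets is implemented: each step's result is cached under a hash of the step name and its inputs, and a step reruns only when that hash changes. `Pipeline`, `run_step` and `build` are hypothetical names.

```python
import hashlib
import json


class Pipeline:
    """Toy Make-like runner (hypothetical): caches each step's result keyed by a hash of its inputs."""

    def __init__(self):
        self.cache = {}     # step name -> (input hash, result)
        self.executed = []  # names of steps that actually ran, for inspection

    @staticmethod
    def _hash(inputs):
        # Serialise inputs deterministically, then hash them.
        payload = json.dumps(inputs, sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()

    def run_step(self, name, func, *inputs):
        key = self._hash([name, list(inputs)])
        cached = self.cache.get(name)
        if cached is not None and cached[0] == key:
            return cached[1]  # inputs unchanged: reuse the stored result
        result = func(*inputs)  # first run or changed inputs: recompute
        self.cache[name] = (key, result)
        self.executed.append(name)
        return result


def build(pipeline, raw):
    # Two-step workflow: drop missing values, then sum what is left.
    cleaned = pipeline.run_step("clean", lambda xs: [x for x in xs if x is not None], raw)
    return pipeline.run_step("summarise", sum, cleaned)


if __name__ == "__main__":
    p = Pipeline()
    build(p, [1, None, 2])  # both steps run
    build(p, [1, None, 2])  # nothing reruns
    build(p, [1, 5, None])  # "clean" reruns; its output changed, so "summarise" reruns too
    print(p.executed)  # ['clean', 'summarise', 'clean', 'summarise']
```

Unlike targets, this sketch does not notice when a step's code changes; it only compares inputs. It does show the key property, though: if an upstream step reruns but produces the same output, downstream steps are skipped.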
-
How do I do something like this as a parallel programming in R?
It may be worth putting these individual steps into a targets pipeline. targets is designed to support parallelization with future and makes it easier to visualize downstream dependencies.
-
Tips re: workflow, organization, file hygiene and similar?
Given your requirements, I recommend you check out ‘targets’, which specifically addresses the needs of reusable workflows in R, and it seems like it fits your requirements to a T.
-
Your impression of {targets}? (r package)
The targets package is the official successor to Drake and has the same primary author (Will Landau). He has explained his reasons for creating targets, which include stronger guardrails for users and better UX.
-
Data engineering with R?
I use it for ETL. I use targets as the workflow management software, and, like others, have a cron job set up to run nightly builds.
What are some alternatives?
box - Write reusable, composable and modular R code
dbt-core - dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
targets-minimal - A minimal example data analysis project with the targets R package
drake - An R-focused pipeline toolkit for reproducibility and high-performance computing
reprex - Render bits of R code for sharing, e.g., on GitHub or StackOverflow.
awesome-pipeline - A curated list of awesome pipeline toolkits inspired by Awesome Sysadmin
blogdown - Create Blogs and Websites with R Markdown
tidyverse - Easily install and load packages from the tidyverse
data-science-development-project-template - A logical, reasonably standardized, but flexible project structure for doing and sharing data science research work while developing a software tool.
fastverse - An Extensible Suite of High-Performance and Low-Dependency Packages for Statistical Computing and Data Manipulation in R
webR-quarto-demos - Experiments with generating a standalone Quarto Document using Web R
targets-tutorial - Short course on the targets R package