Top 23 reproducible-research Open-Source Projects

Each entry shows, in order: mentions, GitHub stars, activity score, and primary language.

metaflow

24 7,586 9.2 Python

Build and manage real-life ML, AI, and data science projects with ease!

Project mention: FLaNK Stack 05 Feb 2024 | dev.to | 2024-02-05

PyTorch-VAE

5 5,989 0.0 Python

A Collection of Variational Autoencoders (VAE) in PyTorch.
Sacred

6 4,157 3.5 Python

Sacred is a tool to help you configure, organize, log and reproduce experiments developed at IDSIA.

Project mention: Sacred VS cascade - a user suggested alternative | libhunt.com/r/sacred | 2023-12-05

nextflow

9 2,538 9.7 Groovy

A DSL for data-driven computational pipelines

Project mention: Nextflow: Data-Driven Computational Pipelines | news.ycombinator.com | 2023-08-10

> It's been a while since you can rerun/resume Nextflow pipelines
Yes, you can resume, but you need your whole upstream DAG to be present. Snakemake can rerun a job when only the dependencies of that job are present, which lets you neatly manage disk usage, or archive an intermediate state of a project and rerun things from there.
> and yes, you can have dry runs in Nextflow
You have stubs, which really isn't the same thing.
> I have no idea what you're referring to with the 'arbitrary limit of 1000 parallel jobs' though
I was referring to this issue: https://github.com/nextflow-io/nextflow/issues/1871. That said, the discussion there doesn't do the issue full justice. Nextflow spawns each job in a separate thread, and when it tries to spawn 1000+ Condor jobs it dies with a cryptic error message. Setting the options -Dnxf.pool.type=sync and -Dnxf.pool.maxThreads=N breaks the ability to resume, so Nextflow attempts to rerun the pipeline.
> As for deleting temporary files, there are features that allow you to do a few things related to that, and other features being implemented.
There are some hacks for this - but nothing I would feel safe to integrate into a production tool. They are implementing something - you're right - and it's been the case for several years now, so we'll see.
Snakemake has all that out of the box.
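
The resume behaviour contrasted in the comment above can be illustrated with a toy model. This is only a sketch under stated assumptions: the job names, file names, and both helper functions are hypothetical, and neither function reproduces the actual resume logic of Nextflow or Snakemake. It only shows the difference between checking a job's direct inputs and requiring the whole upstream chain to be on disk.

```python
from pathlib import Path
import tempfile

# Hypothetical three-step pipeline: job name -> (input files, output file).
JOBS = {
    "align": (["reads.fq"], "aligned.bam"),
    "call": (["aligned.bam"], "variants.vcf"),
    "annotate": (["variants.vcf"], "annotated.vcf"),
}


def can_rerun_direct(job, workdir):
    """Return True if every direct input of `job` exists in `workdir`."""
    inputs, _ = JOBS[job]
    return all((workdir / name).exists() for name in inputs)


def can_rerun_full_upstream(job, workdir):
    """Return True only if the direct inputs and everything upstream exist."""
    # Map each output file back to the job that produces it.
    producers = {out: name for name, (_, out) in JOBS.items()}
    inputs, _ = JOBS[job]
    for name in inputs:
        if not (workdir / name).exists():
            return False
        upstream = producers.get(name)
        if upstream is not None and not can_rerun_full_upstream(upstream, workdir):
            return False
    return True


def demo():
    """Archive only an intermediate file, then ask both policies about 'annotate'."""
    with tempfile.TemporaryDirectory() as tmp:
        workdir = Path(tmp)
        # Only the intermediate result was kept; earlier files were deleted.
        (workdir / "variants.vcf").touch()
        return (
            can_rerun_direct("annotate", workdir),
            can_rerun_full_upstream("annotate", workdir),
        )
```

Calling `demo()` asks both policies whether `annotate` can run when only `variants.vcf` has been kept. The direct-input check accepts the job, while the full-upstream check rejects it because the earlier files are gone, which is the disk-usage trade-off the commenter describes.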

fma

1 2,108 0.0 Jupyter Notebook

FMA: A Dataset For Music Analysis
benchmark_VAE

4 1,680 6.1 Python

Unifying Variational Autoencoder (VAE) implementations in PyTorch (NeurIPS 2022)
EvalAI

4 1,677 9.0 Python

Evaluating state of the art in AI
ITK

7 1,339 9.8 C++

Insight Toolkit (ITK) -- Official Repository. ITK builds on a proven, spatially-oriented architecture for processing, segmentation, and registration of scientific images in two, three, or more dimensions.
drake

1 1,330 6.1 R

An R-focused pipeline toolkit for reproducibility and high-performance computing (by ropensci)
torch-fidelity

3 870 8.1 Python

High-fidelity performance metrics for generative models in PyTorch
targets

10 866 9.7 R

Function-oriented Make-like declarative workflows for R
Weave.jl

4 814 0.0 Julia

Scientific reports/literate programming for Julia

Project mention: GitHub - JunoLab/Weave.jl: Scientific reports/literate programming for Julia | /r/LitProg | 2023-05-31

disentangling-vae

1 753 0.0 Python

Experiments for understanding disentanglement in VAE latent representations
gpu-jupyter

2 661 7.8 Jupyter Notebook

GPU-Jupyter: Leverage the flexibility of JupyterLab through the power of your NVIDIA GPU to run your code from TensorFlow and PyTorch in collaborative notebooks on the GPU.
papaja

2 626 7.4 HTML

papaja (Preparing APA Journal Articles) is an R package that provides document formats to produce complete APA manuscripts from RMarkdown-files (PDF and Word documents) and helper functions that facilitate reporting statistics, tables, and plots.
codebraid

4 361 5.2 Python

Live code in Pandoc Markdown
funflow

3 360 3.5 Haskell

Functional workflows
sarek

5 333 9.8 Nextflow

Analysis pipeline to detect germline or somatic variants (pre-processing, variant calling and annotation) from WGS / targeted sequencing

Project mention: Recommendations for software or online resources | /r/bioinformatics | 2023-04-29

huxtable

2 311 7.8 R

An R package to create styled tables in multiple output formats, with a friendly, modern interface.

Project mention: What type of table is this, and is there a way to do this in R? | /r/RStudio | 2023-12-06

As for styling, I highly recommend the huxtable package. You can style rows, columns, and individual cells however you want. It uses dplyr pipelining, if you’re familiar with that, so it’s super intuitive to use too.

trackdown

2 209 4.8 HTML

R package for collaborative writing and editing of R Markdown (or Sweave) documents in Google Docs.
example-get-started

2 167 0.0 Python

Get started DVC project
shournal

5 159 7.1 C++

Log shell-commands and used files. Snapshot executed scripts. Fully automatic.
htm.core

1 144 4.4 C++

Actively developed Hierarchical Temporal Memory (HTM) community fork (continuation) of NuPIC. Implementation for C++ and Python

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

reproducible-research related posts

What type of table is this, and is there a way to do this in R?
1 project | /r/RStudio | 6 Dec 2023
GitHub - JunoLab/Weave.jl: Scientific reports/literate programming for Julia
1 project | /r/LitProg | 31 May 2023
Is there anything like funflow for rust?
2 projects | /r/rust | 20 Oct 2022
Literate DevOps
2 projects | news.ycombinator.com | 26 Sep 2022
Any place to do collaborative writing in the new quarto format? if not, what's the best place/way to do it in rmarkdown you think?
1 project | /r/rstats | 2 Aug 2022
Researchers From INRIA France Propose ‘Pythae’: An Open-Source Python Library Unifying Common And State-of-the-Art Generative AutoEncoder (GAE) Implementations
1 project | /r/Python | 24 Jun 2022
[P] Pythae - Unifying generative autoencoder implementations in Python
1 project | /r/MachineLearning | 24 Jun 2022

Index

What are some of the best open-source reproducible-research projects? This list will help you:

#	Project	Stars
1	metaflow	7,586
2	PyTorch-VAE	5,989
3	Sacred	4,157
4	nextflow	2,538
5	fma	2,108
6	benchmark_VAE	1,680
7	EvalAI	1,677
8	ITK	1,339
9	drake	1,330
10	torch-fidelity	870
11	targets	866
12	Weave.jl	814
13	disentangling-vae	753
14	gpu-jupyter	661
15	papaja	626
16	codebraid	361
17	funflow	360
18	sarek	333
19	huxtable	311
20	trackdown	209
21	example-get-started	167
22	shournal	159
23	htm.core	144