Snakemake – A framework for reproducible data analysis

mandala

8 228 6.3 Python

A powerful and easy to use Python framework for experiment tracking and incremental computing

Snakemake is great, but it does feel like just a slightly more modern Make.
I am pretty excited about research projects that tie the recipe and the computation closer together, so that you preserve not just the last recipe but the whole history of exploratory computation and analysis.
One example is mandala (https://github.com/amakelov/mandala), a project by a colleague of mine that is basically semantic git for your computational graph and your data at the same time.
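
To make the idea of memoized, incremental computation concrete, here is a minimal sketch in plain Python. It is not mandala's API: the `memoize` decorator, the `_store` dictionary, and the `call_log` list are hypothetical names used only for illustration. Each call is keyed by a hash of the function name and its arguments, so repeating a call reuses the stored result instead of recomputing it.

```python
import hashlib
import pickle
from functools import wraps

# Hypothetical in-memory store: maps a call hash to a saved result.
_store = {}
# Records which calls actually executed, to show what was recomputed.
call_log = []


def memoize(func):
    """Cache results keyed by a hash of the function name and arguments."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        payload = pickle.dumps((func.__qualname__, args, sorted(kwargs.items())))
        key = hashlib.sha256(payload).hexdigest()
        if key not in _store:
            call_log.append((func.__name__, args))
            _store[key] = func(*args, **kwargs)
        return _store[key]
    return wrapper


@memoize
def square(x):
    return x * x


@memoize
def total(values):
    return sum(values)


def pipeline(n):
    # Tuples are used so the arguments pickle to a stable key.
    return total(tuple(square(i) for i in range(n)))


first = pipeline(3)   # runs square(0), square(1), square(2), then total
second = pipeline(4)  # reuses the three squares; runs only square(3) and a new total
```

Growing the pipeline from three inputs to four only executes the two calls that are actually new, which is the behavior an incremental-computing tool aims for. A real tool would also persist results and track code changes, which this sketch does not attempt.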

tes-azure-legacy

1 18 10.0 Python

Discontinued [DEPRECATED] - A GA4GH Task Execution Service (TES) compatible implementation for Azure Compute

Snakemake is a beautiful project that evolves and improves very quickly. Years ago I realized I needed to up my game from the usual Bash-based NGS data processing pipelines I was writing. Based on several recommendations, I chose Snakemake, and I have never regretted it. It worked perfectly on our PBS cluster and then on our Slurm cluster. I took some steps to make it run on K8s, which it supports. Most recently, I'm still happy with my choice because Snakemake (together with Nextflow) seems to be the chosen framework for the GA4GH Cloud Work Stream's "products" such as WES and TES [0]. This seems to be the tech stack that Amazon Omics and Microsoft Genomics focus on [1].
I owe a lot to Snakemake and Johannes Köster, and I hope some day I can repay him and his project.
[0] https://www.ga4gh.org/work_stream/cloud/
[1] https://github.com/Microsoft/tes-azure

oxen-release

22 831 9.0 Python

Lightning fast data version control system for structured and unstructured machine learning datasets. We aim to make versioning datasets as easy as versioning code.

Super cool! Would love to see an integration with Oxen and their data version control https://github.com/Oxen-AI/oxen-release

make-booster

3 8 10.0 Makefile

Utility routines to simplify using GNU make and Python

For a very different approach, check out make-booster:
https://github.com/david-a-wheeler/make-booster
Make-booster provides utility routines intended to greatly simplify data processing (particularly a data pipeline) using GNU make. It includes some mechanisms specifically to help Python, as well as general-purpose mechanisms that can be useful in any system. In particular, it helps reliably reproduce results, and it automatically determines what needs to run and runs only that (producing a significant speedup in most cases). Released as open source software.
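
The "run only what needs to run" behavior can be sketched with the timestamp rule that GNU make itself applies: a target is rebuilt when it is missing or older than any of its dependencies. The Python below is a minimal illustration of that rule only. It is not part of make-booster, and `needs_rebuild` and `build_if_stale` are hypothetical helper names.

```python
import os


def needs_rebuild(target, dependencies):
    """Return True if target is missing or older than any dependency."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(dep) > target_mtime for dep in dependencies)


def build_if_stale(target, dependencies, recipe):
    """Run recipe only when the target is out of date; report whether it ran."""
    if needs_rebuild(target, dependencies):
        recipe()
        return True
    return False
```

Here `recipe` is any callable that writes `target`. Calling `build_if_stale` a second time without touching the dependencies skips the recipe, which is where the speedup on repeated runs comes from.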

snakemake-wrappers

1 204 9.9 Python

This is the development home of the Snakemake wrapper repository, see

1. Command-line tools are often used in steps of a bioinformatics pipeline, and they bridge the gap (e.g. https://github.com/snakemake/snakemake-wrappers).

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Mandala: A little playground for testing pixel logic patterns
2 projects | news.ycombinator.com | 7 Mar 2024
Mandala: Notebook memoization on steroids, used by Anthropic
1 project | news.ycombinator.com | 21 Dec 2023
ML Experiments Management with Git
4 projects | news.ycombinator.com | 2 Nov 2023
Airflow's Problem
6 projects | news.ycombinator.com | 2 Aug 2022
DevOps Fundamentals for Deep Learning Engineers
6 projects | /r/deeplearning | 20 Feb 2022

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com
Data Science Genomics experiment-tracking Workflow engine Machine Learning
Post date: 15 Jul 2023
