Snakemake – A framework for reproducible data analysis

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • mandala

    A powerful and easy to use Python framework for experiment tracking and incremental computing

  • Snakemake is great, but it does feel like just a slightly more modern Make.

    I am pretty excited about research projects that tie the recipe and the computation closer together so that you do not preserve just the last recipe, but the whole history of exploratory computation and analysis.

    E.g. mandala (https://github.com/amakelov/mandala), a project of a colleague of mine which is basically semantic git for your computational graph and data at the same time.

  • tes-azure-legacy

    Discontinued [DEPRECATED] - A GA4GH Task Execution Service (TES) compatible implementation for Azure Compute

  • Snakemake is a beautiful project that evolves and improves quickly. Years ago I realized I needed to up my game from the usual bash-based NGS data processing pipelines I was writing. Based on several recommendations I chose Snakemake. I have never regretted it. It worked perfectly on our PBS cluster and then on our Slurm cluster. I took some steps toward running it on K8s, which it supports. Most recently, I'm still/again happy with my choice of Snakemake because it (together with Nextflow) seems to be the chosen framework for the "products" of GA4GH's Cloud Work Stream, like WES and TES [0]. This seems to be the tech stack that Amazon Omics and Microsoft Genomics focus on [1].

    I owe a lot to Snakemake and Johannes Köster, and I hope some day I can repay him and his project.

    [0] https://www.ga4gh.org/work_stream/cloud/

    [1] https://github.com/Microsoft/tes-azure

  • oxen-release

    Lightning fast data version control system for structured and unstructured machine learning datasets. We aim to make versioning datasets as easy as versioning code.

  • Super cool! Would love to see an integration with Oxen and their data version control https://github.com/Oxen-AI/oxen-release

  • make-booster

    Utility routines to simplify using GNU make and Python

  • For a very different approach, check out make-booster:

    https://github.com/david-a-wheeler/make-booster

    Make-booster provides utility routines intended to greatly simplify data processing (particularly a data pipeline) using GNU make. It includes some mechanisms specifically to help Python, as well as general-purpose mechanisms that can be useful in any system. In particular, it helps reliably reproduce results, and it automatically determines what needs to run and runs only that (producing a significant speedup in most cases). Released as open source software.

  • snakemake-wrappers

    This is the development home of the Snakemake wrapper repository, see

  • 1. Command-line tools are often used in steps of a bioinformatics pipeline, and they bridge the gap (e.g. https://github.com/snakemake/snakemake-wrappers).

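The make-booster comment above says the tool "automatically determines what needs to run and runs only that". The idea behind that kind of incremental rebuild, shared by GNU make and (in spirit) Snakemake, can be sketched in a few lines of Python. This is only an illustration of the timestamp comparison that Make-style tools rely on, not make-booster's or Snakemake's actual implementation; the names `needs_rebuild`, `build_if_stale`, and `uppercase_recipe` are hypothetical and chosen for this example.

```python
from pathlib import Path
from typing import Callable, Sequence

# A recipe takes the target path and its source paths and produces the target.
Recipe = Callable[[Path, Sequence[Path]], None]


def needs_rebuild(target: Path, sources: Sequence[Path]) -> bool:
    """Return True if the target is missing or older than any of its sources."""
    if not target.exists():
        return True
    target_mtime = target.stat().st_mtime
    return any(src.stat().st_mtime > target_mtime for src in sources)


def build_if_stale(target: Path, sources: Sequence[Path], recipe: Recipe) -> bool:
    """Run the recipe only when the target is stale; report whether it ran."""
    if needs_rebuild(target, sources):
        recipe(target, sources)
        return True
    return False


def uppercase_recipe(target: Path, sources: Sequence[Path]) -> None:
    """Toy processing step: concatenate the sources and upper-case the text."""
    text = "".join(src.read_text() for src in sources)
    target.write_text(text.upper())
```

Calling `build_if_stale(Path("out.txt"), [Path("in.txt")], uppercase_recipe)` twice in a row would run the recipe the first time and skip it the second, until `in.txt` is modified again. Real tools add much more on top of this (dependency graphs, parallel execution, and in some cases content hashes rather than timestamps).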
NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.
