#Pipeline

Open-source projects categorized as Pipeline

Top 23 Pipeline Open-Source Projects

  • GitHub repo vector

    A high-performance, highly reliable, observability data pipeline.

    Project mention: Lightweight and ultra-fast tool for building observability pipelines | news.ycombinator.com | 2021-05-10
  • GitHub repo Brunch

    :fork_and_knife: Web applications made easy. Since 2011.

    Project mention: 🕵️Something new every now and then: Trying Brunch🍴 | dev.to | 2021-03-20

    So, the website looks promising:

  • GitHub repo pipeline

    A cloud-native Pipeline resource.

    Project mention: Write Gitlab CI Pipelines in Python Code | news.ycombinator.com | 2021-04-29

    Check out tekton CI, it's a Kubernetes operator to run a CI pipeline defined as commands that run inside any container. Yeah you need a k8s cluster, but even a simple kind dev cluster that you spin up in 30 seconds with one command on your laptop will work. https://tekton.dev/

    I like it a lot because it enforces very little structure on you and doesn't reinvent everything. Stuff like storage (either ephemeral or existing volumes), secrets, configuration, etc. is already modeled and supported by Kubernetes, and tekton can use all of that natively.

    If you're really averse to k8s though, check out drone. It has a local execution mode that is similar and just runs whatever pipeline commands you want in docker containers. https://github.com/drone/drone

  • GitHub repo argo-cd

    Declarative continuous deployment for Kubernetes.

    Project mention: Configuring ArgoCD on Amazon EKS | dev.to | 2021-04-17

    stages:
      - init
      - deploy

    variables:
      KUBECTL_VERSION: 1.20.5
      ARGOCD_VERSION: 1.7.4
      ARGOCD_ADDR: argocd.example.com

    # Get ArgoCD credentials from Secret Manager
    before_script:
      - export AROGOCD_TOKEN="$(aws secretsmanager get-secret-value --secret-id argocd-token --version-stage AWSCURRENT --query SecretString --output text)"
      # install kubectl
      - curl -L "https://storage.googleapis.com/kubernetes-release/release/v${KUBECTL_VERSION}/bin/linux/amd64/kubectl" -o /usr/bin/kubectl
      # install argocd
      - curl -sSL -o /usr/local/bin/argocd "https://github.com/argoproj/argo-cd/releases/download/v${ARGOCD_VERSION}/argocd-linux-amd64"

    init demo project 🔬:
      stage: init
      when: manual
      image:
        name: amazon/aws-cli
      script:
        - argocd cluster add $BUSINESS_K8S_CONTEXT --name business-cluster-dev --kubeconfig $KUBE_CONFIG --auth-token=${AROGOCD_TOKEN} --server ${ARGOCD_ADDR} || echo 'cluster already added'
      tags:
        - k8s-dev-runner
      only:
        - master

    deploy demo project 🚀:
      stage: init
      when: manual
      image:
        name: amazon/aws-cli
      script:
        - sed -i "s,,$BUSINESS_K8S_CLUSTER_URL,g;s,,$CI_PROJECT_URL.git,g" application.yaml
        # Connect to aws eks devops cluster
        - aws eks update-kubeconfig --region $AWS_REGION --name $EKS_CLUSTER_NAME
        # Create ArgoCD project
        - argocd proj create demo-dev -d $KUBERNETES_CLUSTER_URL,app-dev -s $CI_PROJECT_URL.git --auth-token=${AROGOCD_TOKEN} --server ${ARGOCD_ADDR} || echo 'project already created'
        # Create ArgoCD application
        - kubectl apply -n argocd -f application.yaml
      tags:
        - k8s-dev-runner
      only:
        - master

    deploy demo app 🌐:
      stage: deploy
      image:
        name: amazon/aws-cli
      script:
        - cd envs/dev
        - argocd app sync demo-dev --auth-token=${AROGOCD_TOKEN} --server ${ARGOCD_ADDR}
      tags:
        - k8s-dev-runner
      only:
        - tags

  • GitHub repo gaia

    Build powerful pipelines in any programming language.

    Project mention: best way to mock/unit test http.Client? | reddit.com/r/golang | 2021-03-15

    Something like this: https://github.com/gaia-pipeline/gaia/blob/8a9f66742fbb4c5120e52fcfeef94ff4fcfd63a0/workers/pipeline/git.go#L233

  • GitHub repo great_expectations

    Always know what to expect from your data.

    Project mention: Looking for open-source model serving framework with dashboard for test data quality | reddit.com/r/datascience | 2021-03-31

    it should have a dashboard for test data quality monitoring - ideally with alarms from the great_expectations framework https://github.com/great-expectations/great_expectations

  • GitHub repo papermill

    📚 Parameterize, execute, and analyze notebooks

    Project mention: What is with the assumption that if you are using the jupyter you are a noob? | reddit.com/r/datascience | 2021-03-30
  • GitHub repo Kedro

    A Python framework for creating reproducible, maintainable and modular data science code.

    Project mention: What is the best structured ds project you have seen? | reddit.com/r/datascience | 2021-04-16

    Another one of my personal faves is Kedro. Great ETL framework made especially for data scientists.

  • GitHub repo airbyte

    Airbyte is an open-source EL(T) platform that helps you replicate your data in your warehouses, lakes and databases.

    Project mention: Anyone created a ETL pipeline from Facebook Graph API before ? | reddit.com/r/dataengineering | 2021-05-02

    Thanks for the hevo.io spot, noticed it actually has page insights ! I'm actually aiming to use airbyte.io in the end and help contribute a source connector, but I guess I'll stick with hevo.io at first

  • GitHub repo Pipcook

    Machine learning platform for Web developers

    Project mention: Faster Pipcook 1.2, machine learning in JavaScript | dev.to | 2020-09-08

    Pipcook 1.3 roadmap

  • GitHub repo PyFunctional

    Python library for creating data pipelines with chain functional programming

    Project mention: PyFunctional makes creating data pipelines easy by using chained functional operators | reddit.com/r/Python | 2021-03-31
  • GitHub repo mara-pipelines

    A lightweight opinionated ETL framework, halfway between plain scripts and Apache Airflow

    Project mention: Build your own “data lake” for reporting purposes | news.ycombinator.com | 2021-03-14

    Minio and nifi require machines by themselves. Better off with pure Python, and if one wants something lightweight and visually pleasing, Mara [0] or Dagster with Dagit [1] will do the job.

    [0] https://github.com/mara/mara-pipelines

    [1] https://docs.dagster.io/tutorial/execute

  • GitHub repo drake

    An R-focused pipeline toolkit for reproducibility and high-performance computing (by ropensci)

    Project mention: Your impression of {targets}? (r package) | reddit.com/r/Rlanguage | 2021-05-02

    The targets package is the official successor to Drake, and has the same primary author (Will Landau). He has explained why he created targets, which includes stronger guardrails for users and better UX.

  • GitHub repo MLJ.jl

    A Julia machine learning framework

    Project mention: sklearn equivalent for Julia? | reddit.com/r/Julia | 2021-04-14

    Imho, Julia is more diverse in the sense that there is not a single popular ML library. Maybe the Julian equivalent for scikit-learn is MLJ.jl. There is also ScikitLearn.jl, which defines the usual interface of scikit-learn models, and specific algorithms then implement this interface.

  • GitHub repo galaxy

    Data intensive science for everyone. (by galaxyproject)

    Project mention: Developed a new kind of dual extruder system on fully custom built 3D printer | reddit.com/r/3Dprinting | 2021-03-01
  • GitHub repo go-streams

    A lightweight stream processing library for Go

    Project mention: A flexible and powerful stream processing library for Go | news.ycombinator.com | 2020-12-22
  • GitHub repo pdpipe

    Easy pipelines for pandas DataFrames.

  • GitHub repo ttyplot

    a realtime plotting utility for terminal/console with data input from stdin

    Project mention: plotpipe: plot data from a pipe | reddit.com/r/commandline | 2021-03-04
  • GitHub repo OK

    Elegant error/exception handling in Elixir, with result monads.

  • GitHub repo dashboard

    A dashboard for Tekton! (by tektoncd)

    Project mention: Write Gitlab CI Pipelines in Python Code | news.ycombinator.com | 2021-04-29
  • GitHub repo Flowex

    Flow-Based Programming framework for Elixir

  • GitHub repo targets

    Function-oriented Make-like declarative workflows for R

    Project mention: Your impression of {targets}? (r package) | reddit.com/r/Rlanguage | 2021-05-02

    The targets package is the official successor to Drake, and has the same primary author (Will Landau). He has explained why he created targets, which includes stronger guardrails for users and better UX.

  • GitHub repo catalog

    Catalog of shared Tasks and Pipelines.

    Project mention: Cloud Native CI/CD with Tekton - Building Custom Tasks | dev.to | 2021-04-14

    Another common thing that you might need in your Tasks is some kind of storage where you can write data that can be used by subsequent steps in the Task or by other Tasks in the pipeline. The most common use case for this would be a place to fetch a git repo. This kind of storage is called a workspace in Tekton, and the following example shows a Task that mounts and clears the storage using rmdir:

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2021-05-10.

Index

What are some of the best open-source Pipeline projects? This list will help you:

Rank Project Stars
1 vector 6,979
2 Brunch 6,768
3 pipeline 6,229
4 argo-cd 5,907
5 gaia 4,361
6 great_expectations 4,312
7 papermill 4,085
8 Kedro 3,789
9 airbyte 2,354
10 Pipcook 1,883
11 PyFunctional 1,844
12 mara-pipelines 1,670
13 drake 1,310
14 MLJ.jl 1,063
15 galaxy 832
16 go-streams 629
17 pdpipe 605
18 ttyplot 537
19 OK 519
20 dashboard 479
21 Flowex 386
22 targets 369
23 catalog 329