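Top 23 dataops Open-Source Projects

Each entry below lists, in order: number of mentions, GitHub stars, activity score (0 to 10), and primary language.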

flyte

31 4,779 9.8 Go

Scalable and flexible workflow orchestration platform that seamlessly unifies data, ML and analytics stacks.

Project mention: First 15 Open Source Advent projects | dev.to | 2023-12-15

9. Flyte by Union AI | GitHub | tutorial

console

4 3,605 9.7 Go

Redpanda Console is a developer-friendly UI for managing your Kafka/Redpanda workloads. Console gives you a simple, interactive approach for gaining visibility into your topics, masking data, managing consumer groups, and exploring real-time data with time-travel debugging. (by redpanda-data)

Project mention: What are your favorite tools or components in the Kafka ecosystem? | /r/apachekafka | 2023-05-31

lance

10 3,275 9.8 Rust

Modern columnar data format for ML and LLMs implemented in Rust. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, PyArrow, with more integrations coming.

Project mention: The Nimble File Format by Meta | news.ycombinator.com | 2024-04-25

whylogs

6 2,554 9.0 Jupyter Notebook

An open-source data logging library for machine learning models and data pipelines. 📚 Provides visibility into data quality & model performance over time. 🛡️ Supports privacy-preserving data collection, ensuring safety & robustness. 📈
fast-data-dev

10 1,978 5.2 Shell

Kafka Docker for development. Kafka, Zookeeper, Schema Registry, Kafka-Connect, Landoop Tools, 20+ connectors

Project mention: FLaNK Stack Weekly 16 October 2023 | dev.to | 2023-10-17

elementary

30 1,740 9.8 HTML

The dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.
SREWorks

2 1,702 6.6 Java

Cloud Native DataOps & AIOps Platform | Cloud-native data-intelligence operations platform
meltano

9 1,601 9.8 Python

Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to writing, maintaining, and scaling your own API integrations.

Project mention: meltano VS cloudquery - a user suggested alternative | libhunt.com/r/meltano | 2023-06-02

sqlmesh

12 1,296 9.9 Python

Efficient data transformation and modeling framework that is backwards compatible with dbt.

Project mention: Launch HN: Serra (YC S23) – Open-source, Python-based dbt alternative | news.ycombinator.com | 2023-08-14

There is also sqlmesh (https://sqlmesh.com/). Pretty new as well. It introduces some interesting concepts. For smaller dbt projects it could be a drop-in replacement as it allows importing dbt projects.

optimus

5 737 4.4 Go

Optimus is an easy-to-use, reliable, and performant workflow orchestrator for data transformation, data modeling, pipelines, and data quality management. (by raystack)

Project mention: Data Engineering Tools in Go | /r/dataengineering | 2023-05-18

You can check the odpf GitHub; they created some DataOps tools in Go. One example is optimus (https://github.com/odpf/optimus), a data pipeline orchestrator.

tenzir

15 612 10.0 C++

Open source security data pipelines.

Project mention: Vector: A high-performance observability data pipeline | news.ycombinator.com | 2024-03-17

We're building something similar at Tenzir, but more for operational security workloads. https://docs.tenzir.com
Differences from Vector:
- An agent has optional indexed storage, so you can store your data there and pick it up later. The storage is based on Apache Feather, Parquet's little brother.
- Pipeline operators work with either data frames (Arrow record batches) or chunks of bytes.
- Structured pipelines are multi-schema, i.e., a single pipeline can process streams of record batches with different schemas.
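The multi-schema point can be illustrated with a minimal sketch in plain Python. It assumes a hypothetical `RecordBatch` type (a stand-in for an Arrow record batch, not Tenzir's or PyArrow's API) and a hypothetical `where_equals` operator that filters each batch independently, so batches with different schemas can flow through one stream. Dropping batches that lack the filtered field is an assumption for illustration, not documented Tenzir behavior.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Iterator, List, Tuple


@dataclass(frozen=True)
class RecordBatch:
    """Hypothetical stand-in for an Arrow record batch: column names plus column data."""
    schema: Tuple[str, ...]
    columns: Dict[str, List[Any]]


def where_equals(batches: Iterable[RecordBatch], field: str, value: Any) -> Iterator[RecordBatch]:
    """Hypothetical filter operator over a multi-schema stream.

    Each batch is processed on its own, so consecutive batches may have
    different schemas. Assumption: batches without `field` cannot match and
    are dropped; batches with no matching rows are not emitted.
    """
    for batch in batches:
        if field not in batch.schema:
            continue
        keep = [i for i, v in enumerate(batch.columns[field]) if v == value]
        if not keep:
            continue
        # Rebuild every column of this batch using only the matching row indices.
        yield RecordBatch(
            schema=batch.schema,
            columns={name: [batch.columns[name][i] for i in keep] for name in batch.schema},
        )


# One stream carrying three batches, each with a different schema.
stream = [
    RecordBatch(("src_ip", "port"), {"src_ip": ["10.0.0.1", "10.0.0.2"], "port": [22, 443]}),
    RecordBatch(("user", "action"), {"user": ["alice"], "action": ["login"]}),
    RecordBatch(("src_ip", "bytes"), {"src_ip": ["10.0.0.1"], "bytes": [512]}),
]

out = list(where_equals(stream, "src_ip", "10.0.0.1"))
for batch in out:
    print(batch.schema, batch.columns)
```

Running it keeps the matching row from each batch that has a `src_ip` column and drops the `user`/`action` batch, while each surviving batch keeps its own schema. A real engine would operate on columnar buffers rather than Python lists.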

awesome-data-catalogs

9 586 4.2

📙 Awesome Data Catalogs and Observability Platforms.
versatile-data-kit

52 410 9.7 Python

One framework to develop, deploy and operate data workflows with Python and SQL.

Project mention: Looking for a data blogger | /r/opensource | 2023-05-19

Here's the project: https://github.com/vmware/versatile-data-kit

firehose

3 312 2.5 Java

Firehose is an extensible, no-code, and cloud-native service to load real-time streaming data from Kafka to data stores, data lakes, and analytical storage systems. (by raystack)
pbi-tools

6 296 7.3 C#

Power BI DevOps & Source Control Tool
squirrel-core

1 279 5.6 Python

A Python library that enables ML teams to share, load, and transform data in a collaborative, flexible, and efficient way 🌰
dagger

1 254 2.7 Java

Dagger is an easy-to-use, configuration over code, cloud-native framework built on top of Apache Flink for stateful processing of real-time streaming data. (by raystack)
raccoon

1 187 4.0 Go

Raccoon is a high-throughput, low-latency service to collect events in real-time from your web, mobile apps, and services using multiple network protocols. (by raystack)
meteor

1 171 6.7 Go

Meteor is an easy-to-use, plugin-driven metadata collection framework to extract data from different sources and sink to any data catalog. (by raystack)
space

1 136 8.9 Python

Unified storage framework for the entire machine learning lifecycle (by google)

Project mention: Unified storage framework for the entire machine learning lifecycle | news.ycombinator.com | 2024-02-28

guardian

1 134 7.5 Go

Guardian is a universal data access management tool with automated access workflows and security controls across data stores, analytical systems, and cloud products. (by raystack)
dim

2 121 7.3 TypeScript

📦 dim: Manage the open data in your project like a package manager. (by c-3lab)
beneath

2 78 0.0 Go

Beneath is a serverless real-time data platform ⚡️

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

dataops related posts

Launch HN: Serra (YC S23) – Open-source, Python-based dbt alternative

4 projects | news.ycombinator.com | 14 Aug 2023
meltano VS cloudquery - a user suggested alternative

2 projects | 2 Jun 2023
Show HN: Meltano Cloud (Gitlab spinout) – Managed infra for open source ELT

1 project | news.ycombinator.com | 1 Jun 2023
DBT lays off 15% of their staff

1 project | /r/dataengineering | 19 May 2023
SQL Mesh - Auto DAG generation!!

1 project | /r/HoneyCombAI | 14 May 2023
Data transformation tools other than DBT

1 project | /r/dataengineering | 14 May 2023

Index

What are some of the best open-source dataops projects? This index ranks them by GitHub stars:

Rank	Project	Stars
1	flyte	4,779
2	console	3,605
3	lance	3,275
4	whylogs	2,554
5	fast-data-dev	1,978
6	elementary	1,740
7	SREWorks	1,702
8	meltano	1,601
9	sqlmesh	1,296
10	optimus	737
11	tenzir	612
12	awesome-data-catalogs	586
13	versatile-data-kit	410
14	firehose	312
15	pbi-tools	296
16	squirrel-core	279
17	dagger	254
18	raccoon	187
19	meteor	171
20	space	136
21	guardian	134
22	dim	121
23	beneath	78
