Top 23 dataops Open-Source Projects
-
flyte
Scalable and flexible workflow orchestration platform that seamlessly unifies data, ML and analytics stacks.
-
console
Redpanda Console is a developer-friendly UI for managing your Kafka/Redpanda workloads. Console gives you a simple, interactive approach for gaining visibility into your topics, masking data, managing consumer groups, and exploring real-time data with time-travel debugging. (by redpanda-data)
-
lance
Modern columnar data format for ML and LLMs implemented in Rust. Convert from parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, Pyarrow, with more integrations coming.
-
whylogs
An open-source data logging library for machine learning models and data pipelines. Provides visibility into data quality & model performance over time. Supports privacy-preserving data collection, ensuring safety & robustness.
-
fast-data-dev
Kafka Docker for development. Kafka, Zookeeper, Schema Registry, Kafka-Connect, Landoop Tools, 20+ connectors
-
elementary
The dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.
-
meltano
Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to writing, maintaining, and scaling your own API integrations.
-
optimus
Optimus is an easy-to-use, reliable, and performant workflow orchestrator for data transformation, data modeling, pipelines, and data quality management. (by raystack)
-
firehose
Firehose is an extensible, no-code, and cloud-native service to load real-time streaming data from Kafka to data stores, data lakes, and analytical storage systems. (by raystack)
-
squirrel-core
A Python library that enables ML teams to share, load, and transform data in a collaborative, flexible, and efficient way
-
dagger
Dagger is an easy-to-use, configuration over code, cloud-native framework built on top of Apache Flink for stateful processing of real-time streaming data. (by raystack)
-
raccoon
Raccoon is a high-throughput, low-latency service to collect events in real-time from your web, mobile apps, and services using multiple network protocols. (by raystack)
-
meteor
Meteor is an easy-to-use, plugin-driven metadata collection framework to extract data from different sources and sink to any data catalog. (by raystack)
-
guardian
Guardian is a universal data access management tool with automated access workflows and security controls across data stores, analytical systems, and cloud products. (by raystack)
9. Flyte by Union AI | Github | tutorial
Project mention: What are your favorite tools or components in the Kafka ecosystem? | /r/apachekafka | 2023-05-31
Project mention: meltano VS cloudquery - a user suggested alternative | libhunt.com/r/meltano | 2023-06-02
Project mention: Launch HN: Serra (YC S23) - Open-source, Python-based dbt alternative | news.ycombinator.com | 2023-08-14
There is also sqlmesh (https://sqlmesh.com/). Pretty new as well. It introduces some interesting concepts. For smaller dbt projects it could be a drop-in replacement as it allows importing dbt projects.
You can check the odpf GitHub; they created some dataops tools using Go. One example is optimus (https://github.com/odpf/optimus), which is a data pipeline orchestrator.
Project mention: Vector: A high-performance observability data pipeline | news.ycombinator.com | 2024-03-17
We're building something similar at Tenzir, but more for operational security workloads. https://docs.tenzir.com
Differences to Vector:
- An agent has optional indexed storage, so you can store your data there and pick it up later. The storage is based on Apache Feather, Parquet's little brother.
- Pipeline operators work with either data frames (Arrow record batches) or chunks of bytes.
- Structured pipelines are multi-schema, i.e., a single pipeline can process streams of record batches with different schemas.
Here's the project: https://github.com/vmware/versatile-data-kit
Project mention: Unified storage framework for the entire machine learning lifecycle | news.ycombinator.com | 2024-02-28
dataops related posts
-
Launch HN: Serra (YC S23) - Open-source, Python-based dbt alternative
-
meltano VS cloudquery - a user suggested alternative
2 projects | 2 Jun 2023
-
Show HN: Meltano Cloud (Gitlab spinout) - Managed infra for open source ELT
-
DBT lays off 15% of their staff
-
SQL Mesh - Auto DAG generation!!
-
Data transformation tools other than DBT
Index
What are some of the best open-source dataops projects? This list will help you:
# | Project | Stars |
---|---|---|
1 | flyte | 4,779 |
2 | console | 3,605 |
3 | lance | 3,275 |
4 | whylogs | 2,554 |
5 | fast-data-dev | 1,978 |
6 | elementary | 1,740 |
7 | SREWorks | 1,702 |
8 | meltano | 1,601 |
9 | sqlmesh | 1,296 |
10 | optimus | 737 |
11 | tenzir | 612 |
12 | awesome-data-catalogs | 586 |
13 | versatile-data-kit | 410 |
14 | firehose | 312 |
15 | pbi-tools | 296 |
16 | squirrel-core | 279 |
17 | dagger | 254 |
18 | raccoon | 187 |
19 | meteor | 171 |
20 | space | 136 |
21 | guardian | 134 |
22 | dim | 121 |
23 | beneath | 78 |