dataops

Top 23 dataops Open-Source Projects

  • flyte

    Scalable and flexible workflow orchestration platform that seamlessly unifies data, ML and analytics stacks.

  • Project mention: First 15 Open Source Advent projects | dev.to | 2023-12-15

    9. Flyte by Union AI | GitHub | tutorial

  • console

    Redpanda Console is a developer-friendly UI for managing your Kafka/Redpanda workloads. Console gives you a simple, interactive approach for gaining visibility into your topics, masking data, managing consumer groups, and exploring real-time data with time-travel debugging. (by redpanda-data)

  • Project mention: What are your favorite tools or components in the Kafka ecosystem? | /r/apachekafka | 2023-05-31
  • lance

    Modern columnar data format for ML and LLMs implemented in Rust. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, and PyArrow, with more integrations coming.

  • Project mention: The Nimble File Format by Meta | news.ycombinator.com | 2024-04-25
  • whylogs

    An open-source data logging library for machine learning models and data pipelines. 📚 Provides visibility into data quality & model performance over time. 🛡️ Supports privacy-preserving data collection, ensuring safety & robustness. 📈

  • fast-data-dev

    Kafka Docker for development. Kafka, Zookeeper, Schema Registry, Kafka-Connect, Landoop Tools, 20+ connectors

  • Project mention: FLaNK Stack Weekly 16 October 2023 | dev.to | 2023-10-17
  • elementary

    The dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.

  • SREWorks

    Cloud Native DataOps & AIOps Platform

  • meltano

    Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to writing, maintaining, and scaling your own API integrations.

  • Project mention: meltano VS cloudquery - a user suggested alternative | libhunt.com/r/meltano | 2023-06-02
  • sqlmesh

    Efficient data transformation and modeling framework that is backwards compatible with dbt.

  • Project mention: Launch HN: Serra (YC S23) – Open-source, Python-based dbt alternative | news.ycombinator.com | 2023-08-14

    There is also sqlmesh (https://sqlmesh.com/). Pretty new as well. It introduces some interesting concepts. For smaller dbt projects it could be a drop-in replacement as it allows importing dbt projects.

  • optimus

    Optimus is an easy-to-use, reliable, and performant workflow orchestrator for data transformation, data modeling, pipelines, and data quality management. (by raystack)

  • Project mention: Data Engineering Tools in Go | /r/dataengineering | 2023-05-18

    You can check odpf github, they created some dataops tools using go, one of the example is optimus (https://github.com/odpf/optimus) which is a data pipeline orchestrator

  • tenzir

    Open source security data pipelines.

  • Project mention: Vector: A high-performance observability data pipeline | news.ycombinator.com | 2024-03-17

    We're building something similar at Tenzir, but more for operational security workloads. https://docs.tenzir.com

    Differences to Vector:

    - An agent has optional indexed storage, so you can store your data there and pick it up later. The storage is based on Apache Feather, Parquet's little brother.

    - Pipeline operators work with either data frames (Arrow record batches) or chunks of bytes.

    - Structured pipelines are multi-schema, i.e., a single pipeline can process streams of record batches with different schemas.

  • awesome-data-catalogs

    📙 Awesome Data Catalogs and Observability Platforms.

  • versatile-data-kit

    One framework to develop, deploy and operate data workflows with Python and SQL.

  • Project mention: Looking for a data blogger | /r/opensource | 2023-05-19

    Here's the project: https://github.com/vmware/versatile-data-kit

  • firehose

    Firehose is an extensible, no-code, and cloud-native service to load real-time streaming data from Kafka to data stores, data lakes, and analytical storage systems. (by raystack)

  • pbi-tools

    Power BI DevOps & Source Control Tool

  • squirrel-core

    A Python library that enables ML teams to share, load, and transform data in a collaborative, flexible, and efficient way 🌰

  • dagger

    Dagger is an easy-to-use, configuration over code, cloud-native framework built on top of Apache Flink for stateful processing of real-time streaming data. (by raystack)

  • raccoon

    Raccoon is a high-throughput, low-latency service to collect events in real-time from your web, mobile apps, and services using multiple network protocols. (by raystack)

  • meteor

    Meteor is an easy-to-use, plugin-driven metadata collection framework to extract data from different sources and sink to any data catalog. (by raystack)

  • space

    Unified storage framework for the entire machine learning lifecycle (by google)

  • Project mention: Unified storage framework for the entire machine learning lifecycle | news.ycombinator.com | 2024-02-28
  • guardian

    Guardian is a universal data access management tool with automated access workflows and security controls across data stores, analytical systems, and cloud products. (by raystack)

  • dim

    📦 dim: Manage the open data in your project like a package manager. (by c-3lab)

  • beneath

    Beneath is a serverless real-time data platform ⚡️

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

dataops related posts

  • Launch HN: Serra (YC S23) – Open-source, Python-based dbt alternative

    4 projects | news.ycombinator.com | 14 Aug 2023
  • meltano VS cloudquery - a user suggested alternative

    2 projects | libhunt.com/r/meltano | 2 Jun 2023
  • Show HN: Meltano Cloud (Gitlab spinout) – Managed infra for open source ELT

    1 project | news.ycombinator.com | 1 Jun 2023
  • DBT lays off 15% of their staff

    1 project | /r/dataengineering | 19 May 2023
  • SQL Mesh - Auto DAG generation!!

    1 project | /r/HoneyCombAI | 14 May 2023
  • Data transformation tools other than DBT

    1 project | /r/dataengineering | 14 May 2023

Index

What are some of the best open-source dataops projects? This list will help you:

 #  Project                 Stars
 1  flyte                   4,779
 2  console                 3,605
 3  lance                   3,275
 4  whylogs                 2,554
 5  fast-data-dev           1,978
 6  elementary              1,740
 7  SREWorks                1,702
 8  meltano                 1,601
 9  sqlmesh                 1,296
10  optimus                   737
11  tenzir                    612
12  awesome-data-catalogs     586
13  versatile-data-kit        410
14  firehose                  312
15  pbi-tools                 296
16  squirrel-core             279
17  dagger                    254
18  raccoon                   187
19  meteor                    171
20  space                     136
21  guardian                  134
22  dim                       121
23  beneath                    78
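
The note above says the list is ordered by GitHub stars. As a rough illustration, the Python sketch below reproduces that ordering for the first five rows of the index. The helper name rank_by_stars and the list layout are assumptions made for this example and are not part of any tool listed here; the star counts are copied from the index.

```python
# Illustrative only: rank a handful of projects by GitHub star count.
# The (name, stars) pairs are copied from the first five index rows,
# deliberately shuffled so the sort has something to do.
projects = [
    ("lance", 3275),
    ("flyte", 4779),
    ("fast-data-dev", 1978),
    ("console", 3605),
    ("whylogs", 2554),
]


def rank_by_stars(items):
    """Return (rank, name, stars) tuples sorted by stars, highest first.

    Hypothetical helper written for this example.
    """
    ordered = sorted(items, key=lambda item: item[1], reverse=True)
    return [
        (rank, name, stars)
        for rank, (name, stars) in enumerate(ordered, start=1)
    ]


ranking = rank_by_stars(projects)

# Print in the same rank / project / stars layout as the index.
for rank, name, stars in ranking:
    print(f"{rank:>2}  {name:<15} {stars:>6,}")
```

Sorting on the star count alone leaves ties in their input order, since Python's sort is stable; the page does not say how it breaks ties.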
