elt

Open-source projects categorized as elt

Top 23 elt Open-Source Projects

  • Airflow

    Apache Airflow - A platform to programmatically author, schedule, and monitor workflows

    Project mention: Building in Public: Leveraging Tublian's AI Copilot for My Open Source Contributions | dev.to | 2024-02-12

    Contributing to Apache Airflow's open-source project immersed me in collaborative coding. Experienced maintainers rigorously reviewed my contributions, providing constructive feedback. This ongoing dialogue refined the codebase and honed my understanding of best practices.

  • airbyte

    The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.

    Project mention: Launch HN: Bracket (YC W22) – Two-Way Sync Between Salesforce and Postgres | news.ycombinator.com | 2023-12-12

    I'll also give a shout-out to Airbyte (https://airbyte.com/), with which I've had some limited success integrating Salesforce with a local database. The particular pull for Airbyte is that we can self-host the open source version, rather than pay Fivetran a significant sum to do this for us.

    It's an immature tool, so I don't yet know that I can claim we've spent _less_ than Fivetran on the additional engineering and ops time, but it feels like it has potential to do so once stabilized.

  • doris

    Apache Doris is an easy-to-use, high performance and unified analytics database.

    Project mention: Variant in Apache Doris 2.1.0: a new data type 8 times faster than JSON for semi-structured data analysis | dev.to | 2024-03-27

    As an open-source real-time data warehouse, Apache Doris provides semi-structured data processing capabilities, and the newly released version 2.1.0 makes a stride in this direction. Before V2.1, Apache Doris stored semi-structured data as JSON files. However, during query execution, the real-time parsing of JSON data leads to high CPU and I/O consumption in addition to high query latency, especially when the dataset is huge and complicated. Moreover, the lack of a pre-defined schema means there is no handle for query optimization.

  • dbt-core

    dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.

    Project mention: Dbt | news.ycombinator.com | 2024-02-18

  • seatunnel

    SeaTunnel is a next-generation super high-performance, distributed, massive data integration tool.

    Project mention: FLaNK Weekly 31 December 2023 | dev.to | 2023-12-31

  • Mage

    🧙 The modern replacement for Airflow. Mage is an open-source data pipeline tool for transforming and integrating data. https://github.com/mage-ai/mage-ai

    Project mention: A mage on the Hero’s Journey: a fantasy epic on how a startup rose from the ashes | dev.to | 2023-06-12

    In the coming years, Mage will create a cooperative experience so that developers can build data pipelines with their team and level up together. After that journey, Mage will go on an epic quest to create the 1st open world community experience in the data universe.

  • kestra

    Infinitely scalable, event-driven, language-agnostic orchestration and scheduling platform to manage millions of workflows declaratively in code.

    Project mention: A High-Performance, Java-Based Orchestration Platform | /r/java | 2023-10-11

    Kestra's communication is asynchronous and based on a queuing mechanism. It leverages the Micronaut framework and offers two runners: one that uses a database (JDBC) for both the message queue and resource storage, and another that uses Kafka as the message queue and Elasticsearch as the resource storage. The platform is fully extensible and plugin-based, providing a rich set of plugins for various workflow tasks, triggers, and data storage options. For those interested, the GitHub repository is available here: https://github.com/kestra-io/kestra

  • cloudquery

    The open source high performance data integration platform built for developers.

    Project mention: We might want to regularly keep track of how important each server is | news.ycombinator.com | 2024-02-06

    Check out CloudQuery - https://github.com/cloudquery/cloudquery for an easy cloud asset inventory.

  • dlt

    data load tool (dlt) is an open source Python library that makes data loading easy 🛠️

    Project mention: Ask HN: Freelancer? Seeking freelancer? (December 2023) | news.ycombinator.com | 2023-12-03

    SEEKING FREELANCER | REMOTE | GERMANY

    dltHub is looking for freelance help in the following repos:

    - https://github.com/dlt-hub/dlt

  • meltano

    Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to writing, maintaining, and scaling your own API integrations.

    Project mention: meltano VS cloudquery - a user suggested alternative | libhunt.com/r/meltano | 2023-06-02

  • sqlmesh

    Efficient data transformation and modeling framework that is backwards compatible with dbt.

    Project mention: Launch HN: Serra (YC S23) – Open-source, Python-based dbt alternative | news.ycombinator.com | 2023-08-14

    There is also sqlmesh (https://sqlmesh.com/). Pretty new as well. It introduces some interesting concepts. For smaller dbt projects it could be a drop-in replacement as it allows importing dbt projects.

  • dataform

    Dataform is a framework for managing SQL based data operations in BigQuery

  • kuwala

    Kuwala is the no-code data platform for BI analysts and engineers, enabling you to build powerful analytics workflows. We have set out to bring state-of-the-art data engineering tools you love, such as Airbyte, dbt, or Great Expectations, together in one intuitive interface built with React Flow. In addition, we provide third-party data for data science models and products with a focus on geospatial data. Currently, the following data connectors are available worldwide: a) High-resolution demograp

    Project mention: Show HN: GeoSage – A ETL Webtool for Geo and Demographics Data from the Open Web | news.ycombinator.com | 2023-10-05

    --> Google Trends Data for Regions (Coming Soon)

    The tool goes beyond our previously published CLI tool (https://github.com/kuwala-io/kuwala/tree/master/kuwala) by providing a hostable solution with a user-friendly interface. We have not open-sourced it yet but a demo is available here: https://geosage.kuwala.io/.

    Urban planners can utilize movement data to analyze foot traffic in different city zones. Marketers can leverage demographic data to tailor campaigns more effectively. Developers can build their apps on top of it.

    To round it up .... GeoSage brings...

    Unified Data Management: Access data from OSM, Facebook, and soon Google, all in one place.

  • optimus

    Optimus is an easy-to-use, reliable, and performant workflow orchestrator for data transformation, data modeling, pipelines, and data quality management. (by raystack)

    Project mention: Data Engineering Tools in Go | /r/dataengineering | 2023-05-18

    You can check the odpf GitHub; they created some DataOps tools using Go. One example is optimus (https://github.com/odpf/optimus), which is a data pipeline orchestrator.

  • quary

    Transform data together. Model, test and deploy as a team.

    Project mention: Quary: Rust-based data transformation alternative to DBT | news.ycombinator.com | 2024-02-21

  • transfer

    Database replication platform that leverages change data capture. Stream production data from databases to your data warehouse (Snowflake, BigQuery, Redshift) in real-time.

    Project mention: Migrate mongodb Datawarehouse to snowflake | /r/snowflake | 2023-12-04

  • automate-dv

    A free to use dbt package for creating and loading Data Vault 2.0 compliant Data Warehouses (powered by dbt, an open source data engineering tool, registered trademark of dbt Labs)

  • dbt-metabase

    dbt + Metabase integration

  • versatile-data-kit

    One framework to develop, deploy and operate data workflows with Python and SQL.

    Project mention: Looking for a data blogger | /r/opensource | 2023-05-19

    Here's the project: https://github.com/vmware/versatile-data-kit

  • ReplicaDB

    ReplicaDB is an open source tool for database replication, designed for efficiently transferring bulk data between relational and non-relational databases.

  • astro-sdk

    Astro SDK allows rapid and clean development of {Extract, Load, Transform} workflows using Python and SQL, powered by Apache Airflow.

    Project mention: Orchestration: Thoughts on Dagster, Airflow and Prefect? | /r/dataengineering | 2023-06-01

    Have you tried the Astro SDK? https://github.com/astronomer/astro-sdk

  • cuelake

    Use SQL to build ELT pipelines on a data lakehouse.

  • sling-cli

    Sling is a CLI tool that extracts data from a source storage/database and loads it in a target storage/database.

    Project mention: FLaNK 04 March 2024 | dev.to | 2024-03-04

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2024-03-27.

Index

What are some of the best open-source elt projects? This list will help you:

#   Project             Stars
1   Airflow             34,099
2   airbyte             13,646
3   doris               11,115
4   dbt-core            8,718
5   seatunnel           7,154
6   Mage                6,802
7   kestra              5,871
8   cloudquery          5,524
9   dlt                 1,599
10  meltano             1,557
11  sqlmesh             1,183
12  dataform            781
13  kuwala              755
14  optimus             736
15  quary               522
16  transfer            517
17  automate-dv         449
18  dbt-metabase        412
19  versatile-data-kit  406
20  ReplicaDB           349
21  astro-sdk           312
22  cuelake             284
23  sling-cli           205