etl-framework

Top 15 etl-framework Open-Source Projects

  • Logstash

    Logstash - transport and process your logs, events, or other data

  • cloudquery

    The open source high performance ELT framework powered by Apache Arrow

  • Project mention: We might want to regularly keep track of how important each server is | news.ycombinator.com | 2024-02-06

    Check out CloudQuery - https://github.com/cloudquery/cloudquery for an easy cloud asset inventory.

  • hamilton

    Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage and metadata. Runs and scales everywhere Python does.

  • Project mention: Show HN: Hamilton's UI – observability, lineage, and catalog for data pipelines | news.ycombinator.com | 2024-05-02
  • getting-started

    This repository is a getting started guide to Singer. (by singer-io)

  • Project mention: Why do companies still build data ingestion tooling instead of using a third-party tool like Airbyte? | /r/dataengineering | 2023-12-06

    Coincidentally, I saw a presentation today on a nice halfway-house solution: using embeddable Python libraries like Sling and dlt, both open-source. See https://www.youtube.com/watch?v=gAqOLgG2iYY There is also singer.io, which is more of a protocol than a library but can also be installed, although it looks like a true community effort and is not so well maintained.

  • quokka

    Making data lake work for time series (by marsupialtail)

  • Project mention: How Query Engines Work | news.ycombinator.com | 2023-09-08

    An awesome read!

    Something related that I found out about from HN a few months back is another engine called quokka. It's particularly interesting and applicable how quokka schedules distributed queries to outperform Spark https://github.com/marsupialtail/quokka/blob/master/blog/why...

  • Cinchoo ETL

    ETL framework for .NET (Parser / Writer for CSV, Flat, Xml, JSON, Key-Value, Parquet, Yaml, Avro formatted files)

  • metorikku

    A simplified, lightweight ETL Framework based on Apache Spark

  • flow

    Flow PHP - data processing framework (by flow-php)

  • Project mention: Flow PHP: the first and most advanced PHP ETL framework | news.ycombinator.com | 2024-04-16
  • kgtk

    Knowledge Graph Toolkit

  • dataall

    A modern data marketplace that makes collaboration among diverse users (like business, analysts and engineers) easier, increasing efficiency and agility in data projects on AWS.

  • patterns-devkit

    Data pipelines from re-usable components

  • csvplus

    csvplus extends the standard Go encoding/csv package with fluent interface, lazy stream operations, indices and joins.

  • Shift

    Shift is a high performance better alternative to Airbyte, Singer, Meltano (by piyushsingariya)

  • Project mention: Alternative to Airbyte, Singer and Meltano | /r/dataengineering | 2023-08-11

    As a side hobby, I started working on this personal project: https://github.com/piyushsingariya/Kaku

  • flowrunner

    Flowrunner is a lightweight package to organize and represent Data Engineering/Science workflows

  • DataPowerTools

    Bridging the gap between IEnumerable and IDataReader for dealing with unstructured and loosely-structured data, plus fast ETL + SQL Bulk Copy.

  • Project mention: Recommended patterns or tools for data/row migration between databases? | /r/dotnet | 2023-06-22
NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

etl-framework related posts

  • Flow PHP: the first and most advanced PHP ETL framework

    1 project | news.ycombinator.com | 16 Apr 2024
  • FLaNK Weekly 31 December 2023

    25 projects | dev.to | 31 Dec 2023
  • Why do companies still build data ingestion tooling instead of using a third-party tool like Airbyte?

    1 project | /r/dataengineering | 6 Dec 2023
  • SymmetricDS: Open-Source, cross platform database replication software

    3 projects | news.ycombinator.com | 6 Aug 2023
  • Breakthrough in the book search field! Use Apache SeaTunnel to improve the efficiency of book title similarity search

    3 projects | dev.to | 3 Jul 2023
  • Quokka – Distributed Polars on Ray

    1 project | news.ycombinator.com | 30 Jun 2023
  • Questions Regarding design DW

    1 project | /r/dataengineering | 24 Jun 2023

Index

What are some of the best open-source etl-framework projects? This list will help you:

Rank  Project          Stars
1     Logstash         14,014
2     cloudquery       5,591
3     hamilton         1,373
4     getting-started  1,220
5     quokka           1,084
6     Cinchoo ETL      738
7     metorikku        576
8     flow             352
9     kgtk             341
10    dataall          210
11    patterns-devkit  106
12    csvplus          66
13    Shift            9
14    flowrunner       8
15    DataPowerTools   8
