Show HN: I built an open-source data pipeline tool in Go

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  1. ingestr

    ingestr is a CLI tool to copy data between any databases with a single command seamlessly.

    Depends on what you mean by that, but we do use dlt through ingestr (https://github.com/bruin-data/ingestr), which is used inside Bruin CLI.

  2. CodeRabbit

    CodeRabbit: AI Code Reviews for Developers. Revolutionize your code reviews with AI. CodeRabbit offers PR summaries, code walkthroughs, 1-click suggestions, and AST-based analysis. Boost productivity and code quality across all major languages with each PR.

  3. bruin

    Build data pipelines with SQL and Python, ingest data from different sources, add quality checks, and build end-to-end flows.

  4. sqlframe

    Turning PySpark Into a Universal DataFrame API

    hey, thanks for the shoutout!

    I love the idea. It effectively moves toward using the right platform for the right job, which is very much in line with where we are taking things. Another interesting project in that spirit is sqlframe: https://github.com/eakmanrq/sqlframe

  5. hamilton

    Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage/tracing and metadata. Runs and scales everywhere Python does.

    I always thought Hamilton [1] does a good job of giving enough visual hooks that draw you in.

    I also noticed this pattern where library authors sometimes do a bit extra in terms of discussing and even promoting their competitors, and it makes me trust them more. A “here’s why ours is better and everyone else sucks …” section always comes across as the infomercial character who is having quite a hard time peeling an apple, to the point you wonder if this is the first time they’ve used hands.

    One thing I wish for is a tool that’s essentially just Celery that doesn’t require a message broker (and can just use a database), and which is supported on Windows. There’s always a handful of edge cases where we’re pulling data from an old 32-bit system on Windows. And basically every system has some not-quite-ergonomic workaround that’s as much work as if you’d just built it yourself.

    It seems like it’s just sending a JSON message over a queue or HTTP API and the worker receives it and runs the task. Maybe it’s way harder than I’m envisioning (but I don’t think so because I’ve already written most of it).

    I guess that’s one thing I’m not clear on with Bruin: can I run workers in different physical locations and have them carry out the tasks in the right order? Or is this more of a centralized thing (meaning even if it’s K8s or Dask or Ray, those are all run in a cluster which happens to be distributed, but they’re all machines sitting in the same subnet, which isn’t the definition of a “distributed task” I’m going for)?

    [1] https://github.com/DAGWorks-Inc/hamilton
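
The broker-less queue described in the comment above (a JSON message stored in a database, picked up and run by a worker) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the commenter's code and not how Bruin or Celery work: it assumes SQLite from the Python standard library as the database, and the names `task`, `enqueue`, and `run_next` are hypothetical.

```python
import json
import sqlite3

# Hypothetical registry mapping task names to plain Python callables.
TASKS = {}


def task(fn):
    """Register a function so a worker can look it up by name."""
    TASKS[fn.__name__] = fn
    return fn


def init_db(conn):
    """Create the single table that plays the role of the message broker."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tasks ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " payload TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'pending',"
        " result TEXT)"
    )
    conn.commit()


def enqueue(conn, name, *args):
    """Store the task as a JSON message and return its row id."""
    payload = json.dumps({"task": name, "args": list(args)})
    cur = conn.execute("INSERT INTO tasks (payload) VALUES (?)", (payload,))
    conn.commit()
    return cur.lastrowid


def run_next(conn):
    """Claim the oldest pending task, run it, and record the outcome.

    Returns the task id, or None if nothing could be claimed.
    """
    row = conn.execute(
        "SELECT id, payload FROM tasks WHERE status = 'pending' "
        "ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    task_id, payload = row
    # Conditional update: only one worker can flip 'pending' to 'running'.
    claimed = conn.execute(
        "UPDATE tasks SET status = 'running' "
        "WHERE id = ? AND status = 'pending'",
        (task_id,),
    )
    conn.commit()
    if claimed.rowcount != 1:
        return None  # another worker claimed it first
    message = json.loads(payload)
    try:
        result = TASKS[message["task"]](*message["args"])
        status, stored = "done", json.dumps(result)
    except Exception as exc:  # record the failure instead of crashing the worker
        status, stored = "failed", json.dumps(str(exc))
    conn.execute(
        "UPDATE tasks SET status = ?, result = ? WHERE id = ?",
        (status, stored, task_id),
    )
    conn.commit()
    return task_id


@task
def add(a, b):
    return a + b
```

A producer would call `enqueue(conn, "add", 2, 3)` and a worker would call `run_next(conn)` in a loop. Ordering here is plain first-in, first-out by row id; retries, timeouts, and workers in different physical locations are left out of the sketch.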

  6. connect

    Fancy stream processing made operationally mundane (by redpanda-data)

  7. SaaSHub

    SaaSHub - Software Alternatives and Reviews. SaaSHub helps you find the best software and product alternatives

NOTE: The number of mentions on this list indicates mentions on common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.

Related posts

  • I built a data pipeline tool in Go

    3 projects | dev.to | 23 Dec 2024
  • Why do companies still build data ingestion tooling instead of using a third-party tool like Airbyte?

    1 project | /r/dataengineering | 6 Dec 2023
  • Launch HN: PeerDB (YC S23) – Fast, Native ETL/ELT for Postgres

    2 projects | news.ycombinator.com | 27 Jul 2023
  • Design patter for Python ETL

    2 projects | /r/dataengineering | 2 Dec 2022
  • After Airflow. Where next for DE?

    13 projects | /r/dataengineering | 15 Nov 2022
