Classic Data science pipelines built with LLMs

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  1. FlashLearn

    Integrate LLM in any pipeline - fit/predict pattern, JSON-driven flows, and built-in concurrency support.

    Yes, LLMs are not always the best option; they are one option. Sometimes a project's requirements make them the best option.

    There is one browser-use price-matching example that would be impossible to build right now without a full-blown data science team: https://github.com/Pravko-Solutions/FlashLearn/tree/main/exa...

  2. SaaSHub

    SaaSHub - Software Alternatives and Reviews. SaaSHub helps you find the best software and product alternatives

  3. wimsey

    Easy and flexible data contracts

    I'm definitely biased because my day job is writing ETL pipelines and supporting software, and my current side project is a data contracts library to help with the above [0]. Still, I'm not sure I see this happening.

    80% of the focus of an ETL pipeline is on ensuring edge cases are handled appropriately (i.e. not producing models from potentially erroneous data, dead-letter queuing unknown fields, etc.).

    I think an LLM would be great for "take this JSON and make it a pandas dataframe", but a lot less great for "interact with this billing API to produce auditable payment tables".

    For areas that are reliability-focused, LLMs still need a lot more improvement to be useful.

    [0] https://github.com/benrutter/wimsey

  4. OpenRefine

    OpenRefine is a free, open source power tool for working with messy data and improving it

    Are you aware of this tool? https://openrefine.org

  5. hal9

    Hal9 — Create and Share Generative Apps

    For those interested, you can use LLMs to process CSVs in Hal9 and also generate Streamlit apps. In addition, the code is open source, so if you want to help us improve our RAG or add new tools, you are more than welcome.

    - https://hal9.ai

    - https://github.com/hal9ai/hal9
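
The edge-case handling raised in the wimsey comment above (keeping suspect rows out of downstream tables and dead-letter queuing unknown fields) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: `EXPECTED_FIELDS`, `RoutedRecords`, and `route_records` are hypothetical names made up for this example, and none of it reflects the API of wimsey or any other project listed here.

```python
from dataclasses import dataclass, field

# Hypothetical contract for illustration: field name -> required Python type.
EXPECTED_FIELDS = {"invoice_id": str, "amount_cents": int, "currency": str}


@dataclass
class RoutedRecords:
    """Holds rows that passed the checks and rows set aside for review."""
    accepted: list = field(default_factory=list)
    dead_letters: list = field(default_factory=list)


def route_records(records: list) -> RoutedRecords:
    """Split records into accepted rows and dead letters with the reasons attached."""
    routed = RoutedRecords()
    for record in records:
        # Fields the contract does not know about.
        unknown = sorted(set(record) - set(EXPECTED_FIELDS))
        # Fields the contract requires but the record lacks.
        missing = sorted(set(EXPECTED_FIELDS) - set(record))
        # Fields that are present but hold a value of the wrong type.
        wrong_type = sorted(
            name
            for name, expected in EXPECTED_FIELDS.items()
            if name in record and not isinstance(record[name], expected)
        )
        if unknown or missing or wrong_type:
            # Keep the original record plus the reasons, so it can be audited later.
            routed.dead_letters.append(
                {
                    "record": record,
                    "unknown": unknown,
                    "missing": missing,
                    "wrong_type": wrong_type,
                }
            )
        else:
            routed.accepted.append(record)
    return routed
```

Calling `route_records` on a batch returns both lists, so a pipeline could load `accepted` and persist `dead_letters` for manual review. The point of the sketch is that every rejection carries an explicit, inspectable reason, which is the kind of auditability the comment argues is hard to get from an LLM alone.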

NOTE: The number of mentions on this list indicates mentions on common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.

