Classic Data science pipelines built with LLMs

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com.

  1. FlashLearn

    Integrate LLMs into any pipeline: fit/predict pattern, JSON-driven flows, and built-in concurrency support.

    Yes, LLMs are not always the best option, but they are an option. Sometimes a project's requirements make them the best one.

    There is one browser-based price-matching example that would be impossible to build right now without a full-blown data science team: https://github.com/Pravko-Solutions/FlashLearn/tree/main/exa...

  3. wimsey

    Easy and flexible data contracts

    I'm definitely biased: my day job is writing ETL pipelines and supporting software, and my current side project is a data contracts library to help with the above [0]. Still, I'm not sure I see this happening.

    80% of the focus of an ETL pipeline is on ensuring edge cases are handled appropriately (e.g. not producing models from potentially erroneous data, dead-letter queuing unknown fields, etc.).

    I think an LLM would be great for "take this JSON and make it a pandas DataFrame", but a lot less great for "interact with this billing API to produce auditable payment tables".

    For reliability-focused areas, LLMs still need a lot of improvement before they are useful.

    [0] https://github.com/benrutter/wimsey

  4. OpenRefine

    OpenRefine is a free, open-source power tool for working with messy data and improving it.

    Are you aware of this tool? https://openrefine.org

  5. hal9

    Hal9: Create and Share Generative Apps

    For those interested, you can use LLMs to process CSVs in Hal9 and also generate Streamlit apps. In addition, the code is open source, so if you want to help us improve our RAG or add new tools, you are more than welcome.

    - https://hal9.ai

    - https://github.com/hal9ai/hal9
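The FlashLearn entry above describes a fit/predict pattern with JSON-driven output and built-in concurrency. As a rough illustration of that idea only, here is a minimal Python sketch; it is not FlashLearn's API. The class `LLMLabeler`, its prompt format, and `fake_llm` (a keyword matcher standing in for a real model call) are all hypothetical.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional


class LLMLabeler:
    """Hypothetical fit/predict wrapper around an LLM call (NOT FlashLearn's API)."""

    def __init__(self, llm: Callable[[str], str], max_workers: int = 4):
        # `llm` is any function that takes a prompt and returns the model's raw text reply.
        self.llm = llm
        self.max_workers = max_workers
        self.labels: List[str] = []

    def fit(self, labels: List[str]) -> "LLMLabeler":
        # "Fitting" here only records the allowed label set; no weights are trained.
        self.labels = sorted(set(labels))
        return self

    def _build_prompt(self, text: str) -> str:
        return (
            "Classify the text into one of "
            + json.dumps(self.labels)
            + '. Reply only with JSON like {"label": "..."}.\nText: '
            + text
        )

    def _predict_one(self, text: str) -> Optional[str]:
        reply = self.llm(self._build_prompt(text))
        try:
            label = json.loads(reply).get("label")
        except (json.JSONDecodeError, AttributeError):
            return None  # malformed or non-object JSON: surface it instead of guessing
        return label if label in self.labels else None  # reject off-list labels

    def predict(self, texts: List[str]) -> List[Optional[str]]:
        # Fan the prompts out over a thread pool; map() returns results in input order.
        with ThreadPoolExecutor(max_workers=self.max_workers) as pool:
            return list(pool.map(self._predict_one, texts))


def fake_llm(prompt: str) -> str:
    # Hypothetical stand-in for a real model: keyword match on the text after "Text: ".
    text = prompt.rsplit("Text: ", 1)[1].lower()
    return json.dumps({"label": "refund" if "refund" in text else "other"})


if __name__ == "__main__":
    labeler = LLMLabeler(fake_llm).fit(["refund", "other"])
    print(labeler.predict(["I want a refund", "hello"]))  # ['refund', 'other']
```

Parsing the reply as JSON and checking it against the label set recorded by `fit` means a malformed or off-list answer comes back as `None` rather than as a guessed label, which keeps failures visible to the rest of the pipeline.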
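The wimsey comment above argues that most ETL effort goes into edge cases such as not producing models from potentially erroneous data and dead-letter queuing unknown fields. The Python sketch below illustrates that idea with a hand-rolled check; it does not use wimsey's API, and the `CONTRACT` fields (`invoice_id`, `amount_cents`, `currency`), the function names, and the `max_bad_ratio` threshold are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, List, Tuple

# Hypothetical contract: field name -> required Python type (NOT wimsey's API).
CONTRACT: Dict[str, type] = {"invoice_id": str, "amount_cents": int, "currency": str}


@dataclass
class ContractResult:
    valid: List[Dict[str, Any]] = field(default_factory=list)
    # Each rejected record is stored together with the reason it failed.
    dead_letter: List[Tuple[Dict[str, Any], str]] = field(default_factory=list)


def check_record(record: Dict[str, Any], contract: Dict[str, type]) -> str:
    """Return an empty string if the record satisfies the contract, else a reason."""
    unknown = sorted(set(record) - set(contract))
    if unknown:
        return "unknown fields: " + ", ".join(unknown)
    missing = sorted(set(contract) - set(record))
    if missing:
        return "missing fields: " + ", ".join(missing)
    for name, expected in contract.items():
        value = record[name]
        # bool is a subclass of int in Python, so reject it explicitly for int fields.
        if not isinstance(value, expected) or (expected is int and isinstance(value, bool)):
            return f"{name}: expected {expected.__name__}, got {type(value).__name__}"
    return ""


def route(records: Iterable[Dict[str, Any]], contract: Dict[str, type] = CONTRACT) -> ContractResult:
    result = ContractResult()
    for record in records:
        reason = check_record(record, contract)
        if reason:
            result.dead_letter.append((record, reason))  # park for inspection, never drop silently
        else:
            result.valid.append(record)
    return result


def safe_to_build(result: ContractResult, max_bad_ratio: float = 0.0) -> bool:
    # Refuse to build downstream tables/models if too many records failed the contract.
    total = len(result.valid) + len(result.dead_letter)
    return total > 0 and len(result.dead_letter) / total <= max_bad_ratio


if __name__ == "__main__":
    rows = [
        {"invoice_id": "A1", "amount_cents": 1250, "currency": "EUR"},
        {"invoice_id": "A2", "amount_cents": "12.50", "currency": "EUR"},
        {"invoice_id": "A3", "amount_cents": 900, "currency": "EUR", "promo": "X"},
    ]
    result = route(rows)
    for record, reason in result.dead_letter:
        print(record["invoice_id"], "->", reason)
    print("safe to build:", safe_to_build(result))
```

Keeping the rejection reason next to each dead-lettered record is what makes the output auditable; with the default `max_bad_ratio=0.0`, a single failing record is enough to block the downstream build.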

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.
