-
FlashLearn
Integrate LLM in any pipeline - fit/predict pattern, JSON driven flows, and built in concurency support.
Yes, LLMs are not always the best option, they are an option. Sometimes requirements of the project are such that they are also the best option.
There is one browser that uses price matching example that is impossible to do without a full-blown data science team right now: https://github.com/Pravko-Solutions/FlashLearn/tree/main/exa...
-
CodeRabbit
CodeRabbit: AI Code Reviews for Developers. Revolutionize your code reviews with AI. CodeRabbit offers PR summaries, code walkthroughs, 1-click suggestions, and AST-based analysis. Boost productivity and code quality across all major languages with each PR.
-
I'm definitely biased because my day job is writing ETL pipelines and supporting software, and my current side project is a data contracts library for helping the above[0]. Still I'm not sure I see this happening.
80% of the focus of an ETL pipeline is in ensuring edge cases are handled appropriately (i.e. not producing models from potentially erroneous data, dead letter queing unknown fields etc).
I think an LLM would be great for "take this json and make it a pandas dataframe", but a lot less great for interact with this billing API to produce auditable payment tables.
For areas that are reliability focused, LLMs still need a lot more improvments to be useful.
[0] https://github.com/benrutter/wimsey
-
OpenRefine
OpenRefine is a free, open source power tool for working with messy data and improving it
Are you aware of this tool? https://openrefine.org
-
For those interested, you can use LLMs to process CSVs in Hal9 and also generate streamlit apps, in addition, the code is open source so if you want to help us improve our RAG or add new tools, you are more than welcomed.
- https://hal9.ai
- https://github.com/hal9ai/hal9