-
FlashLearn
Integrate LLMs into any pipeline: fit/predict pattern, JSON-driven flows, and built-in concurrency support.
Yes, LLMs are not always the best option, but they are an option, and sometimes a project's requirements make them the best one.
One of the examples is a browser-based price-matching flow that, right now, would be impossible to build without a full-blown data science team: https://github.com/Pravko-Solutions/FlashLearn/tree/main/exa...
-
I'm definitely biased, because my day job is writing ETL pipelines and supporting software, and my current side project is a data contracts library to help with the above [0]. Still, I'm not sure I see this happening.
80% of the focus of an ETL pipeline is on ensuring edge cases are handled appropriately (i.e. not producing models from potentially erroneous data, dead-letter queuing unknown fields, etc.).
I think an LLM would be great for "take this JSON and make it a pandas DataFrame", but a lot less great for "interact with this billing API to produce auditable payment tables".
For areas that are reliability-focused, LLMs still need a lot more improvement to be useful.
[0] https://github.com/benrutter/wimsey
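To make the edge-case point concrete, here is a minimal sketch of dead-letter routing for records with unknown or malformed fields. It uses only the Python standard library and is not taken from wimsey or any other tool mentioned here; the schema, the field names, and the `route_records` helper are all hypothetical, chosen for illustration.

```python
from typing import Any

# Hypothetical schema for illustration: field name -> expected type.
EXPECTED_FIELDS: dict[str, type] = {
    "invoice_id": str,
    "amount_cents": int,
    "currency": str,
}


def route_records(
    records: list[dict[str, Any]],
) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Split records into (accepted, dead_letter) instead of guessing at bad data."""
    accepted: list[dict[str, Any]] = []
    dead_letter: list[dict[str, Any]] = []
    for record in records:
        problems: list[str] = []
        # Unknown fields are reported rather than silently dropped.
        unknown = sorted(set(record) - set(EXPECTED_FIELDS))
        if unknown:
            problems.append(f"unknown fields: {unknown}")
        # Every expected field must be present and have the expected type.
        for name, expected_type in EXPECTED_FIELDS.items():
            if name not in record:
                problems.append(f"missing field: {name}")
            elif not isinstance(record[name], expected_type):
                problems.append(
                    f"bad type for {name}: {type(record[name]).__name__}"
                )
        if problems:
            # Keep the original record next to the reasons, so it can be audited later.
            dead_letter.append({"record": record, "problems": problems})
        else:
            accepted.append(record)
    return accepted, dead_letter
```

Calling `route_records` on a batch returns the clean rows plus a list of rejected rows with their reasons. This kind of deterministic, inspectable check is what a reliability-focused pipeline depends on.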
-
OpenRefine
OpenRefine is a free, open source power tool for working with messy data and improving it
Are you aware of this tool? https://openrefine.org
-
For those interested, you can use LLMs to process CSVs in Hal9 and also generate Streamlit apps. The code is open source, so if you want to help us improve our RAG or add new tools, you are more than welcome.
- https://hal9.ai
- https://github.com/hal9ai/hal9