FlashLearn
wimsey
Metric | FlashLearn | wimsey
---|---|---
Mentions | 17 | 4
Stars | 575 | 124
Growth | 98.4% | 21.8%
Activity | 9.4 | 7.5
Last commit | 8 days ago | about 1 month ago
Language | Python | Python
License | MIT License | MIT License
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
FlashLearn
- Competitor price matching with browser agents
- Classic machine learning built with LLMs
- Improve RAG quality with Anthropic's contextual retrieval implementation
- Parse financial report data into JSON with LLMs
- Classic Data science pipelines built with LLMs
  Yes, LLMs are not always the best option, but they are an option. Sometimes the requirements of a project make them the best option as well.
  There is one browser-agent price matching example that is impossible to do without a full-blown data science team right now: https://github.com/Pravko-Solutions/FlashLearn/tree/main/exa...
- Show HN: LangChain, but for Software Engineers
- Build AI agent for PR reviews – Full code
- Enterprise AI agents examples with full code
- Minimal browser AI agent example you will understand
- Show HN: I made AI agent lib that you will understand
wimsey
- Classic Data science pipelines built with LLMs
  I'm definitely biased because my day job is writing ETL pipelines and supporting software, and my current side project is a data contracts library to help with the above [0]. Still, I'm not sure I see this happening.
  80% of the focus of an ETL pipeline is on ensuring edge cases are handled appropriately (i.e. not producing models from potentially erroneous data, dead letter queuing unknown fields, etc.).
  I think an LLM would be great for "take this JSON and make it a pandas dataframe", but a lot less great for "interact with this billing API to produce auditable payment tables".
  For reliability-focused areas, LLMs still need a lot more improvement to be useful.
  [0] https://github.com/benrutter/wimsey
- The Data Engineering Handbook
  Nice list! Although as somebody who works on open source tools for data engineering, it kills me a little to see "companies" as the list header rather than, say, "projects".
  (Also, shameless plug for my latest project, Wimsey, which is not company-affiliated but does let you test data in a nice, lightweight way: https://github.com/benrutter/wimsey)
- Wimsey: A flexible, lightweight data contracts library
- This Week In Python
  wimsey – Easy and flexible data testing and documentation
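The ETL comment in the wimsey mentions above describes routing records with unknown fields to a dead letter queue instead of loading them. As a rough illustration of that idea only, here is a minimal sketch in plain Python. It does not use wimsey or FlashLearn, and the schema, field names, and `split_records` helper are all hypothetical.

```python
from typing import Any

# Hypothetical schema for illustration: field name -> expected Python type.
EXPECTED_FIELDS: dict[str, type] = {"invoice_id": str, "amount": float}


def split_records(
    records: list[dict[str, Any]],
) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Return (valid, dead_letter); problem rows are kept for audit, not dropped."""
    valid: list[dict[str, Any]] = []
    dead_letter: list[dict[str, Any]] = []
    for record in records:
        # Fields present in the record but absent from the schema.
        unknown = sorted(set(record) - set(EXPECTED_FIELDS))
        # Fields the schema requires but the record lacks.
        missing = sorted(set(EXPECTED_FIELDS) - set(record))
        # Fields present with a value of the wrong type.
        wrong_type = sorted(
            name
            for name, expected in EXPECTED_FIELDS.items()
            if name in record and not isinstance(record[name], expected)
        )
        if unknown or missing or wrong_type:
            dead_letter.append(
                {
                    "record": record,
                    "unknown": unknown,
                    "missing": missing,
                    "wrong_type": wrong_type,
                }
            )
        else:
            valid.append(record)
    return valid, dead_letter
```

Keeping the rejected record alongside the reasons it was rejected is one way to make the pipeline auditable: nothing is silently discarded, and the dead letter entries can be reviewed or replayed later.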
What are some alternatives?
OpenRefine - OpenRefine is a free, open source power tool for working with messy data and improving it
finstruments - Financial instrument definitions built with Python and Pydantic
Scrapling - 🕷️ An undetectable, powerful, flexible, high-performance Python library that makes Web Scraping easy again!
abacus-minimal - A minimal event-based ledger in Python that follows accounting rules
data-engineer-handbook - This is a repo with links to everything you'd ever want to learn about data engineering