wimsey
Scrapling
Metric | wimsey | Scrapling
---|---|---
Mentions | 4 | 2
Stars | 127 | 2,893
Growth | 2.4% | 7.3%
Activity | 7.3 | 9.7
Last commit | 7 days ago | 5 days ago
Language | Python | Python
License | MIT License | BSD 3-clause "New" or "Revised" License
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
wimsey
- Classic Data science pipelines built with LLMs
I'm definitely biased because my day job is writing ETL pipelines and supporting software, and my current side project is a data contracts library for helping with the above[0]. Still, I'm not sure I see this happening.
80% of the focus of an ETL pipeline is in ensuring edge cases are handled appropriately (e.g. not producing models from potentially erroneous data, dead-letter queuing unknown fields, etc.).
I think an LLM would be great for "take this JSON and make it a pandas dataframe", but a lot less great for "interact with this billing API to produce auditable payment tables".
For areas that are reliability focused, LLMs still need a lot more improvement to be useful.
[0] https://github.com/benrutter/wimsey
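To make the "dead-letter queuing unknown fields" point concrete, here is a minimal sketch in plain Python. It does not use wimsey or any other library, and the field names, the `EXPECTED_FIELDS` set, and the `split_records` helper are all hypothetical, chosen only for illustration.

```python
# Hypothetical schema: the fields this pipeline knows how to handle.
EXPECTED_FIELDS = {"id", "amount", "currency"}


def split_records(records):
    """Separate clean records from ones that carry unknown fields."""
    accepted, dead_letters = [], []
    for record in records:
        unknown = set(record) - EXPECTED_FIELDS
        if unknown:
            # Keep the original record plus the reason, so it can be audited later.
            dead_letters.append(
                {"record": record, "unknown_fields": sorted(unknown)}
            )
        else:
            accepted.append(record)
    return accepted, dead_letters


# Made-up sample data: the second record has a field the schema does not know.
records = [
    {"id": 1, "amount": 10.0, "currency": "GBP"},
    {"id": 2, "amount": 5.0, "currency": "USD", "discount_code": "X1"},
]
accepted, dead_letters = split_records(records)
```

The unexpected record is set aside with its reason rather than silently dropped or loaded, which is the kind of edge-case handling the comment describes.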
- The Data Engineering Handbook
Nice list! Although as somebody who works on open source tools for data engineering, it kills me a little to see "companies" as the list header rather than, say, "projects".
(Also, shameless plug for my latest project Wimsey, which is non-company affiliated but does let you test data in a nice, lightweight way: https://github.com/benrutter/wimsey)
- Wimsey: A flexible, lightweight data contracts library
- This Week In Python
wimsey – Easy and flexible data testing and documentation
Scrapling
- This Week In Python
Scrapling – A simple web scraping tool for Python
- Scrapling: Fast, Adaptive Web Scraping for Python
What are some alternatives?
finstruments - Financial instrument definitions built with Python and Pydantic
abacus-minimal - A minimal event-based ledger in Python that follows accounting rules
Music
procyclingstats - procyclingstats scraper