Top 3 Jupyter Notebook ETL Projects
- hamilton
Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage and metadata. It runs and scales everywhere Python does.
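To make the "dataflows that encode lineage" idea concrete, here is a minimal plain-Python sketch of the general pattern: each function is a node, and its parameter names declare what it depends on. This is not Hamilton's API. The `resolve` and `lineage` helpers and the example node functions are hypothetical names invented for illustration.

```python
import inspect

# Each function is one node: its name is the value it produces, and its
# parameter names are the upstream nodes (or raw inputs) it depends on.
def spend_per_signup(spend: float, signups: int) -> float:
    """Marketing spend divided by the number of signups."""
    return spend / signups

def spend_is_high(spend_per_signup: float, threshold: float) -> bool:
    """True when the cost per signup exceeds the threshold."""
    return spend_per_signup > threshold

def resolve(name, nodes, inputs, cache=None):
    """Hypothetical helper: compute `name` by recursively resolving its parameters."""
    cache = {} if cache is None else cache
    if name in inputs:
        return inputs[name]
    if name not in cache:
        fn = nodes[name]
        params = inspect.signature(fn).parameters
        kwargs = {p: resolve(p, nodes, inputs, cache) for p in params}
        cache[name] = fn(**kwargs)
    return cache[name]

def lineage(name, nodes):
    """Hypothetical helper: list every upstream name that `name` depends on."""
    if name not in nodes:
        return []  # raw inputs have no upstream dependencies
    deps = []
    for p in inspect.signature(nodes[name]).parameters:
        for d in lineage(p, nodes) + [p]:
            if d not in deps:
                deps.append(d)
    return deps

NODES = {fn.__name__: fn for fn in (spend_per_signup, spend_is_high)}
result = resolve(
    "spend_is_high",
    NODES,
    {"spend": 100.0, "signups": 20, "threshold": 4.0},
)
upstream = lineage("spend_is_high", NODES)
```

Because every node is an ordinary function, each one can be unit-tested in isolation, and the dependency graph falls out of the function signatures rather than being maintained separately.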
Note that this uses simple OpenAI calls; you can replace them with LangChain, LlamaIndex, Hamilton, or something else if you prefer more abstraction, and delegate to whichever LLM you like. You should probably also use something more concrete (e.g., instructor) to guarantee the output shape.
You can find the code related to this project in my GitHub repository.
Index
What are some of the best open-source ETL projects in Jupyter Notebook? This list will help you:
# | Project | Stars |
---|---|---|
1 | hamilton | 1,312 |
2 | ghcn-d | 21 |
3 | udacity_bike_share_datalake_project | 0 |