hamilton
datahub
Metric | hamilton | datahub
---|---|---
Mentions | 19 | 34
Stars | 1,312 | 9,197
Growth | 8.2% | 2.2%
Activity | 9.8 | 9.9
Last commit | 3 days ago | 6 days ago
Language | Jupyter Notebook | Java
License | BSD 3-clause Clear License | Apache License 2.0
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
hamilton
- Using IPython Jupyter Magic commands to improve the notebook experience
In this post, we’ll show how your team can turn any utility function(s) into reusable IPython Jupyter magics for a better notebook experience. As an example, we’ll use Hamilton, my open source library, to motivate the creation of a magic that facilitates better development ergonomics for using it. You needn’t know what Hamilton is to understand this post.
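As a rough illustration of the idea (not the code from the post), a plain utility function can be exposed as a line magic through IPython's extension hook. The function `word_count`, the magic name `wc`, and the module name `my_magics` are hypothetical; `load_ipython_extension` and `register_magic_function` are IPython's standard entry points for this.

```python
def word_count(line: str) -> int:
    """Hypothetical utility: count whitespace-separated words."""
    return len(line.split())


def load_ipython_extension(ipython):
    """Called by IPython when the user runs `%load_ext my_magics`."""
    # Expose the plain function as the line magic `%wc`.
    ipython.register_magic_function(word_count, magic_kind="line", magic_name="wc")
```

Assuming this is saved as `my_magics.py` on the import path, running `%load_ext my_magics` and then `%wc some text here` in a notebook would pass the rest of the line to `word_count` as a string.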
- FastUI: Build Better UIs Faster
We built an app with it -- https://blog.dagworks.io/p/building-a-lightweight-experiment. You can see the code here https://github.com/DAGWorks-Inc/hamilton/blob/main/hamilton/....
Usually we've been prototyping with Streamlit, but have found it clunky at times. FastUI still has rough edges, but we made it work for our lightweight app.
- Show HN: On Garbage Collection and Memory Optimization in Hamilton
- Facebook Prophet: library for generating forecasts from any time series data
This library is old news? Is there anything new that they've added that's noteworthy to take it for another spin?
[disclaimer I'm a maintainer of Hamilton] Otherwise FYI Prophet gels well with https://github.com/DAGWorks-Inc/hamilton for setting up your features and dataset for fitting & prediction[/disclaimer].
- Show HN: Declarative Spark Transformations with Hamilton
- Langchain Is Pointless
I had been hearing these pains from Langchain users for quite a while. Suffice to say I think:
1. too many layers of OO abstractions are a liability in production contexts. I'm biased, but a more functional approach is a better way to model what's going on. Functions are easier to test and to wrap with concerns, and therefore easier to reason about.
2. as fast as the field is moving, the layers of abstractions actually hurt your ability to customize without really diving into the details of the framework, or requiring you to step outside it -- in which case, why use it?
Otherwise I definitely love the small amount of code you need to write to get an LLM application up with Langchain. However, you read code more often than you write it, so this brevity is a trade-off. Would you prefer to reduce the time you spend debugging a production outage, or the time you spend building the application? There's no right answer, other than "it depends".
To that end - we've come up with a post showing how one might use Hamilton (https://github.com/dagWorks-Inc/hamilton) to easily create a workflow to ingest data into a vector database that I think has a great production story. https://open.substack.com/pub/dagworks/p/building-a-maintain...
Note: Hamilton can cover your MLOps as well as LLMOps needs; you'll invariably be connecting LLM applications with traditional data/ML pipelines because LLMs don't solve everything -- but that's a post for another day.
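Point 1 in the comment above, that plain functions are easy to test and to wrap with concerns, can be sketched in a few lines of standard-library Python. The names `timed` and `build_prompt` are hypothetical and are not taken from Langchain or Hamilton.

```python
import functools
import time


def timed(fn):
    """Wrap a function with a timing concern without touching its body."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        # Record how long the last call took, in seconds.
        wrapper.last_elapsed = time.perf_counter() - start
        return result

    wrapper.last_elapsed = None
    return wrapper


def build_prompt(question: str, context: str) -> str:
    """Hypothetical pipeline step: a pure function, so it is simple to unit test."""
    return f"Context: {context}\nQuestion: {question}"


# The concern is added from the outside; build_prompt itself stays unchanged.
timed_build_prompt = timed(build_prompt)
```

The undecorated function can be tested in isolation, while the wrapped version can be swapped in wherever timing is wanted.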
- Free access to beta product I'm building that I'd love feedback on
This is me. I drive an open source library, Hamilton, that people doing time-series/ML work love to use. I'm building a paid product around it at DAGWorks, and I'm after feedback on our current version. Can I entice anyone to:
- IPyflow: Reactive Python Notebooks in Jupyter(Lab)
From a nuts and bolts perspective, I've been thinking of building some reactivity on top of https://github.com/dagworks-inc/hamilton (author here) that could get at this. (If you have a use case that could be documented, I'd appreciate it.)
- Data lineage
Most people don't track lineage because it's difficult (though if you use something like https://github.com/DAGWorks-Inc/hamilton to write your pipeline - author here - it can come almost for free).
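One way to see why lineage can come cheaply when a pipeline is written as functions: if each function names its inputs as parameters, the dependency graph can be read off the signatures. The sketch below uses only the standard library and hypothetical node names; it illustrates the idea and is not Hamilton's actual API.

```python
import inspect


# Each function is a node; its parameter names are the nodes it depends on.
def raw_orders() -> list:
    return [120.0, 80.0, 200.0]


def total_revenue(raw_orders: list) -> float:
    return sum(raw_orders)


def average_order(total_revenue: float, raw_orders: list) -> float:
    return total_revenue / len(raw_orders)


def lineage(functions) -> dict:
    """Map each node name to the upstream node names it reads from."""
    return {fn.__name__: list(inspect.signature(fn).parameters) for fn in functions}


graph = lineage([raw_orders, total_revenue, average_order])
```

Here `graph["average_order"]` lists `total_revenue` and `raw_orders` as its upstream nodes, without any lineage metadata having been written by hand.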
- Needs advice for choosing tools for my team. We use AWS.
Otherwise, I'm biased here, but check out https://github.com/dagworks-inc/hamilton - it could be your universal layer for expressing how things should flow. It is orchestration-system agnostic, which would make it easy to migrate between systems.
datahub
- Ask HN: Looking for DB schema management tool
Sounds like you are looking for a data catalog tool rather than a DB schema management tool. You can check out Amundsen (https://www.amundsen.io/) or DataHub (https://datahubproject.io/).
If you are looking for a schema change management tool, then you can check out Bytebase (bytebase.com). But it can't answer questions like "which collections contain links to bigmongo.user.id?"
- Which open source or commercial tools are used for Data Governance and access management
IIUC DataHub (open source project out of LinkedIn) might be relevant here
- ODD Platform - An open-source data discovery and observability service - v0.12 release
- What data governance tool are you folks using?
I’m a huge fan of DataHub, the open source data catalogue spun out of LinkedIn, but it’s best thought of as an observability layer for data assets that can be shared by data engineers and analyst-types. For data users, it’s a stellar search/discovery interface (what datasets are there on this keyword, which are most broadly used across the organization, what downstream products are made with this data, what’s it usually joined to, are its upstream pipelines reliable). For data engineers, it’s a comprehensive asset cataloger, crawling your warehouse, orchestrator, modeling layers, features, and reports, matching the lineage into a graph where it can.
- Our data catalog is difficult to manage and not built for the wider org - what can we do?
- What's the best way to build documentation for a data infrastructure? any existing tools
If you are looking for a data cataloguing solution, look at Datahub. Haven't used it, but heard good things about it.
- Looking for an "offline" data discovery platform
What I am looking for is a solution (similar to Amundsen or [Datahub](https://datahubproject.io/)) that also allows tables and their metadata to be added manually.
- Looking for an open-source data lineage app, where objects and connections can be manually defined (not just automatically ingested)
Hello everyone, I'm looking for an open-source data lineage app (e.g. tokern, datahubproject, openmetadata).
- How do you document your dashboards?
What about DataHub? Haven't really used it but I'm actively reading about it and about to use it for some light documentation for some small pipelines.
- Any reason why I shouldn't give my dbt docs to everyone?
What are some alternatives?
dagster - An orchestration platform for the development, production, and observation of data assets.
OpenMetadata - Open Standard for Metadata. A Single place to Discover, Collaborate and Get your data right.
tree-of-thought-llm - [NeurIPS 2023] Tree of Thoughts: Deliberate Problem Solving with Large Language Models
amundsen - Amundsen is a metadata driven application for improving the productivity of data analysts, data scientists and engineers when interacting with data.
haystack - LLM orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
OpenLineage - An Open Standard for lineage metadata collection
snowpark-python - Snowflake Snowpark Python API
atlas - Manage your database schema as code
aipl - Array-Inspired Pipeline Language
metacat
vscode-reactive-jupyter - A simple Reactive Python Extension for Visual Studio Code
Atlas - 🚀 An open and lightweight modification to Windows, designed to optimize performance, privacy and security.