Top 11 Python streaming-data Projects
-
scikit-multiflow
A machine learning package for streaming data in Python. The other ancestor of River.
-
materialize-tutorials
Materialize is a streaming database for real-time analytics. This is a collection of Materialize demos and tutorials.
-
river
River is a Python library for online machine learning. Online machine learning is useful when a model must adapt dynamically to new patterns in the data, or when the data itself is generated as a function of time, e.g., stock price prediction or content personalization.
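To make "adapting to new patterns" concrete, here is a minimal sketch in plain Python. It does not use River itself: the class `OnlineLinearRegression`, the synthetic `stream()` generator, and the learning rate are illustrative assumptions. Only the `learn_one` / `predict_one` method names are borrowed from River's per-sample convention. The model sees one sample at a time, is scored before it learns from that sample, and recovers after the relationship between `x` and `y` changes halfway through the stream.

```python
class OnlineLinearRegression:
    """Hypothetical one-feature linear model trained one sample at a time."""

    def __init__(self, lr=0.5):
        self.lr = lr   # learning rate (assumed value, chosen for the demo)
        self.w = 0.0   # slope
        self.b = 0.0   # intercept

    def predict_one(self, x):
        return self.w * x + self.b

    def learn_one(self, x, y):
        # Single stochastic-gradient step on the squared error of one sample.
        error = self.predict_one(x) - y
        self.w -= self.lr * error * x
        self.b -= self.lr * error


def stream():
    """Synthetic stream whose pattern changes halfway through."""
    for i in range(500):          # first regime: y = 2x
        x = (i % 10) / 10
        yield x, 2 * x
    for i in range(500):          # second regime: y = -x + 1
        x = (i % 10) / 10
        yield x, -x + 1


model = OnlineLinearRegression()
errors = []
for x, y in stream():
    # Test-then-train: score the prediction first, then update the model.
    errors.append(abs(model.predict_one(x) - y))
    model.learn_one(x, y)

print(f"error right after the change: {errors[500]:.3f}")
print(f"error at the end of the stream: {errors[-1]:.3f}")
print(f"learned slope={model.w:.3f}, intercept={model.b:.3f}")
```

No data is stored and no retraining pass is run: the error spikes when the pattern changes and then shrinks again as the model keeps updating. A real library such as River adds preprocessing, metrics, and many more model types on top of this idea.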
Project mention: Building a streaming SQL engine with Arrow and DataFusion | news.ycombinator.com | 2024-03-18
River is actually the merger between creme and scikit-multiflow, another great example of open source collaboration and continuation.
Project mention: Ask HN: What Python libraries do you wish more people knew about? | news.ycombinator.com | 2023-12-03
Project mention: Show HN: Geniusrise, a framework and ecosystem for AI agents | news.ycombinator.com | 2023-09-23
## More Links
1. https://github.com/geniusrise/geniusrise - core framework
2. https://github.com/geniusrise/geniusrise-huggingface - hf modules
3. https://github.com/geniusrise/geniusrise-openai - openai modules
4. https://github.com/geniusrise/geniusrise-listeners - streaming data input
5. https://github.com/geniusrise/geniusrise-databases - database input
6. https://github.com/geniusrise/geniusrise-prompt-actions - functional integrations (RAG-able and GPT function call-able, WIP)
7. https://github.com/geniusrise/geniusrise-indexing - vectorizing for RAG usecases (WIP)
8. https://github.com/geniusrise/geniusrise-exit-proxy - cached LLM interface with MITM-auditing (WIP)
## Asides
I think the core framework can be AGPL but the modules must be MIT / Apache v2.
I really wanted to create an elaborate example in the guides but could not find the time: something like loading and vectorizing SNOMED-CT or UMLS and using it for NER / RAG on EHR docs. Or maybe a use case of a doctor communicating with a patient in another language (a major problem in India), with reverse translation verifying the translated output using the KG. Or discourse segmentation for better chunking in RAG use cases. These kinds of examples are coming soon.
I'm not sure if I should add cyberpunk-ed scientists as banner images. I tried with mathematicians from Voevodsky to André Joyal to John Baez, but couldn't. Actual geniuses tend not to be famous, hence SDXL fails, I guess.
I also plan to write this framework in Scala. The category-theorizing of neural networks is amazing!!! https://github.com/bgavran/Category_Theory_Machine_Learning. I hope Bartosz Milewski approves.
I love Alan Turing, though mainly because of "The Chemical Basis of Morphogenesis". It introduced me to the wonderful world of complex systems. Hence, his image as the banner.
I'm also working on a CLI library called "isomorphic", which wraps argparse and provides CLI, API, YAML, and JSON interfaces.
Yes, gradio integration is also underway.
Finally, to huggingface.
Index
What are some of the best open-source streaming-data projects in Python? This list will help you:
# | Project | Stars |
---|---|---|
1 | river | 4,766 |
2 | smart_open | 3,091 |
3 | Streamz | 1,217 |
4 | bytewax | 1,144 |
5 | scikit-multiflow | 739 |
6 | tractor | 249 |
7 | materialize-tutorials | 82 |
8 | makinage | 38 |
9 | cinje | 31 |
10 | rxsci | 13 |
11 | geniusrise-listeners | 1 |