Python streaming-data

Open-source Python projects categorized as streaming-data

Top 11 Python streaming-data Projects

  • river

    🌊 Online machine learning in Python

  • Project mention: 🔍Underrated Open Source Projects You Should Know About 🧠 | dev.to | 2024-03-20

    River is a Python library for online machine learning. An online model updates itself one observation at a time, so it can adapt to new patterns as they appear. This suits data generated as a function of time, e.g., stock price prediction or content personalization.
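
As a rough illustration of the online learning idea described above, the sketch below trains a one-feature linear model one sample at a time, predicting before it learns from each observation. The class `OnlineLinearRegression`, the `stream` generator, and the synthetic data are hypothetical stand-ins written in plain Python. The method names `predict_one` and `learn_one` follow River's naming convention, but this is not River's implementation.

```python
class OnlineLinearRegression:
    """Hypothetical one-feature linear model updated by SGD, one sample at a time."""

    def __init__(self, learning_rate=0.1):
        self.learning_rate = learning_rate
        self.weight = 0.0
        self.bias = 0.0

    def predict_one(self, x):
        # Prediction uses only the parameters learned so far.
        return self.weight * x + self.bias

    def learn_one(self, x, y):
        # One gradient step on the squared error of this single sample.
        error = self.predict_one(x) - y
        self.weight -= self.learning_rate * error * x
        self.bias -= self.learning_rate * error


def stream(n_samples=2000):
    """Synthetic stream (an assumption for this sketch): y = 3x + 1, no noise."""
    for i in range(n_samples):
        x = (i % 10) / 10.0
        yield x, 3.0 * x + 1.0


model = OnlineLinearRegression(learning_rate=0.1)
abs_errors = []

for x, y in stream():
    # Predict first, then learn, so each error is measured on unseen data.
    abs_errors.append(abs(model.predict_one(x) - y))
    model.learn_one(x, y)

early_error = sum(abs_errors[:100]) / 100
late_error = sum(abs_errors[-100:]) / 100
print(f"mean abs error, first 100 samples: {early_error:.4f}")
print(f"mean abs error, last 100 samples:  {late_error:.4f}")
```

The whole dataset is never held in memory: the model sees each `(x, y)` pair once and discards it, which is the property that makes this style of learning a fit for unbounded streams.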

  • smart_open

    Utils for streaming large files (S3, HDFS, gzip, bz2...)

  • Streamz

    Real-time stream processing for python

  • bytewax

    Python Stream Processing

  • Project mention: Building a streaming SQL engine with Arrow and DataFusion | news.ycombinator.com | 2024-03-18
  • scikit-multiflow

    A machine learning package for streaming data in Python. The other ancestor of River.

  • Project mention: 🔍Underrated Open Source Projects You Should Know About 🧠 | dev.to | 2024-03-20

    River is actually the merger between creme and scikit-multiflow, another great example of open source collaboration and continuation.

  • tractor

    A distributed, structured concurrent runtime for Python (and friends)

  • Project mention: Ask HN: What Python libraries do you wish more people knew about? | news.ycombinator.com | 2023-12-03
  • materialize-tutorials

    Materialize is a streaming database for real-time analytics. This is a collection of Materialize demos and tutorials.

  • makinage

    Stream Processing Made Easy

  • cinje

    A Pythonic and ultra fast template engine DSL.

  • rxsci

    ReactiveX for data science

  • geniusrise-listeners

    A collection of Spouts that listen to events

  • Project mention: Show HN: Geniusrise, a framework and ecosystem for AI agents | news.ycombinator.com | 2023-09-23

    ## More Links

    1. https://github.com/geniusrise/geniusrise - core framework

    2. https://github.com/geniusrise/geniusrise-huggingface - hf modules

    3. https://github.com/geniusrise/geniusrise-openai - openai modules

    4. https://github.com/geniusrise/geniusrise-listeners - streaming data input

    5. https://github.com/geniusrise/geniusrise-databases - database input

    6. https://github.com/geniusrise/geniusrise-prompt-actions - functional integrations (RAG-able and GPT function call-able, WIP)

    7. https://github.com/geniusrise/geniusrise-indexing - vectorizing for RAG usecases (WIP)

    8. https://github.com/geniusrise/geniusrise-exit-proxy - cached LLM interface with MITM-auditing (WIP)

    ## Asides

    I think the core framework can be AGPL but the modules must be MIT / Apachev2.

    I really wanted to create an elaborate example in the guides but could not find the time: something like loading and vectorizing SNOMED-CT or UMLS and using it to NER / RAG EHR docs. Or maybe a use case of a doctor communicating with a patient in another language (a major problem in India), with reverse translation verifying the translated output using the KG. Or discourse segmentation for better chunking in RAG use cases. These kinds of examples are coming soon.

    I'm not sure if I should add cyberpunk-ed scientists as banner images. I tried with mathematicians like Voevodsky to Andre Joyal to John Baez, but couldn't. Actual geniuses tend to not be famous, hence SDXL fails I guess.

    I plan to also write this framework in scala. The category-theorizing of neural networks is amazing!!! https://github.com/bgavran/Category_Theory_Machine_Learning. I hope Bartosz Milewski approves.

    I love Alan Turing, but cuz of "The Chemical Basis of Morphogenesis". It introduced me to the wonderful world of complex systems. Hence, his image as banner.

    I'm also working on a cli library called "isomorphic", wraps over argparse and provides cli, api, yaml, json interfaces.

    Yes, gradio integration is also underway.

    Finally, to huggingface.

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Index

What are some of the best open-source streaming-data projects in Python? This list will help you:

#    Project                 Stars
1    river                   4,766
2    smart_open              3,091
3    Streamz                 1,217
4    bytewax                 1,144
5    scikit-multiflow          739
6    tractor                   249
7    materialize-tutorials      82
8    makinage                   38
9    cinje                      31
10   rxsci                      13
11   geniusrise-listeners        1
