Open-source projects categorized as streaming-data
Language filter: Python, Go, Java, C, Haskell, R

Top 13 streaming-data Open-Source Projects

  • GitHub repo awesome-bigdata

    A curated list of awesome big data frameworks, resources and other awesomeness.

  • GitHub repo Benthos

    Declarative stream processing for mundane tasks and data engineering

    Project mention: Go YouTube channels | reddit.com/r/golang | 2021-03-29

    I stream and have a few talks about building https://www.benthos.dev on https://www.youtube.com/c/Jeffail

  • GitHub repo miller

    Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON

    Project mention: Querying an XML, JSON, CSV, or RDF database | reddit.com/r/ItalyInformatica | 2021-03-31

  • GitHub repo Pravega

    Pravega - Streaming as a new software-defined storage primitive

  • GitHub repo Streamz

    Real-time stream processing for Python

  • GitHub repo go-streams

    A lightweight stream processing library for Go

    Project mention: A flexible and powerful stream processing library for Go | news.ycombinator.com | 2020-12-22

  • GitHub repo kafka-streams-in-action

    Source code for the Kafka Streams in Action Book

    Project mention: Unable to run Kafka Streaming App using IntelliJ on MacBook Pro M1 | reddit.com/r/IntelliJIDEA | 2021-04-17

    I have started ZooKeeper and Kafka in separate terminals (not in IDE terminals), and I'm able to send and receive messages from the Kafka console. I have downloaded the https://github.com/bbejeck/kafka-streams-in-action repo and am trying to run the basic chapters; I even tried creating a sample Kafka streaming app as well. When I start the app, I get the following error:

  • GitHub repo machine

    Machine is a workflow/pipeline library for processing data (by whitaker-io)

    Project mention: Show HN: Machine, a Go library for stream processing | news.ycombinator.com | 2020-12-21

    - Command for generating new projects (similar to cobra)



  • GitHub repo cinje

    A Pythonic and ultra fast template engine DSL.

    Project mention: Avium - A library trying to make object-oriented programming in C easier. | reddit.com/r/programming | 2021-04-18

    Looks good. I'm not sure that a union is the way to go. When I've thought about creating something like this in the past I've mostly considered it from the angle of starting with a different preprocessor, such as https://github.com/cxxxr/lisp-preprocessor, and using it to generate iterator interfaces and the like. While it may sound absolutely undesirable I've also seen PHP used to great effect in a similar way, treating files like templates with inline PHP. I've wondered about a similar solution using the file encoding support in Python, similar to how https://github.com/marrow/cinje functions.

  • GitHub repo makinage

    Stream Processing Made Easy

    Project mention: Develop Kafka Applications in Python | news.ycombinator.com | 2021-02-20

  • GitHub repo icicle

    Icicle Streaming Query Language (by icicle-lang)

  • GitHub repo hermiter

    Efficient Sequential and Batch Estimation of Univariate and Bivariate Probability Density Functions and Cumulative Distribution Functions along with Quantiles (Univariate) and Spearman's Correlation (Bivariate)

    Project mention: Show HN: R pkg for online estimation of Spearman's correlation for streaming data | news.ycombinator.com | 2021-01-31

  • GitHub repo rxsci

    ReactiveX for data science

    Project mention: RxPy Explained: Map, Filter, and Scan | reddit.com/r/Python | 2021-02-12

    This is more verbose but much more flexible. In particular, this allowed me to implement a set of operators for data science: https://github.com/maki-nage/rxsci. I think this would not have been possible with RxPY v1.
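The rxsci entry above mentions RxPY's map, filter, and scan operators. As a rough analogue only (this is not RxPY's API, and Python iterators are pull-based, whereas Rx Observables push values to subscribers), the same three steps can be chained with standard-library lazy iterators: `map`, `filter`, and `itertools.accumulate` playing the role of scan. The function name `running_totals_of_even_squares` is hypothetical.

```python
from itertools import accumulate


def running_totals_of_even_squares(events):
    """Hypothetical pipeline: map -> filter -> scan over any iterable of ints."""
    squared = map(lambda x: x * x, events)          # map: transform every item
    evens = filter(lambda x: x % 2 == 0, squared)   # filter: keep items passing a predicate
    return accumulate(evens)                        # scan: yield each intermediate running sum


if __name__ == "__main__":
    # Squares of 1..6 are 1, 4, 9, 16, 25, 36; the even ones are 4, 16, 36.
    for total in running_totals_of_even_squares(range(1, 7)):
        print(total)  # prints 4, then 20, then 56
```

The difference between scan and reduce shows up here: `functools.reduce` over the same filtered items would return only the final 56, while `accumulate` emits every intermediate total. That property makes scan-style operators usable on unbounded streams.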

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2021-04-18.


What are some of the best open-source streaming-data projects? This list will help you:

Rank  Project                  Stars
1     awesome-bigdata          9,849
2     Benthos                  3,001
3     miller                   2,710
4     Pravega                  1,453
5     Streamz                    933
6     go-streams                 624
7     kafka-streams-in-action    178
8     machine                     79
9     cinje                       23
10    makinage                    10
11    icicle                       9
12    hermiter                     8
13    rxsci                        0
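The hermiter entry in the list above mentions sequential (online) estimation of Spearman's correlation for streaming data. hermiter's own estimators are not reproduced here. As a simpler stand-in, the Python sketch below uses Welford-style one-pass updates to maintain Pearson's (not Spearman's) correlation in constant memory, one observation at a time. It only illustrates the update-per-observation pattern that streaming estimators share. The class and method names are hypothetical.

```python
import math


class OnlineCorrelation:
    """Hypothetical one-pass Pearson correlation using Welford-style updates."""

    def __init__(self):
        self.n = 0
        self.mean_x = 0.0
        self.mean_y = 0.0
        self.m2_x = 0.0  # running sum of squared deviations of x
        self.m2_y = 0.0  # running sum of squared deviations of y
        self.c_xy = 0.0  # running sum of co-deviations of x and y

    def update(self, x, y):
        """Fold one (x, y) observation into the running statistics."""
        self.n += 1
        dx = x - self.mean_x  # deviation from the previous x mean
        self.mean_x += dx / self.n
        dy = y - self.mean_y  # deviation from the previous y mean
        self.mean_y += dy / self.n
        # Old-mean deviation times new-mean deviation keeps the sums exact.
        self.m2_x += dx * (x - self.mean_x)
        self.m2_y += dy * (y - self.mean_y)
        self.c_xy += dx * (y - self.mean_y)

    def correlation(self):
        """Return Pearson's r so far, or NaN when it is undefined."""
        if self.n < 2 or self.m2_x == 0 or self.m2_y == 0:
            return float("nan")
        return self.c_xy / math.sqrt(self.m2_x * self.m2_y)


if __name__ == "__main__":
    est = OnlineCorrelation()
    for x, y in [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]:
        est.update(x, y)
    print(est.correlation())
```

Each update costs O(1) time and memory regardless of how many observations have arrived, which is the property that matters for streaming data. An exact rank-based Spearman estimate cannot be maintained this cheaply, which is why dedicated packages use approximate estimators.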