Streaming data storage

This page summarizes the projects mentioned and recommended in the original post on /r/dataengineering.

  • dozer

    Dozer is a real-time data movement tool that leverages CDC from various sources and moves data into various sinks. (by getdozer)

  • Storing it in the data lake is entirely okay and is the best approach. The real question is why you are serving this data to users: do you need to serve raw data, or does it need to be aggregated? What latency is required? In any case, the best approach is to move it to alternative storage better suited to real-time querying. Some people use Elasticsearch; others use key-value stores, depending on the use case. I have run into this problem so many times that I decided to start a project that solves it end to end: moving data from multiple storage layers, applying transformations, caching it, and exposing it through gRPC and REST APIs. This is, in general, the pattern that everyone follows. If you wish, you can take a look at it here: https://github.com/getdozer/dozer. Happy to help if needed! Just join our Discord community!

  • tsbs

    Time Series Benchmark Suite, a tool for comparing and evaluating databases for time series data

  • According to their benchmark, it is really fast.

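The serving pattern described in the dozer comment above (take change events out of the primary store, apply a transformation, keep the result in a fast cache, and answer API reads from that cache) can be illustrated with a small sketch. Everything here is hypothetical and is not dozer's API: `ChangeEvent` loosely mimics the before/after row images that CDC streams typically carry, `AggregateCache` stands in for a key-value store, and `get` is the read path a REST or gRPC handler would call. The transformation chosen is a per-key running sum, maintained by retracting each old row and adding each new one.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


# Hypothetical change event: the row before and after the change.
# insert -> before is None; delete -> after is None; update -> both set.
@dataclass(frozen=True)
class ChangeEvent:
    before: Optional[dict]
    after: Optional[dict]


class AggregateCache:
    """Hypothetical 'transform + cache' stage: a per-key running total
    kept up to date incrementally as change events arrive. A real system
    would use a key-value store or search index instead of a dict."""

    def __init__(self, key_field: str, value_field: str) -> None:
        self.key_field = key_field
        self.value_field = value_field
        self.totals: Dict[str, float] = defaultdict(float)

    def apply(self, event: ChangeEvent) -> None:
        # Retract the old row's contribution, then add the new row's.
        if event.before is not None:
            self.totals[event.before[self.key_field]] -= event.before[self.value_field]
        if event.after is not None:
            self.totals[event.after[self.key_field]] += event.after[self.value_field]

    def apply_all(self, events: Iterable[ChangeEvent]) -> None:
        for event in events:
            self.apply(event)

    def get(self, key: str) -> float:
        # Read path an API handler would call; dict.get does not insert
        # missing keys into the defaultdict.
        return self.totals.get(key, 0.0)


# Usage: two inserts, one update, one delete for the same key.
cache = AggregateCache(key_field="sensor_id", value_field="reading")
cache.apply_all([
    ChangeEvent(before=None, after={"sensor_id": "a", "reading": 2.0}),
    ChangeEvent(before=None, after={"sensor_id": "a", "reading": 3.0}),
    ChangeEvent(before={"sensor_id": "a", "reading": 3.0},
                after={"sensor_id": "a", "reading": 5.0}),
    ChangeEvent(before={"sensor_id": "a", "reading": 2.0}, after=None),
])
print(cache.get("a"))  # 5.0
```

Updating the aggregate incrementally from each change, rather than recomputing it from the lake on every request, is what keeps read latency low; the trade-off is that the cache has to see every change event exactly once to stay correct.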