4 best open-source big data projects you should try out

This page summarizes the projects mentioned and recommended in the original post on /r/learnprogramming.

  • iceberg

    Apache Iceberg

  • 2. Iceberg: Iceberg is an open table format for huge analytic datasets, with schema evolution, hidden partitioning, partition layout evolution, time travel, version rollback, etc.

  • LakeSoul

    LakeSoul is an end-to-end, real-time, cloud-native Lakehouse framework with fast data ingestion, concurrent updates, and incremental data analytics on cloud storage for both BI and AI applications.

  • 3. LakeSoul: LakeSoul is a unified streaming and batch table storage solution built on top of the Apache Spark engine. It supports scalable metadata management, ACID transactions, efficient and flexible upsert operations, schema evolution, and streaming and batch unification.

  • delta

    An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs (by delta-io)

  • 4. Delta Lake: Delta Lake is an open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive, and APIs for Scala, Java, Rust, Ruby, and Python. It provides ACID transactions and scalable metadata handling, and unifies streaming and batch data processing on top of existing data lakes such as S3, ADLS, GCS, and HDFS.

NOTE: The number of mentions on this list counts mentions in common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.

Related posts

  • Understanding Parquet, Iceberg and Data Lakehouses

    4 projects | news.ycombinator.com | 29 Dec 2023
  • Getting Started with Flink SQL, Apache Iceberg and DynamoDB Catalog

    4 projects | dev.to | 18 Dec 2023
  • [D] Is there other better data format for LLM to generate structured data?

    1 project | /r/MachineLearning | 10 Dec 2023
  • Delta vs Iceberg: make love not war

    1 project | /r/MicrosoftFabric | 30 Jun 2023
  • Databricks Strikes $1.3B Deal for Generative AI Startup MosaicML

    4 projects | news.ycombinator.com | 26 Jun 2023