Details of the 4 best open-source big data projects you should try out (Ⅰ)

This page summarizes the projects mentioned and recommended in the original post on dev.to.

  • LakeSoul

    LakeSoul is an end-to-end, real-time, cloud-native lakehouse framework. It offers fast data ingestion, concurrent updates, and incremental data analytics on cloud storage for both BI and AI applications.

  • 1. Introduction: LakeSoul is a table storage framework built on the Apache Spark engine that unifies streaming and batch processing. It provides highly extensible metadata management, ACID transactions, efficient and flexible UPSERT operations, schema evolution, and unified streaming and batch processing. LakeSoul is specifically optimized for three workloads on data lake cloud storage: row- and column-level incremental updates, highly concurrent writes, and batch scan reads. Its cloud-native architecture separates compute from storage, which keeps deployment simple while supporting very large data volumes at low cost. For upserts on hash-partitioned primary keys, LakeSoul uses an LSM-tree structure to achieve high write throughput, reaching 30 MB/s per core on object storage systems such as S3. A highly optimized merge-on-read implementation also keeps read performance high. LakeSoul manages its metadata in Cassandra, which makes metadata management highly scalable. LakeSoul’s main features are as follows:

  • Iceberg

    Apache Iceberg

  • That covers LakeSoul in detail; more information is available on its GitHub homepage. In the next post, I will introduce Iceberg in detail and compare the two projects, which will help me understand data lakes better. If you find this helpful, please read and share it. I also welcome any guidance or suggestions for my study. Thank you.

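The UPSERT and merge-on-read design described for LakeSoul above can be illustrated with a toy, in-memory Python model. This is a minimal sketch under stated assumptions, not LakeSoul's API, file format, or actual algorithm. The class `HashPartitionedTable`, its methods, the bucket count of 4, and the crc32-based bucketing are all hypothetical.

The model works in two steps:

  • Write path: each upsert only appends a new sorted "delta file" to the primary key's hash bucket, in the spirit of an LSM-tree.
  • Read path: the reader merges each bucket's files from oldest to newest. Newer values win, including when a newer row carries only some of the columns.

```python
import zlib
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class HashPartitionedTable:
    """Toy model of primary-key upserts with merge-on-read (hypothetical, not LakeSoul's API)."""

    primary_key: str
    num_buckets: int = 4  # hypothetical number of hash buckets
    # bucket id -> list of delta "files", oldest first; each file is a list of row dicts
    buckets: Dict[int, List[List[dict]]] = field(default_factory=dict)

    def bucket_of(self, key: Any) -> int:
        # Deterministic hash of the primary key, so a given key always maps to the same bucket.
        return zlib.crc32(str(key).encode("utf-8")) % self.num_buckets

    def upsert(self, rows: List[dict]) -> None:
        # Write path: group the batch by bucket, sort each group by key, and append it
        # as one new delta file. Existing files are never rewritten (append-only, LSM-style).
        grouped: Dict[int, List[dict]] = {}
        for row in rows:
            grouped.setdefault(self.bucket_of(row[self.primary_key]), []).append(row)
        for bucket, group in grouped.items():
            group.sort(key=lambda r: r[self.primary_key])  # stable: later duplicates stay later
            self.buckets.setdefault(bucket, []).append(group)

    def read(self) -> List[dict]:
        # Read path (merge-on-read): replay each bucket's delta files from oldest to newest.
        # Columns present in a newer row overwrite older values; columns it omits are kept,
        # which models a column-level partial update.
        result: List[dict] = []
        for bucket in sorted(self.buckets):
            merged: Dict[Any, dict] = {}
            for delta in self.buckets[bucket]:
                for row in delta:
                    key = row[self.primary_key]
                    merged[key] = {**merged.get(key, {}), **row}
            result.extend(merged.values())
        return sorted(result, key=lambda r: r[self.primary_key])


if __name__ == "__main__":
    table = HashPartitionedTable(primary_key="id")
    table.upsert([{"id": 1, "name": "a", "score": 10}, {"id": 2, "name": "b", "score": 20}])
    # Second batch updates only "score" for id 1 and inserts id 3.
    table.upsert([{"id": 1, "score": 15}, {"id": 3, "name": "c", "score": 30}])
    print(table.read())
    # [{'id': 1, 'name': 'a', 'score': 15}, {'id': 2, 'name': 'b', 'score': 20}, {'id': 3, 'name': 'c', 'score': 30}]
```

The trade-off this toy model shows is that writes never rewrite existing files, so they stay cheap on object storage, while reads pay for the merge. LSM-style systems typically also compact delta files in the background to keep that read cost bounded; the sketch leaves compaction out.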