-
LakeSoul
LakeSoul is an end-to-end, realtime and cloud native Lakehouse framework with fast data ingestion, concurrent update and incremental data analytics on cloud storages for both BI and AI applications.
-
InfluxDB
Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.
1.Introduction LakeSoul is a streaming batch integrated table storage framework built on The Apache Spark engine. It has highly extensible metadata management, ACID transactions, efficient and flexible UPSERT operations, Schema evolution, and batch integration processing. LakeSoul specifically optimizes the row and column level incremental updates, high concurrent entries, and batch scan reads for data on top of the Data Lake cloud storage. The storage separation architecture of cloud-native computing makes deployment very simple while supporting huge data volumes at a very low cost. LakeSoul supports high-performance write throughput in hashed partition primary key UPsert scenarios through lSM-tree, which can reach 30MB/s/core on object storage systems such as S3. The highly optimized Merge on Reading implementation also ensures Read performance. LakeSoul manages metadata through Cassandra to achieve high scalability of metadata. LakeSoul’s main features are as follows:
The above is the detailed information about LakeSoul, and there is more information on its Github homepage for reference. In the following story, I will introduce the detailed information about Iceberg and make a comparison between them, which is beneficial for me to learn about data lake better. If it is helpful to you, please read it or share it. I also hope you can give me guidance and suggestions for my study. Thank you.