LakeSoul vs starrocks

LakeSoul

LakeSoul is an end-to-end, real-time, cloud-native Lakehouse framework with fast data ingestion, concurrent updates, and incremental data analytics on cloud storage for both BI and AI applications. (by lakesoul-io)

lakesoul-io.github.io

starrocks

StarRocks, a Linux Foundation project, is a next-generation sub-second MPP OLAP database for full analytics scenarios, including multi-dimensional analytics, real-time analytics, and ad-hoc queries. Winner of InfoWorld’s 2023 BOSSIE Award for best open source software. (by StarRocks)

starrocks.io

Project	LakeSoul	starrocks
Mentions	21	12
Stars	2,307	7,789
Growth	0.8%	2.2%
Activity	9.2	10.0
Latest Commit	7 days ago	about 22 hours ago
Language	Java	Java
License	Apache License 2.0	Apache License 2.0

The number of mentions is the total number of mentions that we've tracked plus the number of user-suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
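
The exact formula behind the activity number is not given here. As a rough sketch only, a recency-weighted commit count and a percentile-based 0 to 10 scale could be computed as below. The exponential decay, the 30-day half-life, and both function names are assumptions made for illustration, not the site's actual method.

```python
from bisect import bisect_left


def recency_weighted_activity(commit_ages_days, half_life_days=30.0):
    """Sum per-commit weights that halve every `half_life_days` (assumed decay)."""
    return sum(0.5 ** (age / half_life_days) for age in commit_ages_days)


def activity_score(raw, all_raw_values):
    """Map a raw activity value to a 0-10 scale by percentile rank.

    A score of 9.0 means 90% of tracked projects have a strictly lower raw
    value, i.e. the project sits in the top 10%.
    """
    ranked = sorted(all_raw_values)
    below = bisect_left(ranked, raw)  # projects with strictly lower activity
    return round(10.0 * below / len(ranked), 1)
```

With 100 tracked projects whose raw values are 0 through 99, a project with raw value 90 has 90 projects below it and therefore maps to 9.0.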

LakeSoul

Posts with mentions or reviews of LakeSoul. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2022-12-28.

Open Source first Anniversary Star 1.2K! Review on the anniversary of LakeSoul, the unique open-source Lakehouse
2 projects | dev.to | 28 Dec 2022

Review code reference: https://github.com/meta-soul/LakeSoul/pull/115
The best Open-source lakehouse project, LakeSoul 2.0, supports snapshot, rollback, Flink, and Hive interconnection
1 project | dev.to | 8 Jul 2022

In LakeSoul 2.0, metadata and database interaction are fully implemented using the PostgreSQL (PG) protocol, for the reasons given at https://github.com/meta-soul/LakeSoul/issues/23. On the one hand, Cassandra does not support single-table multi-partition transactions. On the other hand, Cassandra cluster management has higher maintenance costs, while the PostgreSQL protocol is widely used in enterprises and has lower maintenance costs. You need to configure the PG parameters. For details, see https://github.com/meta-soul/LakeSoul/wiki/02.-QuickStart
A New One-stop AI development and production platform, AlphaIDE
2 projects | dev.to | 15 Jun 2022

I’ve posted about LakeSoul, an open-source framework for unified streaming and batch table storage, and MetaSpore, an open-source platform for machine learning.
Build a real-time machine learning sample library using the best open-source project about big data and data lakehouse, LakeSoul
1 project | /r/datascience | 9 Jun 2022

2.4 Data Backfill: Since LakeSoul supports Upsert on any range-partitioned data, there is no difference between backfilling and streaming writes. When the data to be inserted is ready, Spark performs an Upsert to update the historical data. LakeSoul automatically recognizes schema changes and updates the table metadata to implement schema evolution. LakeSoul provides the complete storage functionality of data warehouse tables, and each historical partition can be queried and updated. Compared with Flink’s window Join scheme, it solves the problem of invisible intermediate states and can quickly perform mass updates and tracing of historical data.
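
The backfill idea in this excerpt can be illustrated with a small in-memory sketch. This is not LakeSoul's API: the `upsert` function, the dict-based table, and the column names are hypothetical, and the sketch only shows that a late backfill and a streaming write go through the same keyed upsert, with unseen columns appended to the schema.

```python
def upsert(table, rows, key_cols, columns):
    """Upsert `rows` into `table` (a dict keyed by the key-column values).

    New columns seen in a row are appended to `columns`, which stands in for
    schema evolution. Returns the (possibly extended) column list.
    """
    for row in rows:
        for col in row:
            if col not in columns:
                columns.append(col)  # evolve the schema on first sight
        key = tuple(row[c] for c in key_cols)
        # Existing fields are kept unless the incoming row provides them.
        table.setdefault(key, {}).update(row)
    return columns


# Streaming write first, then a later backfill that adds a label column.
table, columns = {}, ["date", "user_id", "clicks"]
upsert(table, [{"date": "2022-06-01", "user_id": 1, "clicks": 3}],
       ["date", "user_id"], columns)
upsert(table, [{"date": "2022-06-01", "user_id": 1, "label": 1}],
       ["date", "user_id"], columns)
```

After both calls the single row for `("2022-06-01", 1)` carries both `clicks` and `label`, and `columns` has gained `label`.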

1 project | dev.to | 6 May 2022

The previous article, "The design concept of the best open-source project about big data and data lakehouse", introduced the design concept and part of the implementation of LakeSoul, an open-source, stream-batch integrated table storage framework. LakeSoul was originally designed to solve problems that are difficult to handle in traditional Hive data warehouse scenarios, including Upsert updates, Merge on Read, and concurrent writes. This article will demonstrate the core capabilities of LakeSoul using a typical application scenario: building a real-time machine learning sample library.
Solved a practical business problem when using Hudi: LakeSoul supports null field non-override semantics
1 project | dev.to | 29 May 2022

Recently, the LakeSoul R&D team helped users solve a practical business problem they encountered when using Hudi. Here is a summary and record. In the business process, the upstream system extracts the original data from the online DB table into JSON format and writes it into Kafka. The downstream system uses Spark to read the messages from Kafka. The data is updated and aggregated using Hudi and sent to the downstream database for analysis.
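
The "null field non-override" semantics named in this post's title can be sketched in a few lines. This is a hypothetical illustration with plain dicts, not LakeSoul or Hudi code: when a record is upserted, a field whose incoming value is null leaves the stored value untouched.

```python
def merge_non_null(existing, incoming):
    """Merge `incoming` into `existing`, skipping fields whose value is None."""
    merged = dict(existing)
    for field, value in incoming.items():
        if value is not None:  # a null never overrides a stored value
            merged[field] = value
    return merged


stored = {"id": 7, "name": "alice", "city": "Paris"}
update = {"id": 7, "name": None, "city": "Berlin"}
result = merge_non_null(stored, update)
```

Here `result` keeps `name` as `"alice"` because the incoming value is null, while `city` is updated to `"Berlin"`.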
What is the Lakehouse, the latest Direction of Big Data Architecture?
2 projects | dev.to | 14 May 2022

Lakesoul
Design concept of a best opensource project about big data and data lakehouse
1 project | dev.to | 16 Apr 2022

LakeSoul is a stream-batch integrated table storage framework developed by DMetaSoul, with many design optimizations built around the new trends in big data architecture. This article explains the core concepts and design principles of the open-source LakeSoul project in detail.
Data engine engineers interview for help
1 project | /r/learnprogramming | 9 Apr 2022

Maybe you can use some of this code with a dataset over the next two days and compare the products to show the interviewer that you know a lot about the projects. Interviewers like candidates who can easily tell the difference between different products. Perhaps take a look at LakeSoul, which is similar to Iceberg, Hudi, etc.; its GitHub has a comparison of open-source data lake projects and how to use them. You can also check out the Iceberg and Hudi websites, which have detailed tutorials.
Details of 4 best opensource projects about big data you should try out（Ⅰ）
2 projects | dev.to | 7 Apr 2022

1. Introduction: LakeSoul is a stream-batch integrated table storage framework built on the Apache Spark engine. It has highly extensible metadata management, ACID transactions, efficient and flexible Upsert operations, schema evolution, and integrated stream and batch processing. LakeSoul specifically optimizes row- and column-level incremental updates, highly concurrent writes, and batch scan reads for data on top of data lake cloud storage. The cloud-native architecture, which separates compute from storage, makes deployment very simple while supporting huge data volumes at a very low cost. LakeSoul supports high write throughput in hash-partitioned primary key Upsert scenarios through an LSM-tree, reaching 30MB/s/core on object storage systems such as S3. The highly optimized Merge on Read implementation also ensures read performance. LakeSoul manages metadata through Cassandra to achieve high scalability of metadata. LakeSoul’s main features are as follows:
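
As a side note on the Merge on Read idea mentioned in this excerpt, the general LSM-style technique can be sketched as follows. This is not LakeSoul's implementation: the function name and data layout are hypothetical, and the sketch assumes each run is sorted by key with unique keys inside a run. Writes append sorted runs, and a read merges the runs so that the newest value for each key wins.

```python
import heapq
from itertools import groupby


def merge_on_read(runs):
    """Merge sorted runs of (key, value) pairs, oldest run first.

    Returns one (key, value) pair per key, taking the value from the
    newest run that contains the key.
    """
    # Tag entries with the run index so equal keys sort oldest to newest.
    tagged = (
        [(key, idx, value) for key, value in run]
        for idx, run in enumerate(runs)
    )
    merged = heapq.merge(*tagged)
    result = []
    for key, group in groupby(merged, key=lambda entry: entry[0]):
        *_, newest = group  # last entry in the group comes from the newest run
        result.append((key, newest[2]))
    return result


old_run = [(1, "a"), (3, "c")]
new_run = [(1, "A"), (2, "b")]
snapshot = merge_on_read([old_run, new_run])
```

Key 1 appears in both runs, so the merged snapshot takes `"A"` from the newer run, while keys 2 and 3 pass through unchanged.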

starrocks

Posts with mentions or reviews of starrocks. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2024-04-09.

A MySQL compatible database engine written in pure Go
10 projects | news.ycombinator.com | 9 Apr 2024

tidb has been around for a while, it is distributed, written in Go and Rust, and MySQL compatible. https://github.com/pingcap/tidb
Somewhat relatedly, StarRocks is also MySQL compatible, written in Java and C++, but it's tackling OLAP use-cases. https://github.com/StarRocks/starrocks
StarRocks – sub-second MPP OLAP database for full analytics scenarios
1 project | news.ycombinator.com | 23 Jan 2024
Let's Talk about Joins
2 projects | news.ycombinator.com | 20 Jan 2024

I think you're talking about doing denormalization before importing data into an OLAP system to avoid subsequent joins. However, this greatly limits the flexibility of data modeling. Moreover, denormalization can be a headache-inducing process. In fact, I have tested StarRocks (https://github.com/StarRocks/starrocks), and it is capable of performing joins while streaming data imports, and the speed is very fast. It's worth giving it a try.
Ask HN: Are there any notable Chinese FLOSS projects?
4 projects | news.ycombinator.com | 11 May 2023

https://github.com/apache/doris is a great example. Same for its cousin https://github.com/StarRocks/starrocks that was an early fork of the doris project.
To be fair, these are the only examples I can think of and I only learned of these as I'm standing up new data infra using starrocks.
Open Source Columnar Databases
2 projects | /r/dataengineering | 17 Mar 2023

ClickHouse and StarRocks are similar. They are both columnar databases powered by vectorization tech, which means they are really fast.
Ask HN: Do you use any software (mainly) developed in China?
3 projects | news.ycombinator.com | 27 Feb 2023

StarRocks, it’s a Linux Foundation project now, but a lot of the initial team and community behind it came from China.
https://github.com/StarRocks/starrocks
Funny that I hadn’t heard of them in the database space till they showed up at the top of ClickBench. Makes me wonder what other interesting projects I’m missing out on in China.
Anyone using StarRocks DB instead of ClickHouse?
1 project | /r/dataengineering | 17 Nov 2022
Show HN: A benchmark for analytical databases (Snowflake, Druid, Redshift)
11 projects | news.ycombinator.com | 13 Jul 2022

Full disclosure - I work for StarRocks (starrocks.com)
First of all, this is great. Transparent and healthy competition is always great for the customers!
Regarding the joined table queries that are missing in the tests, this is exactly why we built StarRocks - to give people the best performance of complex analytics queries on both joined tables and single tables.
I encourage you to check out this blog: https://starrocks.medium.com/starrocks-outperforms-clickhous...
And, give us a star if you think we are doing the right thing: https://github.com/StarRocks/starrocks
Follow us on LinkedIn for the latest updates: https://www.linkedin.com/company/starrocks
We are looking for a very fast database for big data analysis, does anyone know about starrocks, I heard it is very fast
1 project | /r/programming | 22 Dec 2021
wow, i found a super fast database for Big Data analytics,it's called StarRocks,come and take a look!
1 project | /r/programming | 22 Dec 2021

What are some alternatives?

When comparing LakeSoul and starrocks you can also consider the following projects:

MetaSpore - A unified end-to-end machine intelligence platform

ClickBench - ClickBench: a Benchmark For Analytical Databases

iceberg - Apache Iceberg

doris - Apache Doris is an easy-to-use, high performance and unified analytics database.

delta - An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs

duckdb - DuckDB is an in-process SQL OLAP Database Management System

hudi - Upserts, Deletes And Incremental Processing on Big Data.

TablePlus - TablePlus macOS issue tracker

delta-sharing - An open protocol for secure data sharing

clickhouse-bulk - Collects many small inserts to ClickHouse and send in big inserts

nussknacker - Low-code tool for automating actions on real time data | Stream processing for the users.

ClickHouse - ClickHouse® is a free analytics DBMS for big data
