Top 10 C++ Big Data Projects
-
ClickHouse
Project mention: Postgresql on NVME SSD slow performance tuning/monitoring | reddit.com/r/btrfs | 2023-03-27
The problem with ClickHouse is that some algorithms are not implemented for data that doesn't fit in memory, so it just fails with OOM: https://github.com/ClickHouse/ClickHouse/issues/47521
-
NebulaGraph Database
A distributed, fast open-source graph database featuring horizontal scalability and high availability (by vesoft-inc)
A NoSQL graph database is a type of non-relational, distributed database that employs a graph model. NoSQL stands for “Not only SQL” and refers to a class of databases that differ from traditional relational databases in their data model and performance characteristics. Graph databases are especially useful for data defined by relationships, from friendships on social networks to equipment supply chains or business processes. They can quickly traverse large amounts of linked data to discover insights and hidden connections between entities, which makes them well suited to network analysis (such as financial fraud detection and recommendation engines) at scale.
-
kudu
Project mention: Tencent Data Engineer: Why We Went from ClickHouse to Apache Doris? | reddit.com/r/dataengineering | 2023-03-10
Really interested in partial updates, but I haven't found any information on how the merges/upserts physically happen. It would be great if a doc like https://github.com/apache/kudu/blob/master/docs/design-docs/tablet.md existed for Apache Doris.
-
ytsaurus
Project mention: Yandex open-sources its exabyte-scale big data platform | news.ycombinator.com | 2023-03-22
-
PGM-index
🏅State-of-the-art learned data structure that enables fast lookup, predecessor, range searches and updates in arrays of billions of items using orders of magnitude less space than traditional indexes
-
oneDAL
Project mention: Is there a no-compromise (presumably C/C++) platform similar to Apache Spark? | reddit.com/r/dataengineering | 2022-07-27
-
ukv
Replacing MongoDB, Neo4J, and Elastic with 1 transactional database. Features: zero-copy semantics, swappable backends, bindings for C, C++, Python, Java, GoLang
Project mention: Up to 100x Faster FastAPI with simdjson and io_uring on Linux 5.19+ | reddit.com/r/programming | 2023-03-06
Just to clarify, I meant in other projects, like the UKV.
-
nebula
Project mention: Show HN: Turn any data into a fast analytical API | news.ycombinator.com | 2022-04-10
We use our in-house engine, open-sourced here: https://github.com/varchar-io/nebula
Yeah, Tinybird has a lot of similarities. I will look into it more; thanks for the reference.
-
GraphAr
Project mention: Show HN: GraphAr – Open-source file format for archiving/exchanging graph data | news.ycombinator.com | 2023-03-06
-
ReductStore
Hey, I've released version 0.8.0 of Reduct CLI, the Python package for managing data stored in ReductStore. This release includes two new features that will be particularly helpful for our public datasets hosted on ReductStore, where metadata can be used to provide important context for the data.
C++ Big Data related posts
- YTsaurus: Open-source big data platform for distributed storage and processing
- YTsaurus – Yandex open source big data platform
- Tencent Data Engineer: Why We Went from ClickHouse to Apache Doris?
- Show HN: A Tool for Data Obfuscation
- Q – Run SQL Directly on CSV or TSV Files
- Show HN: Turn any data into a fast analytical API
- Show HN: Visualize your streaming data in real-time
-
A note from our sponsor - InfluxDB
www.influxdata.com | 29 Mar 2023
Index
What are some of the best open-source Big Data projects in C++? This list will help you:
Rank | Project | Stars
---|---|---
1 | ClickHouse | 27,646
2 | NebulaGraph Database | 8,891
3 | kudu | 1,698
4 | ytsaurus | 1,273
5 | PGM-index | 674
6 | oneDAL | 549
7 | ukv | 257
8 | nebula | 132
9 | GraphAr | 54
10 | ReductStore | 35