Top 23 Bigdata Open-Source Projects

TDengine

33 22,764 10.0 C

TDengine is an open source, high-performance, cloud native time-series database optimized for Internet of Things (IoT), Connected Cars, Industrial IoT and DevOps.

Project mention: TDengine: NEW Data - star count:22190.0 | /r/algoprojects | 2023-11-14
shardingsphere

1 19,406 10.0 Java

Distributed SQL transaction & query engine for data sharding, scaling, encryption, and more - on any database.

Project mention: Managing Data Residency - the demo | dev.to | 2023-05-25

Contrary to what the documentation says, the full prefix is jdbc:shardingsphere:absolutepath. I've opened a PR to fix the documentation.
awesome-bigdata

1 12,773 1.5

A curated list of awesome big data frameworks, resources and other awesomeness.

Project mention: Good coding groups for black women? | news.ycombinator.com | 2024-01-13
juicefs

8 9,774 9.7 Go

JuiceFS is a distributed POSIX file system built on top of Redis and S3.

Project mention: South Korea's No.1 Search Engine Chose JuiceFS over Alluxio for AI Storage | dev.to | 2024-01-18

Support for Kerberos keytab files
vaex

0 8,171 6.0 Python

Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀
hudi

3 5,038 9.9 Java

Upserts, Deletes And Incremental Processing on Big Data.

Project mention: Getting Started with Flink SQL, Apache Iceberg and DynamoDB Catalog | dev.to | 2023-12-18

Apache Iceberg is one of the three types of lakehouse; the other two are Apache Hudi and Delta Lake.
OpenMetadata

6 4,039 10.0 TypeScript

Open Standard for Metadata. A Single place to Discover, Collaborate and Get your data right.

Project mention: How to Dynamically Adjust the Height of a Textarea in ReactJS | dev.to | 2023-10-25

In this blog post, I have demonstrated how I addressed the challenge of dynamically adjusting the height of a textarea element based on its content, preventing the need for vertical scrolling in the title section of the OpenMetadata Knowledge article page.
volcano

0 3,744 9.1 Go

A Cloud Native Batch System (Project under CNCF)
Apache Avro

4 2,753 9.7 Java

Apache Avro is a data serialization system.

Project mention: Open Table Formats Such as Apache Iceberg Are Inevitable for Analytical Data | news.ycombinator.com | 2024-01-18

Apache AVRO [1] is one but it has been largely replaced by Parquet [2] which is a hybrid row/columnar format
[1] https://avro.apache.org/
dpark

0 2,691 0.0 Python

Python clone of Spark, a MapReduce-like framework in Python
griddb

8 2,305 7.8 C++

GridDB is a next-generation open source database that makes time series IoT and big data fast and easy.

Project mention: griddb: NEW Data - star count:2133.0 | /r/algoprojects | 2023-07-31
spark

0 1,995 0.0 C#

.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers. (by dotnet)
Optimus

0 1,439 1.9 Python

🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark (by ironmussa)
tensorbase

0 1,423 0.0 Rust

TensorBase is a new big data warehouse built with modern efforts.
odd-platform

9 1,104 8.8 Java

First open-source data discovery and observability platform. We make life easy for data practitioners so you can focus on your business.

Project mention: OpenDataDiscovery 0.15 with Data Deprecation and Metadata Stale | news.ycombinator.com | 2023-08-04
cds

0 953 0.0 Go

Data syncing in golang for ClickHouse. (by zeromicro)
Mobius: C# API for Spark

0 937 4.2 C#

C# and F# language binding and extensions to Apache Spark (by microsoft)
tispark

0 878 5.0 Scala

TiSpark is built for running Apache Spark on top of TiDB/TiKV
incubator-livy

0 851 5.8 Scala

Apache Livy is an open source REST interface for interacting with Apache Spark from anywhere.
visualpython

0 801 9.0 JavaScript

GUI-based Python code generator for data science, extension to Jupyter Lab, Jupyter Notebook and Google Colab.
Gearpump

0 765 0.0 Scala

Lightweight real-time big data streaming engine over Akka
WeDataSphere

0 632 5.0

WeDataSphere is a financial grade, one-stop big data platform suite.
spline

0 578 7.1 Scala

Data Lineage Tracking And Visualization Solution (by AbsaOSS)

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2024-01-18.
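
As a rough illustration of the ordering described in the note, the sketch below sorts a handful of entries by star count. The tuple layout and the `rank_by_stars` helper are assumptions made for this example only; the figures are copied from this list, and this is not the site's actual ranking code.

```python
# Sample rows copied from this list as (project, mentions, stars).
# The tuple layout is an assumption made for this sketch.
projects = [
    ("juicefs", 8, 9774),
    ("TDengine", 33, 22764),
    ("awesome-bigdata", 1, 12773),
    ("shardingsphere", 1, 19406),
]


def rank_by_stars(rows):
    """Return the rows sorted by star count, highest first (hypothetical helper)."""
    return sorted(rows, key=lambda row: row[2], reverse=True)


ranked = rank_by_stars(projects)
for position, (name, mentions, stars) in enumerate(ranked, start=1):
    # Tab-separated output: position, project name, star count.
    print(f"{position}\t{name}\t{stars:,}")
```

Mentions are carried along but play no part in the ordering, which matches the note: stars decide the rank, mentions are informational.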

Bigdata related posts

TDengine: NEW Data - star count:22190.0
1 project | /r/algoprojects | 14 Nov 2023
TDengine: NEW Data - star count:22190.0
1 project | /r/algoprojects | 10 Nov 2023
TDengine: NEW Data - star count:22190.0
1 project | /r/algoprojects | 9 Nov 2023
TDengine: NEW Data - star count:21816.0
1 project | /r/algoprojects | 28 Oct 2023
TDengine: NEW Data - star count:21816.0
1 project | /r/algoprojects | 26 Oct 2023
How to Dynamically Adjust the Height of a Textarea in ReactJS
1 project | dev.to | 25 Oct 2023
TDengine: NEW Data - star count:21816.0
1 project | /r/algoprojects | 24 Oct 2023

Index

What are some of the best open-source Bigdata projects? This list will help you:

#	Project	Stars
1	TDengine	22,764
2	shardingsphere	19,406
3	awesome-bigdata	12,773
4	juicefs	9,774
5	vaex	8,171
6	hudi	5,038
7	OpenMetadata	4,039
8	volcano	3,744
9	Apache Avro	2,753
10	dpark	2,691
11	griddb	2,305
12	spark	1,995
13	Optimus	1,439
14	tensorbase	1,423
15	odd-platform	1,104
16	cds	953
17	Mobius: C# API for Spark	937
18	tispark	878
19	incubator-livy	851
20	visualpython	801
21	Gearpump	765
22	WeDataSphere	632
23	spline	578