Bigdata

Open-source projects categorized as Bigdata

Top 23 Bigdata Open-Source Projects

  • TDengine

    TDengine is an open source, high-performance, cloud native time-series database optimized for Internet of Things (IoT), Connected Cars, Industrial IoT and DevOps.

    Project mention: TDengine: NEW Data - star count:22190.0 | /r/algoprojects | 2023-11-14

  • shardingsphere

    Distributed SQL transaction & query engine for data sharding, scaling, encryption, and more - on any database.

    Project mention: Managing Data Residency - the demo | dev.to | 2023-05-25

    Contrary to what the documentation says, the full prefix is jdbc:shardingsphere:absolutepath. I've opened a PR to fix the documentation.

  • awesome-bigdata

    A curated list of awesome big data frameworks, resources and other awesomeness.

    Project mention: Good coding groups for black women? | news.ycombinator.com | 2024-01-13

  • juicefs

    JuiceFS is a distributed POSIX file system built on top of Redis and S3.

    Project mention: South Korea's No.1 Search Engine Chose JuiceFS over Alluxio for AI Storage | dev.to | 2024-01-18

    Support for Kerberos keytab files

  • vaex

    Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀

  • hudi

    Upserts, Deletes And Incremental Processing on Big Data.

    Project mention: Getting Started with Flink SQL, Apache Iceberg and DynamoDB Catalog | dev.to | 2023-12-18

    Apache Iceberg is one of the three types of lakehouse; the other two are Apache Hudi and Delta Lake.

  • OpenMetadata

    Open Standard for Metadata. A Single place to Discover, Collaborate and Get your data right.

    Project mention: How to Dynamically Adjust the Height of a Textarea in ReactJS | dev.to | 2023-10-25

    In this blog post, I have demonstrated how I addressed the challenge of dynamically adjusting the height of a textarea element based on its content, preventing the need for vertical scrolling in the title section of the OpenMetadata Knowledge article page.

  • volcano

    A Cloud Native Batch System (Project under CNCF)

  • Apache Avro

    Apache Avro is a data serialization system.

    Project mention: Open Table Formats Such as Apache Iceberg Are Inevitable for Analytical Data | news.ycombinator.com | 2024-01-18

    Apache Avro [1] is one, but it has been largely replaced by Parquet [2], which is a hybrid row/columnar format

    [1] https://avro.apache.org/

  • dpark

    Python clone of Spark, a MapReduce-like framework in Python

  • griddb

    GridDB is a next-generation open source database that makes time series IoT and big data fast and easy.

    Project mention: griddb: NEW Data - star count:2133.0 | /r/algoprojects | 2023-07-31

  • spark

    .NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers. (by dotnet)

  • Optimus

    Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark (by ironmussa)

  • tensorbase

    TensorBase is a new big data warehouse built with modern efforts.

  • odd-platform

    First open-source data discovery and observability platform. We make life easy for data practitioners so you can focus on your business.

    Project mention: OpenDataDiscovery 0.15 with Data Deprecation and Metadata Stale | news.ycombinator.com | 2023-08-04

  • cds

    Data syncing in golang for ClickHouse. (by zeromicro)

  • Mobius: C# API for Spark

    C# and F# language binding and extensions to Apache Spark (by microsoft)

  • tispark

    TiSpark is built for running Apache Spark on top of TiDB/TiKV

  • incubator-livy

    Apache Livy is an open source REST interface for interacting with Apache Spark from anywhere.

  • visualpython

    GUI-based Python code generator for data science, extension to Jupyter Lab, Jupyter Notebook and Google Colab.

  • Gearpump

    Lightweight real-time big data streaming engine over Akka

  • WeDataSphere

    WeDataSphere is a financial grade, one-stop big data platform suite.

  • spline

    Data Lineage Tracking And Visualization Solution (by AbsaOSS)

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2024-01-18.

Index

What are some of the best open-source Bigdata projects? This list will help you:

# Project Stars
1 TDengine 22,764
2 shardingsphere 19,406
3 awesome-bigdata 12,773
4 juicefs 9,774
5 vaex 8,171
6 hudi 5,038
7 OpenMetadata 4,039
8 volcano 3,744
9 Apache Avro 2,753
10 dpark 2,691
11 griddb 2,305
12 spark 1,995
13 Optimus 1,439
14 tensorbase 1,423
15 odd-platform 1,104
16 cds 953
17 Mobius: C# API for Spark 937
18 tispark 878
19 incubator-livy 851
20 visualpython 801
21 Gearpump 765
22 WeDataSphere 632
23 spline 578