Java Data

Open-source Java projects categorized as Data

Top 17 Java Data Projects

  1. Presto

    The official home of the Presto distributed SQL query engine for big data

    Project mention: Using IRIS and Presto for high-performance and scalable SQL queries | dev.to | 2025-01-19

    The rise of Big Data projects, real-time self-service analytics, online query services, and social networks, among others, has created scenarios that demand massive, high-performance data queries. In response to this challenge, MPP (massively parallel processing) database technology was created, and it quickly established itself. Among the open-source MPP options, Presto (https://prestodb.io/) is the best-known. It originated at Facebook, where it was used for data analytics, and was later open-sourced. Since Teradata joined the Presto community, Teradata also offers support for it.

  2. kestra

    Workflow Automation Platform. Orchestrate and schedule code in any language, run anywhere, 500+ plugins. Alternative to Zapier, Rundeck, Camunda, Airflow...

    Project mention: Study Notes 2.2.7: Managing Schedules and Backfills with BigQuery in Kestra | dev.to | 2025-02-04

    Kestra Documentation: Kestra.io

  3. pkl

    A configuration as code language with rich validation and tooling.

    Project mention: JSON5 – JSON for Humans | news.ycombinator.com | 2024-12-08

    When I manage a project and have the freedom to choose my configuration structure, I always use TypeScript. I never understood the desire to have configuration in ini/json/jsonnet/yaml. A strongly typed configuration with code completion seems so much more robust, unless of course your use case is to load or change the config via an API.

    I like what Apple is doing with https://pkl-lang.org/ though.

  4. data-transfer-project

    The Data Transfer Project makes it easy for platforms to build interoperable user data portability features. We are establishing a common framework, including data models and protocols, to enable direct transfer of data both into and out of participating online service providers.

  5. proteus

    Proteus: A JSON-based LayoutInflater for Android

  6. nessie

    Nessie: Transactional Catalog for Data Lakes with Git-like semantics

    Project mention: Polaris Catalog: An Open Source Catalog for Apache Iceberg | news.ycombinator.com | 2024-06-03

  7. jimmer

    A revolutionary ORM framework for both Java and Kotlin.

  8. micronaut-data

    Ahead-of-Time Data Repositories

  9. riot

    🧨 Get data in & out of Redis with RIOT (by redis)

  10. rapiddweller-benerator-ce

    BENERATOR is a leading software solution to generate, obfuscate, pseudonymize and migrate data for development, testing, and training purposes with a model-driven approach.

  11. pgCompare

    pgCompare – a straightforward utility crafted to simplify the data comparison process, providing a robust solution for comparing data across various database platforms.

    Project mention: Show HN: PgCompare – Data comparison made simple | news.ycombinator.com | 2024-06-02

  12. ModelRunner

    No-code, model-driven, natural language data access platform

  13. nextcloud-tables

    📊 Android client for the Nextcloud Tables app

  14. Db4o-gpl

    New Db4o GPL source code for Java 7+ and .NET Standard 2.0 (Android, Xamarin...), the best database project to help you learn how to build databases

  15. SheetsIO

    Small configurable Java app that pulls data from a Google Spreadsheet (using the v4 API) and writes to files and a local webserver.

  16. Data-Structures-and-Algorithms

    Solutions to Arrays, Strings, Lists, Sorting, Stacks, Trees and General DS problems using Java. (by anishkumar127)

  17. SparkDB

    CSV-to-database-structure project (by NaDeSys)

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).
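The Presto mention in entry 1 credits MPP (massively parallel processing) for making large, fast queries possible. The core idea is to split the data into partitions, compute a partial aggregate on each partition in parallel, and then merge the partial results. That scatter-gather pattern can be sketched in plain Java.

This is a minimal illustration under stated assumptions. The dataset is an in-memory array, and threads stand in for worker nodes. The class and method names (`ScatterGatherSum`, `partialSum`, `parallelSum`) are hypothetical. This is not Presto's execution engine or API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ScatterGatherSum {

    // Hypothetical "worker" step: aggregates one partition, data[from, to).
    static long partialSum(long[] data, int from, int to) {
        long sum = 0;
        for (int i = from; i < to; i++) {
            sum += data[i];
        }
        return sum;
    }

    // Hypothetical "coordinator" step: scatter partitions to threads, then gather.
    // Threads stand in for cluster nodes; partitions must be >= 1.
    static long parallelSum(long[] data, int partitions) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(partitions);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            int chunk = (data.length + partitions - 1) / partitions; // ceiling division
            for (int p = 0; p < partitions; p++) {
                int from = Math.min(p * chunk, data.length);
                int to = Math.min(from + chunk, data.length);
                Callable<Long> task = () -> partialSum(data, from, to);
                futures.add(pool.submit(task)); // scatter: one task per partition
            }
            long total = 0;
            for (Future<Long> f : futures) {
                total += f.get(); // gather: merge the partial aggregates
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        long[] data = new long[1_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i + 1;
        }
        System.out.println(parallelSum(data, 4)); // prints 500500
    }
}
```

Summation works well for this pattern because partial sums can be merged in any order. Aggregates such as averages need a different partial form (sum and count) before merging.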


Java Data related posts

  • Polaris Catalog: An Open Source Catalog for Apache Iceberg

    1 project | news.ycombinator.com | 3 Jun 2024
  • Show HN: PgCompare – Data comparison made simple

    1 project | news.ycombinator.com | 2 Jun 2024
  • A deep dive into the concept and world of Apache Iceberg Catalogs

    1 project | dev.to | 1 Mar 2024
  • Apple releases Pkl – configuration as code language

    14 projects | news.ycombinator.com | 3 Feb 2024
  • Multi-Database Support in DuckDB

    3 projects | news.ycombinator.com | 28 Jan 2024
  • Why is Hive Metastore everywhere? (Especially Iceberg)

    1 project | /r/dataengineering | 30 Jun 2023
  • Missouri trans 'snitch form' down after people spammed it with the 'Bee Movie' script

    4 projects | /r/politics | 22 Apr 2023

Index

What are some of the best open-source Data projects in Java? This list will help you:

# Project Stars
1 Presto 16,196
2 kestra 15,838
3 pkl 10,487
4 data-transfer-project 3,576
5 proteus 1,306
6 nessie 1,129
7 jimmer 1,087
8 micronaut-data 471
9 riot 295
10 rapiddweller-benerator-ce 147
11 pgCompare 122
12 ModelRunner 56
13 nextcloud-tables 40
14 Db4o-gpl 30
15 SheetsIO 23
16 Data-Structures-and-Algorithms 12
17 SparkDB 3
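The Hacker News comment quoted under pkl argues that a strongly typed configuration is more robust than untyped ini/json/yaml files. With types, misspelled fields and wrong value types are caught early, and validation can run whenever a config value is built. The same idea can be shown with a Java record.

This sketch does not use Pkl or its tooling. The `ServerConfig` type, its fields and the validation rules are hypothetical. It assumes Java 16 or later, which is required for records.

```java
import java.time.Duration;
import java.util.Objects;

public class TypedConfigDemo {

    // Hypothetical config type: each field has a declared type, so passing a
    // string where an int is expected, or calling a misspelled accessor, fails to compile.
    public record ServerConfig(String host, int port, Duration timeout) {
        // Compact constructor: these checks run every time a config value is created.
        public ServerConfig {
            Objects.requireNonNull(host, "host");
            Objects.requireNonNull(timeout, "timeout");
            if (host.isBlank()) {
                throw new IllegalArgumentException("host must not be blank");
            }
            if (port < 1 || port > 65_535) {
                throw new IllegalArgumentException("port out of range: " + port);
            }
            if (timeout.isNegative() || timeout.isZero()) {
                throw new IllegalArgumentException("timeout must be positive");
            }
        }
    }

    public static void main(String[] args) {
        ServerConfig ok = new ServerConfig("localhost", 8080, Duration.ofSeconds(30));
        System.out.println(ok.port()); // prints 8080

        try {
            new ServerConfig("localhost", 70_000, Duration.ofSeconds(30));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // prints port out of range: 70000
        }
    }
}
```

The comment's caveat still applies. If the config must be loaded or changed via an API at runtime, the typed object has to be built from untyped input. At that point the constructor checks are what catch bad values, rather than the compiler.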

