Top 9 Java ETL Projects
- kestra: Infinitely scalable, event-driven, language-agnostic orchestration and scheduling platform to manage millions of workflows declaratively in code.
- Smooks: Extensible data integration Java framework for building XML and non-XML fragment-based applications.
- ReplicaDB: An open source tool for database replication, designed for efficiently transferring bulk data between relational and non-relational databases.
- kafka-connect-file-pulse: 🔗 A multipurpose Kafka Connect connector that makes it easy to parse, transform and stream any file, in any format, into Apache Kafka.
- contube: ConTube is a scalable data connector framework that facilitates efficient data transfer between diverse systems.
Project mention: Variant in Apache Doris 2.1.0: a new data type 8 times faster than JSON for semi-structured data analysis | dev.to | 2024-03-27
As an open-source real-time data warehouse, Apache Doris provides semi-structured data processing capabilities, and the newly released version 2.1.0 takes a further step in this direction. Before V2.1, Apache Doris stored semi-structured data as JSON files. However, parsing JSON at query time leads to high CPU and I/O consumption as well as high query latency, especially when the dataset is large and complex. Moreover, the lack of a predefined schema leaves little basis for query optimization.
Kestra's communication is asynchronous and based on a queuing mechanism. It leverages the Micronaut framework and offers two runners: one that uses a database (JDBC) for both the message queue and resource storage, and another that uses Kafka as the message queue and Elasticsearch as the resource storage. The platform is fully extensible and plugin-based, providing a rich set of plugins for various workflow tasks, triggers, and data storage options. For those interested, the GitHub repository is available here: https://github.com/kestra-io/kestra
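The queue-based, asynchronous style described above can be illustrated with a minimal Java sketch. This is not Kestra's code or API: `MessageQueue`, `InMemoryQueue`, and `QueueDemo` are hypothetical names, and the in-memory queue is only a stand-in for what a JDBC- or Kafka-backed implementation might sit behind. It assumes Java 9 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical abstraction (not taken from Kestra): components only talk
// to each other through a queue, so the backing technology can be swapped.
interface MessageQueue<T> {
    void emit(T message) throws InterruptedException;
    T poll(long timeoutMillis) throws InterruptedException; // null on timeout
}

// In-memory stand-in for a database- or Kafka-backed queue.
final class InMemoryQueue<T> implements MessageQueue<T> {
    private final BlockingQueue<T> queue = new LinkedBlockingQueue<>();

    @Override
    public void emit(T message) throws InterruptedException {
        queue.put(message);
    }

    @Override
    public T poll(long timeoutMillis) throws InterruptedException {
        return queue.poll(timeoutMillis, TimeUnit.MILLISECONDS);
    }
}

final class QueueDemo {
    // The producer emits task names; a worker thread consumes them asynchronously.
    static List<String> run(MessageQueue<String> queue, List<String> tasks) {
        List<String> processed = new ArrayList<>();
        Thread worker = new Thread(() -> {
            try {
                for (int i = 0; i < tasks.size(); i++) {
                    String task = queue.poll(1000);
                    if (task == null) {
                        break; // stop waiting if nothing arrives in time
                    }
                    processed.add("done:" + task);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.start();
        try {
            for (String task : tasks) {
                queue.emit(task); // the producer never calls the worker directly
            }
            worker.join(); // after join, reading `processed` from this thread is safe
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while running demo", e);
        }
        return processed;
    }

    public static void main(String[] args) {
        System.out.println(run(new InMemoryQueue<>(), List.of("extract", "transform", "load")));
    }
}
```

Because the producer and the worker depend only on the `MessageQueue` interface, a different backend could be substituted without touching either side, which is the general idea behind offering more than one runner.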
Project mention: Kafka Connect Filepulse 2.13.0 is now available! This version includes support for SFTP and Alibaba OSS. It also contains many bug fixes and improvements. 🚀 | /r/apachekafka | 2023-09-15
Project mention: Show HN: ConTube – A Scalable Data Connect Framework for Pulsar/Kafka Ecosystems | news.ycombinator.com | 2023-12-04
Java ETL related posts
- Kafka Connect Filepulse 2.13.0 is now available! This version includes support for SFTP and Alibaba OSS. It also contains many bug fixes and improvements. 🚀
- Best ‘E’TL tools for extracting data from on-prem SQL databases
- Maven unable to resolve a dependency given in pom.xml. I've instead tried manually downloading installing the jar, but now maven cannot find the package.
- Download json and csv file from github repository with apache kafka
- Streaming data into Kafka S01/E04 — Loading Log files using Grok Expression
Index
What are some of the best open-source ETL projects in Java? This list will help you:
Rank | Project | Stars |
---|---|---|
1 | doris | 11,314 |
2 | kestra | 6,260 |
3 | zingg | 877 |
4 | Smooks | 383 |
5 | ReplicaDB | 357 |
6 | kafka-connect-file-pulse | 305 |
7 | neo4j-jdbc | 123 |
8 | contube | 10 |
9 | dcc-import | 1 |