Top 12 Java Spark Projects
-
Deeplearning4j
Suite of tools for deploying and training deep learning models using the JVM. Highlights include model import for Keras, TensorFlow, and ONNX/PyTorch; a small, modular C++ library for running math code; and a Java-based math library on top of the core C++ library. Also includes SameDiff, a PyTorch/TensorFlow-like library for running deep learning using automatic differentiation.
DL4J
-
Alluxio (formerly Tachyon)
Alluxio, data orchestration for analytics and machine learning in the cloud
-
Zeppelin
Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.
Have you tried Apache Zeppelin? I remember that you can pretty-print Spark DataFrames directly in it with z.show(df).
-
Sidetracking the conversation a bit: what a coincidence that the author (Lemire) is also represented in today's #1 post, "Ask HN: What are some cool but obscure data structures you know about?", as he is the main contributor to RoaringBitmap (https://github.com/RoaringBitmap/RoaringBitmap) and one of the main authors of the data structure.
-
Project mention: Using Elastic Search Cluster with Cassandra Cluster. | reddit.com/r/elasticsearch | 2022-05-02
-
The Dremio Sonar query engine can query your data where it lives, whether that's AWS Glue, S3, Nessie catalogs, MySQL, Postgres, Redshift, or any of an ever-growing list of sources.
-
spark-bigquery-connector
BigQuery data source for Apache Spark: Read data from BigQuery into DataFrames, write DataFrames into BigQuery tables.
Project mention: Completed my first Data Engineering project with Kafka, Spark, GCP, Airflow, dbt, Terraform, Docker and more! | reddit.com/r/dataengineering | 2022-04-02
-
rumble
⛈️ RumbleDB 1.19.0 "Tipuana Tipu" 🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to download) | Declarative Machine Learning and more (by RumbleDB)
Project mention: RumbleDB: Query with ease a lot of different nested, heterogeneous data formats | news.ycombinator.com | 2021-12-01
-
Project mention: Show HN: Open-source project for scalable A/B statistical analysis | news.ycombinator.com | 2022-01-12
Index
What are some of the best open-source Spark projects in Java? This list will help you:
# | Project | Stars |
---|---|---|
1 | Deeplearning4j | 12,568 |
2 | Alluxio (formerly Tachyon) | 5,816 |
3 | Zeppelin | 5,784 |
4 | RoaringBitmap | 2,724 |
5 | elassandra | 1,656 |
6 | zingg | 565 |
7 | nessie | 500 |
8 | Sparkler | 393 |
9 | spark-bigquery-connector | 233 |
10 | rumble | 175 |
11 | lambda-arch | 139 |
12 | dead-salmon-brain | 11 |