Java Spark

Open-source Java projects categorized as Spark

Top 12 Java Spark Projects

  • Deeplearning4j

    Suite of tools for deploying and training deep learning models using the JVM. Highlights include model import for Keras, TensorFlow, and ONNX/PyTorch, a modular and tiny C++ library for running math code, and a Java-based math library on top of the core C++ library. Also includes SameDiff: a PyTorch/TensorFlow-like library for running deep learning using automatic differentiation.

    Project mention: Data Science Competition | dev.to | 2022-03-25

    DL4J

  • Alluxio (formerly Tachyon)

    Alluxio, data orchestration for analytics and machine learning in the cloud

  • Zeppelin

    Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.

    Project mention: Visualization using Pyspark Dataframe | reddit.com/r/dataengineering | 2022-05-14

    Have you tried Apache Zeppelin? I remember that you can pretty-print Spark DataFrames directly in it with z.show(df).

  • RoaringBitmap

    A better compressed bitset in Java

    Project mention: Negative Incentives in Academic Research | news.ycombinator.com | 2022-07-22

    To sidetrack the conversation a bit: what a coincidence that the author (Lemire) is also represented in today's #1 "Ask HN: What are some cool but obscure data structures you know about?", as he is the main contributor to RoaringBitmap https://github.com/RoaringBitmap/RoaringBitmap and one of the main authors of the data structure.

  • elassandra

    Elassandra = Elasticsearch + Apache Cassandra

    Project mention: Using Elastic Search Cluster with Cassandra Cluster. | reddit.com/r/elasticsearch | 2022-05-02

  • zingg

    Scalable entity resolution, data mastering and deduplication using ML

    Project mention: Merging datasets | reddit.com/r/dataengineering | 2022-07-11

  • nessie

    Nessie: Transactional Catalog for Data Lakes with Git-like semantics

    Project mention: 5 Reasons Your Data Lakehouse should Embrace Dremio Cloud | dev.to | 2022-08-09

    The Dremio Sonar query engine can query your data where it exists whether it's AWS Glue, S3, Nessie Catalogs, MySQL, Postgres, RedShift and an ever growing list of sources.

  • Sparkler

    Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.

  • spark-bigquery-connector

    BigQuery data source for Apache Spark: Read data from BigQuery into DataFrames, write DataFrames into BigQuery tables.

    Project mention: Completed my first Data Engineering project with Kafka, Spark, GCP, Airflow, dbt, Terraform, Docker and more! | reddit.com/r/dataengineering | 2022-04-02

  • rumble

    ⛈️ RumbleDB 1.19.0 "Tipuana Tipu" 🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to download) | Declarative Machine Learning and more (by RumbleDB)

    Project mention: RumbleDB: Query with ease a lot of different nested, heterogeneous data formats | news.ycombinator.com | 2021-12-01

  • lambda-arch

    A full big data pipeline (Lambda Architecture) with Spark, Kafka, HDFS and Cassandra.

  • dead-salmon-brain

    Apache Spark-based framework for analyzing A/B experiments

    Project mention: Show HN: Open-source project for scalable A/B statistical analysis | news.ycombinator.com | 2022-01-12

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2022-08-09.
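
RoaringBitmap, listed above, is described only as "a better compressed bitset". The core Roaring idea is to split each 32-bit value into a 16-bit high key, which selects a container, and a 16-bit low part stored inside that container: sparse containers are sorted arrays, and a container switches to a plain 65,536-bit bitmap once it holds more than 4,096 values (the point where 4,096 two-byte entries take the same 8 KB as the bitmap). The class below is a hypothetical, heavily simplified Java sketch of that layout for illustration only; MiniRoaring and its methods are invented names, not the org.roaringbitmap API.

```java
import java.util.Arrays;
import java.util.TreeMap;

// Hypothetical, simplified illustration of the Roaring layout; NOT the org.roaringbitmap API.
// Each 32-bit value is split into a 16-bit high key (which container) and a 16-bit low part.
final class MiniRoaring {
    // 4,096 two-byte values = 8 KB, the same size as a 65,536-bit bitmap.
    private static final int ARRAY_LIMIT = 4096;

    private interface Container {
        Container add(char low); // may return a different (converted) container
        boolean contains(char low);
        int cardinality();
    }

    /** Sparse chunk: sorted array of low 16-bit parts (char is Java's unsigned 16-bit type). */
    private static final class ArrayContainer implements Container {
        private char[] values = new char[4];
        private int size = 0;

        public Container add(char low) {
            int idx = Arrays.binarySearch(values, 0, size, low);
            if (idx >= 0) return this; // already present
            if (size == ARRAY_LIMIT) { // too dense: convert to a bitmap container
                BitmapContainer bitmap = new BitmapContainer();
                for (int i = 0; i < size; i++) bitmap.add(values[i]);
                return bitmap.add(low);
            }
            int insertAt = -idx - 1; // binarySearch encodes the insertion point
            if (size == values.length) {
                values = Arrays.copyOf(values, Math.min(values.length * 2, ARRAY_LIMIT));
            }
            System.arraycopy(values, insertAt, values, insertAt + 1, size - insertAt);
            values[insertAt] = low;
            size++;
            return this;
        }

        public boolean contains(char low) { return Arrays.binarySearch(values, 0, size, low) >= 0; }
        public int cardinality() { return size; }
    }

    /** Dense chunk: 1,024 longs = 65,536 bits, one bit per possible low value. */
    private static final class BitmapContainer implements Container {
        private final long[] words = new long[1024];
        private int cardinality = 0;

        public Container add(char low) {
            long mask = 1L << low; // Java masks a long shift count to its low 6 bits
            if ((words[low >>> 6] & mask) == 0) {
                words[low >>> 6] |= mask;
                cardinality++;
            }
            return this;
        }

        public boolean contains(char low) { return (words[low >>> 6] & (1L << low)) != 0; }
        public int cardinality() { return cardinality; }
    }

    // Containers keyed by the high 16 bits, kept in sorted key order.
    private final TreeMap<Character, Container> containers = new TreeMap<>();

    public void add(int value) {
        char high = (char) (value >>> 16);
        Container c = containers.get(high);
        if (c == null) c = new ArrayContainer();
        containers.put(high, c.add((char) value));
    }

    public boolean contains(int value) {
        Container c = containers.get((char) (value >>> 16));
        return c != null && c.contains((char) value);
    }

    public long cardinality() {
        long total = 0;
        for (Container c : containers.values()) total += c.cardinality();
        return total;
    }

    public int containerCount() { return containers.size(); }

    /** Reports "array", "bitmap", or "none" for the container that would hold value. */
    public String containerKind(int value) {
        Container c = containers.get((char) (value >>> 16));
        if (c == null) return "none";
        return (c instanceof BitmapContainer) ? "bitmap" : "array";
    }

    public static void main(String[] args) {
        MiniRoaring bits = new MiniRoaring();
        for (int i = 0; i < 10_000; i++) bits.add(i); // dense: key 0 becomes a bitmap
        bits.add(1 << 20);                            // sparse: key 16 stays a small array
        System.out.println(bits.cardinality() + " values in " + bits.containerCount() + " containers");
        // prints "10001 values in 2 containers"
    }
}
```

The real library goes further (run-length containers, fast AND/OR between containers, and a portable serialization format), so for actual workloads the RoaringBitmap project itself is the natural choice; the sketch only shows why mixing arrays and bitmaps per 16-bit chunk keeps both sparse and dense data compact.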

Index

What are some of the best open-source Spark projects in Java? This list will help you:

Rank  Project                     Stars
1     Deeplearning4j              12,568
2     Alluxio (formerly Tachyon)  5,816
3     Zeppelin                    5,784
4     RoaringBitmap               2,724
5     elassandra                  1,656
6     zingg                       565
7     nessie                      500
8     Sparkler                    393
9     spark-bigquery-connector    233
10    rumble                      175
11    lambda-arch                 139
12    dead-salmon-brain           11
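
lambda-arch, listed above, builds its pipeline as a Lambda Architecture. In that architecture a batch layer periodically recomputes views over the full master dataset, a speed layer covers only the data that arrived after the last batch run, and a serving layer answers queries by merging the two views. The following is a hypothetical, in-memory Java sketch of that merge for a simple per-key count; LambdaViews, Event, and the method names are illustrative and are not taken from the lambda-arch repository, and plain lists stand in for the real storage and streaming systems.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of Lambda Architecture views; names are illustrative, not from lambda-arch.
final class LambdaViews {
    /** One immutable fact in the master dataset, e.g. a page view. */
    static final class Event {
        final String key;
        final long timestamp;
        Event(String key, long timestamp) { this.key = key; this.timestamp = timestamp; }
    }

    /** Batch layer: recompute per-key counts over everything the last batch run covered. */
    static Map<String, Long> batchView(List<Event> masterDataset, long batchCutoff) {
        Map<String, Long> view = new HashMap<>();
        for (Event e : masterDataset) {
            if (e.timestamp < batchCutoff) view.merge(e.key, 1L, Long::sum);
        }
        return view;
    }

    /** Speed layer: count only events that arrived at or after the batch cutoff. */
    static Map<String, Long> realtimeView(List<Event> stream, long batchCutoff) {
        Map<String, Long> view = new HashMap<>();
        for (Event e : stream) {
            if (e.timestamp >= batchCutoff) view.merge(e.key, 1L, Long::sum);
        }
        return view;
    }

    /** Serving layer: answer a query by merging the batch and realtime views. */
    static long query(Map<String, Long> batch, Map<String, Long> realtime, String key) {
        return batch.getOrDefault(key, 0L) + realtime.getOrDefault(key, 0L);
    }

    public static void main(String[] args) {
        List<Event> events = List.of(
                new Event("page-a", 1), new Event("page-b", 2),
                new Event("page-a", 3), new Event("page-a", 7));
        long cutoff = 5; // assume the last completed batch run covered timestamps < 5
        Map<String, Long> batch = batchView(events, cutoff);
        Map<String, Long> realtime = realtimeView(events, cutoff);
        System.out.println(query(batch, realtime, "page-a")); // prints 3
    }
}
```

The design choice this illustrates is that the batch view can always be rebuilt from scratch from immutable data, while the speed layer only has to stay correct for the short window the batch layer has not yet absorbed.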