Top 12 Java Spark Projects
Suite of tools for deploying and training deep learning models on the JVM. Highlights include model import for Keras, TensorFlow, and ONNX/PyTorch; a small, modular C++ library for running math code; and a Java-based math library built on top of the core C++ library. Also includes SameDiff, a PyTorch/TensorFlow-like library for running deep learning using automatic differentiation.
Project mention: Data Science Competition | dev.to | 2022-03-25
Alluxio, data orchestration for analytics and machine learning in the cloud
Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.
Project mention: Visualization using Pyspark Dataframe | reddit.com/r/dataengineering | 2022-05-14
Have you tried Apache Zeppelin? I remember that you can pretty-print Spark DataFrames directly in it with z.show(df).
A better compressed bitset in Java
Project mention: Negative Incentives in Academic Research | news.ycombinator.com | 2022-07-22
Sidetracking the conversation a bit: what a coincidence that the author (Lemire) is also represented in today's #1 "Ask HN: What are some cool but obscure data structures you know about?", as he is the main contributor to RoaringBitmap (https://github.com/RoaringBitmap/RoaringBitmap) and one of the main authors of the data structure.
Elassandra = Elasticsearch + Apache Cassandra
Project mention: Using Elastic Search Cluster with Cassandra Cluster. | reddit.com/r/elasticsearch | 2022-05-02
Scalable entity resolution, data mastering and deduplication using ML
Project mention: Merging datasets | reddit.com/r/dataengineering | 2022-07-11
Nessie: Transactional Catalog for Data Lakes with Git-like semantics
Project mention: 5 Reasons Your Data Lakehouse should Embrace Dremio Cloud | dev.to | 2022-08-09
The Dremio Sonar query engine can query your data where it lives, whether that is AWS Glue, S3, Nessie catalogs, MySQL, Postgres, Redshift, or one of an ever-growing list of sources.
Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.
BigQuery data source for Apache Spark: Read data from BigQuery into DataFrames, write DataFrames into BigQuery tables.
Project mention: Completed my first Data Engineering project with Kafka, Spark, GCP, Airflow, dbt, Terraform, Docker and more! | reddit.com/r/dataengineering | 2022-04-02
⛈️ RumbleDB 1.19.0 "Tipuana Tipu" 🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to download) | Declarative Machine Learning and more (by RumbleDB)
Project mention: RumbleDB: Query with ease a lot of different nested, heterogeneous data formats | news.ycombinator.com | 2021-12-01
A full big data pipeline (Lambda Architecture) with Spark, Kafka, HDFS and Cassandra.
Apache Spark-based framework for analyzing A/B experiments
Project mention: Show HN: Open-source project for scalable A/B statistical analysis | news.ycombinator.com | 2022-01-12
Java Spark related posts
Project Nessie: Transactional Catalog for Data Lakes with Git-Like Semantics
1 project | news.ycombinator.com | 23 Jun 2022
Introduction to The World of Data - (OLTP, OLAP, Data Warehouses, Data Lakes and more)
2 projects | dev.to | 20 Jun 2022
Match over 1 GB of data with inconsistent names
3 projects | reddit.com/r/dataengineering | 9 Nov 2021
BI Application in Golang.
2 projects | reddit.com/r/golang | 7 May 2021
Lambda Architecture: How to Build a Big Data Pipeline
1 project | dev.to | 5 Mar 2021
What are some of the best open-source Spark projects in Java? This list will help you:
|2||Alluxio (formerly Tachyon)||5,816|