Jupyter Notebook Spark

Open-source Jupyter Notebook projects categorized as Spark

Top 11 Jupyter Notebook Spark Projects

  • data-engineering-zoomcamp

    Free Data Engineering course!

    Project mention: Magic: The Gathering dashboard | First complete DE project ever | Feedback welcome | reddit.com/r/dataengineering | 2023-03-23

    I am fairly new to DE, learning Python since December 2022, and coming from a non-tech background. I took part in the DataTalksClub Zoomcamp. I started using the tools in this project in January 2023.

  • H2O

    H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.

    Project mention: Top 10+ OpenAI Alternatives | dev.to | 2023-02-13


  • BigDL

    Fast, distributed, secure AI for Big Data

  • HELK

    The Hunting ELK

    Project mention: Kali Linux 2023.1 introduces 'Purple' distro for defensive security | reddit.com/r/netsec | 2023-03-14

    Utilizing that API and Jupyter notebooks is exactly why Hunting ELK is the way it is, from my understanding.

  • JustEnoughScalaForSpark

    A tutorial on the most important features and idioms of Scala that you need to use Spark's Scala APIs.

    Project mention: Which tutorial to learn functional programming without going in depth ? | reddit.com/r/scala | 2023-02-09

    - https://github.com/deanwampler/JustEnoughScalaForSpark

  • Data-Engineering-Projects

    Personal Data Engineering Projects

    Project mention: ✨ 5 Open Source Data Engineering Projects 🔥 | reddit.com/r/dataengineering | 2022-10-19

    5️⃣ Data Engineering Projects

  • ngods-stocks

    New Generation Opensource Data Stack Demo

    Project mention: I'm way over my head | reddit.com/r/dataengineering | 2023-03-03

    I've worked for 3-4 years in positions where I helped structure ETLs, DWs and alike. However, I'm now on the cusp of being hired to help structure the area in a big investment fund here, helping the research area have an easier time focusing on their models. My previous experience led me to grasp DBT, SQL, and most of my experience came from using a Microsoft stack with SSIS, Analysis Services and the like. I'm feeling wayyyy over my head to start building this, and the multitude of possible stacks make me very afraid that I might overengineer this, and I will initially be alone in the area. What do I do? Fake it till I make it? I never lied in my resume, so it's not like they expect a senior with plenty of experience but still... I read this: https://github.com/zsvoboda/ngods-stocks And it seems like a good starter, albeit overly complex for our use case. I could use suggestions, people to talk to, etc. Please help

  • rikai

    Parquet-based ML data format optimized for working with unstructured data

  • amazon-emr-with-delta-lake

    Amazon EMR Notebook to show how to read from and write to Delta tables with Amazon EMR

    Project mention: Datalake to delta lake | reddit.com/r/dataengineering | 2022-05-02

  • project-atlas-sao-paulo

    A project for the development of rich geospatial data from the city of São Paulo for use in Machine Learning models.

  • synapse-azure-data-explorer-101

    Getting started with Azure Synapse and Azure Data Explorer

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2023-03-23.

What are some of the best open-source Spark projects in Jupyter Notebook? This list will help you:

 #  Project                            Stars
 1  data-engineering-zoomcamp         12,946
 2  H2O                                6,190
 3  BigDL                              4,175
 4  HELK                               3,429
 5  JustEnoughScalaForSpark              660
 6  Data-Engineering-Projects            392
 7  ngods-stocks                         256
 8  rikai                                128
 9  amazon-emr-with-delta-lake            15
10  project-atlas-sao-paulo                7
11  synapse-azure-data-explorer-101        3