Jupyter Notebook Spark

Open-source Jupyter Notebook projects categorized as Spark

Top 17 Jupyter Notebook Spark Projects

  • data-engineering-zoomcamp

    Free Data Engineering course!

  • Project mention: Data Engineering Zoomcamp Week 6 - using redpanda 1 | dev.to | 2024-04-09

    References: Data engineering zoomcamp week 6 course and homework notes: https://github.com/DataTalksClub/data-engineering-zoomcamp/tree/main/cohorts/2024/06-streaming

  • H2O

    H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.

  • Project mention: Really struggling with open source models | /r/LocalLLaMA | 2023-07-12

    I would use H2O if I were you. You can try out LLMs with a nice GUI. Unless you have some familiarity with the tools needed to run these projects, it can be frustrating. https://h2o.ai/

  • HELK

    The Hunting ELK

  • JustEnoughScalaForSpark

    A tutorial on the most important features and idioms of Scala that you need to use Spark's Scala APIs.

  • Data-Engineering-Projects

    Personal Data Engineering Projects

  • Project mention: A question about data engineering? | /r/programiranje | 2023-06-30

  • ngods-stocks

    New Generation Opensource Data Stack Demo

  • rikai

    Parquet-based ML data format optimized for working with unstructured data

  • lasagna

    A Docker Compose template that builds an interactive development environment for PySpark with Jupyter Lab, MinIO as object storage, Hive Metastore, Trino and Kafka

  • Project mention: FLaNK Stack Weekly for 20 Nov 2023 | dev.to | 2023-11-20

  • SDE

    Scalytics Connect development environment, pre-built (by scalytics)

  • ghcn-d

    Data Pipeline from the Global Historical Climatology Network DataSet

  • amazon-emr-with-delta-lake

    Amazon EMR Notebook to show how to read from and write to Delta tables with Amazon EMR

  • pyspark_nlp_workshop

    Instructions and code for the workshop "From Big Data to NLP Insights: Unlocking the Power of PySpark and Spark NLP"

  • Project mention: PySpark for NLP workshop – Jupyter notebooks and instructions | news.ycombinator.com | 2023-05-14

  • project-atlas-sao-paulo

    A project for the development of rich geospatial data from the city of São Paulo for use in Machine Learning models.

  • workshop-introduction-to-machine-learning

    Come ready to discover the goals and approaches of machine learning, and how to build effective algorithms and solutions!

  • synapse-azure-data-explorer-101

    Getting started with Azure Synapse and Azure Data Explorer

  • udacity_bike_share_datalake_project

    Azure Data Lake

  • Project mention: Unveiling the Azure Data Lake for Bike Share Data Analytics | dev.to | 2023-10-11

    You can find the code related to this project in my GitHub repository.

  • dracula

    A brief analysis of the most common words in Dracula, by Bram Stoker (by geazi-anc)

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Index

What are some of the best open-source Spark projects in Jupyter Notebook? This list will help you:

Rank  Project                                     Stars
1     data-engineering-zoomcamp                   22,446
2     H2O                                         6,730
3     HELK                                        3,659
4     JustEnoughScalaForSpark                     673
5     Data-Engineering-Projects                   637
6     ngods-stocks                                354
7     rikai                                       135
8     lasagna                                     27
9     SDE                                         22
10    ghcn-d                                      21
11    amazon-emr-with-delta-lake                  17
12    pyspark_nlp_workshop                        12
13    project-atlas-sao-paulo                     9
14    workshop-introduction-to-machine-learning   7
15    synapse-azure-data-explorer-101             4
16    udacity_bike_share_datalake_project         0
17    dracula                                     0
