Jupyter Notebook Data

Open-source Jupyter Notebook projects categorized as Data

Top 20 Jupyter Notebook Data Projects

  • data

    Data and code behind the articles and graphics at FiveThirtyEight

    Project mention: [Effortpost] Advanced stats on which players are contributing the most to the Heat's playoff run. | reddit.com/r/heat | 2023-05-24

    To answer these questions I decided to look at 538’s RAPTOR ratings. RAPTOR uses player tracking data to estimate how much each player contributes on the offensive and defensive ends. The total RAPTOR score should be something like the “number of points a player contributes to his team’s offense and defense per 100 possessions, relative to a league-average player.” Higher is better, best during the regular season has been Nikola Jokic at +14. You can read more about it here or play with an interactive tool on their website here. I don’t really care about the details of why it’s a good statistic, but it seems pretty helpful and most importantly for my purposes you can download the data here for free.

  • datasets

    🎁 4,800,000+ Unsplash images made available for research and machine learning (by unsplash)

    Project mention: Where can I get lots of clean open source data? | reddit.com/r/DataHoarder | 2022-12-14

  • quilt

    Quilt is a data mesh for connecting people with actionable data

  • spyql

    Query data on the command line with SQL-like SELECTs powered by Python expressions

    Project mention: Command-line data analytics made easy with SPyQL | dev.to | 2022-11-06

    SPyQL documentation: spyql.readthedocs.io

  • pdpipe

    Easy pipelines for pandas DataFrames.

  • Reactors

    🌱 Join a community of developers at Microsoft Reactor and connect with people, skills, and technology to build your career or personal learning. We offer free livestreams, on-demand content, and hybrid/in-person events daily around the world. Access our projects and code here.

    Project mention: Michael Mumbauer speaks to a packed crowd at Microsoft Reactor SF during GDC2023 talking all things Ashfall - the multimedia AAA IP utilizing Hedera to unleash the full potential of web3 entertainment. I’ll post video when available. | reddit.com/r/Hedera | 2023-03-22

  • ydata-quality

    Data Quality assessment with one line of code

  • awesome-data-centric-ai

    Open-Source Software, Tutorials, and Research on Data-Centric AI 🤖

    Project mention: [Q] How to generate synthetic dataset for anomaly detection? | reddit.com/r/statistics | 2023-05-08

    Maybe you can use a synthetic data generator and use your current dataset as input? I believe there are a lot of GAN-based models for this purpose out there. The ones listed on https://github.com/Data-Centric-AI-Community/awesome-data-centric-ai are mostly focused on structured data, but I'm sure there are similar packages for images.

  • uawardata

    The data behind uawardata.com

    Project mention: I found an interactive map locating all Russian military forces. Can anyone verify how up-to-date it is? | reddit.com/r/UkrainianConflict | 2022-12-26

    They provide dates with their data, but the data only goes up to September. The raw data is also available on Github, if you need that as well: https://github.com/simonhuwiler/uawardata


    Jupyter Notebooks and Data Sets for Pandas Library (by TirendazAcademy)

    Project mention: The Machine Learning Project Lifecycle | reddit.com/r/learnmachinelearning | 2022-11-05

  • German-NER-BERT

    German NER on Legal Data using BERT


    A simple CLIP based project for combining images from multiple datasets.

  • ACC_Data_2

    A tool for recording telemetry from Assetto Corsa Competizione (on PC) for post-session analysis

    Project mention: How to record FFB signal? | reddit.com/r/ACCompetizione | 2022-12-16

    I coded this in Python to record shared memory data into a CSV - with some modification it should do what you need: https://github.com/ThomasBisset/ACC_Data_2

  • KunOnYomiFrequency

    The most common possible readings of the most frequently used Kanji characters.

  • search-engine

    Full-stack search engine built from YouTube videos obtained through web scraping

    Project mention: Video search engine with OpenSearch and React | Part 3 | Data cleaning and storage | dev.to | 2022-11-18

  • walmart-stores-coffee-analysis

    Walmart Coffee Exploratory Data Analysis. Data Extracted with SerpApi 🧡

    Project mention: Web scraping Walmart Search with Nodejs | dev.to | 2023-01-26

    📌Note: Also see SerpApi Python demo project of extracting data from 500 Walmart stores and analyzing extracted data if you want to know more about scraping Walmart.

  • global-temp-change-animation

    This animated map shows the change in surface temperature around the world from 1970 to 2021, based on data from Kaggle.

    Project mention: [OC] Animated Map of Global Temperature Changes from 1970 to 2021 | reddit.com/r/dataisbeautiful | 2023-02-23

    GitHub Repo: DovarFalcone

  • DataScienceProjects

  • russo-ukraine-war-prediction-losses

    Highlights Russian losses with predictions based on historical data from the Ministry of Defence of Ukraine 🐱‍👤

    Project mention: [OC] 1 Year Russian Personnel Prediction Losses in Russo-Ukraine War | reddit.com/r/dataisbeautiful | 2022-10-13

    There's also a dataset: https://www.kaggle.com/datasets/dimitryzub/russian-losses-in-russia-ukraine-war or https://github.com/dimitryzub/russo-ukraine-war-prediction-losses

  • onpe2021

    Data extractor for the 2021 presidential election. This script collects all the data from the official ONPE page.

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2023-05-24.

What are some of the best open-source Data projects in Jupyter Notebook? This list will help you:

Rank Project Stars
1 data 16,212
2 datasets 2,082
3 quilt 1,248
4 spyql 877
5 pdpipe 708
6 Reactors 469
7 ydata-quality 362
8 awesome-data-centric-ai 221
9 uawardata 109
11 German-NER-BERT 5
13 ACC_Data_2 2
14 KunOnYomiFrequency 1
15 search-engine 1
16 walmart-stores-coffee-analysis 1
17 global-temp-change-animation 0
18 DataScienceProjects 0
19 russo-ukraine-war-prediction-losses 0
20 onpe2021 0
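
The ranking above follows GitHub star counts. As a minimal sketch of that ordering, assuming nothing about how the site itself computes it, the snippet below sorts a few of the listed projects by stars. The star counts are copied from the table; the `rank_by_stars` helper is hypothetical and exists only for illustration.

```python
# Star counts copied from the ranking table above (top five entries only),
# deliberately listed out of order.
projects = {
    "spyql": 877,
    "data": 16212,
    "pdpipe": 708,
    "quilt": 1248,
    "datasets": 2082,
}


def rank_by_stars(stars):
    """Return (rank, name, stars) tuples, highest star count first.

    Hypothetical helper: sorted() is stable, so projects with equal
    star counts keep their input order.
    """
    ordered = sorted(stars.items(), key=lambda item: item[1], reverse=True)
    return [
        (rank, name, count)
        for rank, (name, count) in enumerate(ordered, start=1)
    ]


ranking = rank_by_stars(projects)
for rank, name, count in ranking:
    # Format the count with a thousands separator, as in the table.
    print(f"{rank} {name} {count:,}")
```

Because the sort is stable, entries with the same star count (such as the several projects listed with 0 or 1 stars) would stay in whatever order they were supplied.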