Jupyter Notebook Dataset

Open-source Jupyter Notebook projects categorized as Dataset

Top 23 Jupyter Notebook Dataset Projects

  1. covid-chestxray-dataset

    We are building an open database of COVID-19 cases with chest X-ray or CT images.

  3. whylogs

    An open-source data logging library for machine learning models and data pipelines. 📚 Provides visibility into data quality & model performance over time. 🛡️ Supports privacy-preserving data collection, ensuring safety & robustness. 📈

  4. datasets

    🎁 6,500,000+ Unsplash images made available for research and machine learning (by unsplash)

  5. fma

    FMA: A Dataset For Music Analysis

  6. clusterdata

    cluster data collected from production clusters in Alibaba for cluster management research

  7. raccoon_dataset

    The dataset is used to train my own raccoon detector and I blogged about it on Medium

  8. torchxrayvision

    TorchXRayVision: A library of chest X-ray datasets and models. Classifiers, segmentation, and autoencoders.

  10. ThoughtSource

    A central, open resource for data and tools related to chain-of-thought reasoning in large language models. Developed @ Samwald research group: https://samwald.info/

  11. hate-speech-and-offensive-language

    Repository for the paper "Automated Hate Speech Detection and the Problem of Offensive Language", ICWSM 2017

  12. OpenAI-CLIP

    Simple implementation of OpenAI CLIP model in PyTorch.

  13. TACO

    🌮 Trash Annotations in Context Dataset Toolkit (by pedropro)

  14. SKAB

    SKAB - Skoltech Anomaly Benchmark. Time-series data for evaluating Anomaly Detection algorithms.

  15. Awesome_Satellite_Benchmark_Datasets

    Supplementary material for our paper "THERE IS NO DATA LIKE MORE DATA" is provided.

  16. goodreads

    code samples for the goodreads datasets (by MengtingWan)

  17. roboflow-100-benchmark

    Code for replicating Roboflow 100 benchmark results and programmatically downloading benchmark datasets

  18. alis

    [ICCV 2021] Aligning Latent and Image Spaces to Connect the Unconnectable (by universome)

  19. covid19za

    Coronavirus COVID-19 (2019-nCoV) Data Repository and Dashboard for South Africa

  20. ImageNetV2

    A new test set for ImageNet

  21. mnist1d

    A 1D analogue of the MNIST dataset for measuring spatial biases and answering Science of Deep Learning questions.

  22. Tegridy-MIDI-Dataset

    Tegridy MIDI Dataset for precise and effective Music AI models creation.

  23. medmcqa

    A large-scale (194k) Multiple-Choice Question Answering (MCQA) dataset designed to address real-world medical entrance exam questions.

  24. openbrewerydb

    🍻 An open-source dataset of breweries, cideries, brewpubs, and bottleshops.

  25. clip-italian

    CLIP (Contrastive Language–Image Pre-training) for Italian

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Jupyter Notebook Dataset related posts

  • Simple Implementation of OpenAI Clip (Tutorial)

    1 project | news.ycombinator.com | 21 Feb 2024
  • SKAB: NEW Data - star count:238.0

    1 project | /r/algoprojects | 25 Sep 2023
  • SKAB: NEW Data - star count:238.0

    1 project | /r/algoprojects | 24 Sep 2023
  • SKAB: NEW Data - star count:238.0

    1 project | /r/algoprojects | 23 Sep 2023
  • SKAB: NEW Data - star count:238.0

    1 project | /r/algoprojects | 19 Sep 2023
  • Update from Waymo spokesperson on the dog that was killed by a Waymo ADV

    1 project | /r/SelfDrivingCars | 13 Jun 2023
  • [P] Fine-tuning LLaMA on TheVault by AI4Code

    2 projects | /r/LocalLLaMA | 30 May 2023

Index

What are some of the best open-source Dataset projects in Jupyter Notebook? This list will help you:

# Project Stars
1 covid-chestxray-dataset 3,033
2 whylogs 2,732
3 datasets 2,571
4 fma 2,423
5 clusterdata 1,788
6 raccoon_dataset 1,272
7 torchxrayvision 1,041
8 ThoughtSource 982
9 hate-speech-and-offensive-language 813
10 OpenAI-CLIP 689
11 TACO 655
12 SKAB 361
13 Awesome_Satellite_Benchmark_Datasets 354
14 goodreads 281
15 roboflow-100-benchmark 271
16 alis 258
17 covid19za 254
18 ImageNetV2 252
19 mnist1d 226
20 Tegridy-MIDI-Dataset 218
21 medmcqa 211
22 openbrewerydb 193
23 clip-italian 186
