TensorFlow Datasets (TFDS): a collection of ready-to-use datasets

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • datasets

    TFDS is a collection of datasets ready to use with TensorFlow, Jax, ... (by tensorflow)

  • I tried Librispeech, a very common dataset for speech recognition, in both HF and TFDS.

    TFDS performed extremely badly.

    First, it failed because the official hosting server only allows 5 simultaneous connections; TFDS ignored that limit, opened up to 50 simultaneous downloads, and broke. I wonder if anyone actually tested this?

    Then you need a machine with 30GB for the preparation step, which may well fail on your computer. This is where I stopped: https://github.com/tensorflow/datasets/issues/3887. It might be fixed now, but it took them 8 months to respond to my issue.

    On HF, it just worked. There was a small issue in how the dataset was split up, but that is fixed now, and their response was very fast and helpful.
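
The connection-limit failure described above is a general pattern: a client that opens more parallel downloads than the host permits. As a minimal sketch (this is not TFDS code, and `fetch_all` and `ConcurrencyProbe` are hypothetical names introduced only for illustration), a bounded thread pool is one way to keep a downloader under a server's limit. The probe stands in for a real download function so the example runs without network access.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_all(urls, fetch, max_connections=5):
    """Call fetch(url) for every URL with at most max_connections calls in flight."""
    with ThreadPoolExecutor(max_workers=max_connections) as pool:
        # pool.map returns results in input order and re-raises a worker's exception.
        return list(pool.map(fetch, urls))


class ConcurrencyProbe:
    """Hypothetical stand-in for a downloader; records peak simultaneous calls."""

    def __init__(self):
        self._lock = threading.Lock()
        self._active = 0
        self.peak = 0

    def __call__(self, url):
        with self._lock:
            self._active += 1
            self.peak = max(self.peak, self._active)
        time.sleep(0.01)  # simulate network latency
        with self._lock:
            self._active -= 1
        return f"downloaded {url}"


if __name__ == "__main__":
    probe = ConcurrencyProbe()
    # Placeholder URLs; nothing is actually requested.
    urls = [f"https://example.invalid/part-{i}.tar.gz" for i in range(50)]
    results = fetch_all(urls, probe, max_connections=5)
    print(len(results), "files, peak concurrency:", probe.peak)
```

With 50 queued items and `max_connections=5`, the pool never runs more than five fetches at once, which is the behavior the hosting server in the comment above required.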

  • blackjack-basic-strategy

    A computer vision powered Blackjack basic strategy app powered by Roboflow.

  • For computer vision, there are 100k+ open source classification, object detection, and segmentation datasets available on Roboflow Universe: https://universe.roboflow.com
