

🤗 The largest hub of ready-to-use NLP datasets for ML models with fast, easy-to-use and efficient data manipulation tools (by huggingface)


Basic datasets repo stats
6 days ago

huggingface/datasets is an open source project licensed under the Apache License 2.0, which is an OSI-approved license.

Datasets Alternatives

Similar projects and alternatives to datasets
  • Home Assistant

    🏡 Open source home automation that puts local control and privacy first

  • edex-ui

    A cross-platform, customizable science fiction terminal emulator with advanced monitoring & touchscreen support.

  • first-contributions

    🚀✨ Help beginners to contribute to open source projects

  • NewPipe

    A libre lightweight streaming front-end for Android.

  • nnn

    n³ The unorthodox terminal file manager

  • Javascript

    A repository for All algorithms implemented in Javascript (for educational purposes only) (by TheAlgorithms)

  • react-native-firebase

    🔥 A well-tested feature-rich modular Firebase implementation for React Native. Supports both iOS & Android platforms for all Firebase services.

  • Blitz

    ⚡️The Fullstack React Framework — built on Next.js

  • Quarkus

    Quarkus: Supersonic Subatomic Java.

  • sentence-transformers

    Sentence Embeddings with BERT & XLNet

  • starter-workflows

    Accelerating new GitHub Actions workflows

  • nvidia-snatcher

    🤖 The world's easiest, most powerful stock checker [Moved to:]

  • Real_Time_Image_Animation

    The Project is real time application in opencv using first order model

  • quickjs

    Public repository of the QuickJS Javascript Engine. Pull requests are not accepted. Use the mailing list to submit patches.

  • cypress-realworld-app

    A payment application to demonstrate real-world usage of Cypress testing methods, patterns, and workflows.

  • frankmocap

    A Strong and Easy-to-use Single View 3D Hand+Body Pose Estimator

  • CppCon2020

    Slides and other materials from CppCon 2020

  • azure-sdk-for-js

    This repository is for active development of the Azure SDK for JavaScript (NodeJS & Browser). For consumers of the SDK we recommend visiting our public developer docs at or our versioned developer docs at

  • vision_blender

    A Blender addon for generating synthetic ground truth data for Computer Vision applications

  • NasNas

    An intuitive and user friendly 2D game framework for C++

NOTE: The number of mentions for a project counts the posts in which it is mentioned together with datasets. A higher number therefore suggests a closer datasets alternative or a more similar project.


Posts that mention datasets. We have used some of these posts to build our list of alternatives and similar projects; the most recent one is from 2021-01-28.
  • Build an Embeddings index with Hugging Face Datasets | 2021-01-28
    This article shows how txtai can index and search with Hugging Face's Datasets library. Datasets opens access to a large and growing list of publicly available datasets. Datasets has functionality to select, transform and filter data stored in each dataset.
  • [P] 611 text datasets in 467 languages in the new v1.2 release of HuggingFace datasets library
    There will be 13 more by the end of this week, from Microsoft CodeXGlue; I did not have the time to fix my PR earlier : .
  • Contributors | 2020-12-31
  • Weekly Developer Roundup #16 - Sun Oct 04 2020 | 2020-10-03
    huggingface/datasets (Python): 🤗 Fast, efficient, open-access datasets and evaluation metrics for Natural Language Processing and more in PyTorch, TensorFlow, NumPy and Pandas