Datasets Alternatives
Similar projects and alternatives to datasets
-
Home Assistant
🏡 Open source home automation that puts local control and privacy first.
-
minio
MinIO is a high-performance, S3 compatible object store, open sourced under GNU AGPLv3 license.
-
MindsDB
AI's query engine - Platform for building AI that can learn and answer questions over large scale federated data.
-
Silk.NET
The high-speed OpenGL, OpenCL, OpenAL, OpenXR, GLFW, SDL, Vulkan, Assimp, WebGPU, and DirectX bindings library your mother warned you about.
-
Kedro
Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
-
tidb
TiDB - the open-source, cloud-native, distributed SQL database designed for modern applications.
datasets reviews and mentions
-
20 Open Source Tools I Recommend to Build, Share, and Run AI Projects
Datasets library repository for accessing and sharing datasets with the community.
-
Go is my hammer, and everything is a nail
This is my (current) favorite list comprehension: https://github.com/huggingface/datasets/blob/871eabc7b23c27d... Someone was feeling awfully clever that day. (Not that I'm not occasionally guilty myself.)
- 🐍🐍 23 issues to grow yourself as an exceptional open-source Python expert 🧑‍💻 🥇
- Mastering ROUGE Matrix: Your Guide to Large Language Model Evaluation for Summarization with Examples
-
How to Train Large Models on Many GPUs?
https://github.com/huggingface/datasets
https://github.com/huggingface/transformers
-
[D] Can we use Ray for distributed training on vertex ai ? Can someone provide me examples for the same ? Also which dataframe libraries you guys used for training machine learning models on huge datasets (100 gb+) (because pandas can't handle huge data).
https://huggingface.co/docs/datasets backed with an Arrow file or buffer
- Need help with a data science project
-
Is there a text evaluation metric that does not need reference text?
I'm looking for an automatic evaluation metric that can score the first text higher (since it's more grammatically correct/better for other reasons). All the metrics for NLG I found require some reference text to match the generated text with, which I don't have.
-
FauxPilot – an open-source GitHub Copilot server
And then pass that my_code.json as the dataset name.
[1] https://github.com/huggingface/datasets
-
Hugging Face Introduces ‘Datasets’: A Lightweight Community Library For Natural Language Processing (NLP)
Code for https://arxiv.org/abs/2109.02846 found: https://github.com/huggingface/datasets
Stats
huggingface/datasets is an open-source project licensed under the Apache License 2.0, which is an OSI-approved license.
The primary programming language of datasets is Python.