Activeloop Hub
DISCONTINUED
caer
Metric | Activeloop Hub | caer
---|---|---
Mentions | 31 | 8
Stars | 4,807 | 743
Growth | - | -
Activity | 9.9 | 0.0
Last commit | over 1 year ago | 6 months ago
Language | Python | Python
License | Mozilla Public License 2.0 | MIT License
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
Activeloop Hub
- [D] NLP has HuggingFace, what does Computer Vision have?
u/Remote_Cancel_7977 we just launched 100+ computer vision datasets via Activeloop Hub yesterday on r/ML (#1 post for the day!). Note: we do not intend to compete with HuggingFace (we're building the database for AI). Accessing computer vision datasets via Hub is much faster than via HuggingFace though, according to some third-party benchmarks. :)
- [N] [P] Access 100+ image, video & audio datasets in seconds with one line of code & stream them while training ML models with Activeloop Hub (more at docs.activeloop.ai, description & links in the comments below)
u/gopietz good question. htype="class_label" will work, but querying doesn't support multi-dimensional labels yet. Would you mind opening an issue requesting that feature?
We've recently added a HuggingFace integration that allows ingestion of HuggingFace datasets.
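To illustrate the `htype="class_label"` remark above, here is a minimal, library-free sketch. It assumes class labels are stored as integer indices next to a list of class names, with one scalar label per sample. It does not use Hub's API; `CLASS_NAMES`, `labels`, and `query_by_class` are hypothetical names chosen for the example.

```python
from typing import List, Sequence

# Hypothetical class names and per-sample labels (integer indices into CLASS_NAMES).
CLASS_NAMES: List[str] = ["cat", "dog", "bird"]
labels: List[int] = [0, 1, 1, 2, 0]


def query_by_class(labels: Sequence[int], class_names: Sequence[str], name: str) -> List[int]:
    """Return the indices of samples whose scalar label matches the class `name`."""
    target = class_names.index(name)  # raises ValueError for an unknown class
    return [i for i, label in enumerate(labels) if label == target]


# Select every sample labelled "dog".
dog_indices = query_by_class(labels, CLASS_NAMES, "dog")
```

With several labels per sample, a query like this would first have to decide whether any or all of them must match, which is presumably part of what multi-dimensional support would need to settle.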
- [P] Database for AI: Visualize, version-control & explore image, video and audio datasets
Hub, our open-source package, lets you stream datasets to PyTorch/TensorFlow while training. Check out how we achieved 95% GPU utilization while training on ImageNet at 50% less cost. We're building the Database for AI, with everything it should contain. If there's an adjacent feature that would make it more useful for your workflow, do let us know!
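The idea of streaming a dataset while training can be sketched without any framework. What follows is a conceptual, standard-library-only illustration, not Hub's implementation or API: `fetch_chunk`, `stream_batches`, and the chunk layout are hypothetical. Chunks are fetched lazily, so the consuming loop can start on the first batch before the rest of the data has been read.

```python
from typing import Iterator, List

NUM_CHUNKS = 3   # hypothetical number of remote chunks
CHUNK_SIZE = 4   # hypothetical number of samples per chunk

fetched: List[int] = []  # records which chunks have been fetched so far


def fetch_chunk(chunk_id: int) -> List[int]:
    """Stand-in for a network read; returns the samples stored in one chunk."""
    fetched.append(chunk_id)
    start = chunk_id * CHUNK_SIZE
    return list(range(start, start + CHUNK_SIZE))


def stream_batches(batch_size: int) -> Iterator[List[int]]:
    """Yield batches, fetching each chunk only when its samples are needed."""
    buffer: List[int] = []
    for chunk_id in range(NUM_CHUNKS):
        buffer.extend(fetch_chunk(chunk_id))
        while len(buffer) >= batch_size:
            yield buffer[:batch_size]
            buffer = buffer[batch_size:]
    if buffer:  # emit the final, possibly smaller, batch
        yield buffer


batches = stream_batches(batch_size=3)
first_batch = next(batches)         # only chunk 0 has been fetched at this point
chunks_after_first = list(fetched)  # snapshot of the fetch log
remaining = list(batches)           # consuming the rest fetches the other chunks
```

A real loader would typically also prefetch chunks in the background and shuffle samples; both are left out here to keep the sketch short.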
Our early users love the tool, and I hope you'll love it too. We have many more features beyond visualization on the roadmap. The current feature list includes querying and a version control UI, and the tool integrates through our open-source package Hub (a dataset format for AI) with TensorFlow, PyTorch, and SageMaker, with other tools on the roadmap.
Please take a look at our open-source dataset format https://github.com/activeloopai/hub and a tutorial on htypes https://docs.activeloop.ai/how-hub-works/visualization-and-htype
The platform allows you to:
- Inspect the data with all its bounding boxes, masks, etc., and see important stats such as the distribution of labels (we're adding more in the future to fight bias and improve data quality).
- Query datasets to create new, highly specific ones.
- Version-control datasets (while visualizing the changes). I'm confident that if you've ever worked on iteratively improving your models, dataset versioning is probably something you've done.
- Stream computer vision datasets while training in PyTorch/TensorFlow via Hub, our open-source package (we might add an even more straightforward way to the UI).
- Manage access, which is important for larger organizations, and we do take care of that.
The visualization tool interfaces with our open-source dataset format for AI. It enables workflows such as querying/filtering to create datasets or inspect subsamples, and tracking changes to the data with data version control visualization (e.g. checking whether the applied transformations had the intended effects). Integrations with other tools (e.g. experiment tracking, labelling) are coming very soon.
Yes, we're not entirely relevant for your use case, especially if the data is not that big/complex. The benefits you'd get from switching to the Hub format are not as pronounced for text as they are for computer vision datasets (actually, we still have a couple of diehard NLP community members, but they have ridiculously big text datasets). I presume your university system doesn't use unstructured data like videos/images/audio either, so our product wouldn't be very helpful in that regard. I do wish you tons of luck and patience though (>10^6?! good Lord...)
- The hand-picked selection of the best Python libraries released in 2021
Hub.
caer
We haven't tracked posts mentioning caer yet.
Tracking mentions began in Dec 2020.
What are some alternatives?
dvc - 🦉 ML Experiments and Data Management with Git
fiftyone - The open-source tool for building high-quality datasets and computer vision models
img2table - img2table is a table identification and extraction Python Library for PDF and images, based on OpenCV image processing
opencv - Haskell binding to OpenCV-3.x
Single-Image-Dehazing-Python - python implementation of the paper: "Efficient Image Dehazing with Boundary Constraint and Contextual Regularization"
instant-ngp - Instant neural graphics primitives: lightning fast NeRF and more
petastorm - Petastorm library enables single machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as Tensorflow, Pytorch, and PySpark and can be used from pure Python code.
CKAN - CKAN is an open-source DMS (data management system) for powering data hubs and data portals. CKAN makes it easy to publish, share and use data. It powers catalog.data.gov, open.canada.ca/data, data.humdata.org among many other sites.
datasets - TFDS is a collection of datasets ready to use with TensorFlow, Jax, ...
moviepy - Video editing with Python
RobustVideoMatting - Robust Video Matting in PyTorch, TensorFlow, TensorFlow.js, ONNX, CoreML!
TileDB - The Universal Storage Engine