Python data-curation Projects
-
fastdup
fastdup is a powerful free tool designed to rapidly extract valuable insights from your image and video datasets. It helps you improve the quality of your dataset's images and labels and reduce your data operations costs at scale.
Visualizing a dataset (especially a large one) in a low-dimensional embedding space can tell you a lot about its patterns and clusters.
We recently released a notebook showing how to visualize your dataset using DINOv2 models, running entirely on your CPU.
Yes! No GPUs needed.
We used it to find clusters of similar images, duplicates, and outliers in a subset of the LAION dataset.
Try it on your own dataset:
Colab notebook - https://colab.research.google.com/github/visual-layer/fastdup/blob/main/examples/dinov2_notebook.ipynb
GitHub repo - https://github.com/visual-layer/fastdup
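To illustrate the idea behind finding duplicates and outliers in an embedding space, here is a minimal sketch. It is not fastdup's API or implementation. It assumes each image has already been turned into an embedding vector (for example by a DINOv2 model) and compares vectors with cosine similarity. The file names, the three-dimensional toy vectors, the function names, and both thresholds are hypothetical values chosen for illustration. Real embeddings have hundreds of dimensions, and a pairwise loop like this would not scale to large datasets.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def find_duplicates(embeddings, threshold=0.95):
    """Return (name_a, name_b, similarity) for pairs at or above the threshold."""
    names = sorted(embeddings)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            sim = cosine(embeddings[a], embeddings[b])
            if sim >= threshold:
                pairs.append((a, b, sim))
    return pairs


def find_outliers(embeddings, threshold=0.5):
    """Return names whose most similar neighbour is still below the threshold."""
    outliers = []
    for a, vec_a in embeddings.items():
        best = max(
            (cosine(vec_a, vec_b) for b, vec_b in embeddings.items() if b != a),
            default=0.0,
        )
        if best < threshold:
            outliers.append(a)
    return sorted(outliers)


# Hypothetical toy embeddings standing in for real model outputs.
embeddings = {
    "cat_1.jpg": [0.9, 0.1, 0.0],
    "cat_1_copy.jpg": [0.9, 0.1, 0.01],
    "cat_2.jpg": [0.6, 0.7, 0.2],
    "noise.jpg": [0.0, 0.05, 1.0],
}

duplicates = find_duplicates(embeddings)
outliers = find_outliers(embeddings)

for a, b, sim in duplicates:
    print(f"possible duplicate: {a} ~ {b} (similarity {sim:.3f})")
print("possible outliers:", outliers)
```

With these toy vectors, the near-identical pair is reported as a possible duplicate and the vector pointing in a different direction is reported as a possible outlier. In practice, suitable thresholds depend on the model and the dataset.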
Python data-curation related posts
-
Visualize your dataset using DINOv2 embedding
-
[R][P] How to extract feature vectors of large datasets using DINOv2 on CPU
-
Find image duplicates and outliers – A free, scalable, efficient tool
-
How can we match images in our database?
-
[R] We found nearly half a billion duplicated images on LAION-2B-en.
-
Index
# | Project | Stars |
---|---|---|
1 | fastdup | 1,403 |