-
clustering-runs-plugin
Discontinued. Compute clustering on your data in a visual, intuitive way with FiftyOne and Sklearn! [Moved to: https://github.com/jacobmarks/clustering-plugin]
-
clustering-plugin
Compute clustering on your data in a visual, intuitive way with FiftyOne and Sklearn!
-
CLIP
CLIP (Contrastive Language-Image Pretraining), Predict the most relevant text snippet given an image
With all that background out of the way, let’s turn theory into practice and learn how to use clustering to structure our unstructured data. We’ll be leveraging two open-source machine learning libraries: scikit-learn, which comes pre-packaged with implementations of most common clustering algorithms, and fiftyone, which streamlines the management and visualization of unstructured data. Both can be installed from PyPI:
pip install -U scikit-learn fiftyone
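To make the scikit-learn side concrete, here is a minimal sketch of clustering a set of feature vectors with K-means. The synthetic blobs stand in for real image features, and the helper name `cluster_features` and all parameter values are illustrative assumptions rather than anything the plugin prescribes.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs


def cluster_features(features, n_clusters, seed=0):
    """Fit K-means and return one integer cluster label per row of `features`."""
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    return model.fit_predict(features)


# Synthetic stand-in for image features: 300 points around 3 well-separated centers
centers = np.array([[0.0] * 8, [10.0] * 8, [-10.0] * 8])
features, _ = make_blobs(
    n_samples=300, centers=centers, cluster_std=0.5, random_state=0
)

labels = cluster_features(features, n_clusters=3)
print(np.bincount(labels))  # number of points assigned to each cluster
```

With real data, the rows of `features` would be per-image embeddings, and the choice of algorithm and number of clusters would need tuning for the dataset.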
The FiftyOne Clustering Plugin makes our lives even easier. It provides the connective tissue between scikit-learn’s clustering algorithms and our images and wraps all of this in a simple UI within the FiftyOne App. We can install the plugin from the CLI:
fiftyone plugins download https://github.com/jacobmarks/clustering-plugin
We will also need two more libraries: OpenAI’s CLIP package, installed directly from its GitHub repo, which lets us generate image features with the CLIP model, and umap-learn, which lets us apply a dimensionality reduction technique called Uniform Manifold Approximation and Projection (UMAP) to project those features into 2D for visualization:
pip install umap-learn git+https://github.com/openai/CLIP.git
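As a rough illustration of the projection step, the sketch below reduces high-dimensional feature vectors to 2D. It assumes random vectors as a stand-in for CLIP embeddings (the ViT-B/32 variant produces 512-dimensional vectors), and the helper `project_to_2d` is hypothetical. If umap-learn is not installed it falls back to PCA, which is a different technique and is used here only so the sketch still runs.

```python
import numpy as np


def project_to_2d(features, seed=0):
    """Reduce feature vectors to 2D with UMAP if available, otherwise with PCA."""
    try:
        import umap  # provided by the umap-learn package

        reducer = umap.UMAP(n_components=2, random_state=seed)
    except ImportError:
        # Fallback only: PCA is a linear projection, not UMAP
        from sklearn.decomposition import PCA

        reducer = PCA(n_components=2, random_state=seed)
    return reducer.fit_transform(features)


rng = np.random.default_rng(0)
# Hypothetical stand-in for CLIP image embeddings: 200 vectors of dimension 512
features = rng.normal(size=(200, 512)).astype(np.float32)

points = project_to_2d(features)
print(points.shape)  # one (x, y) pair per input vector
```

The resulting 2D points are what an embeddings plot would display, with each point colored by its cluster label.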
Concept Modeling Techniques: the built-in concept modeling technique in this walkthrough uses GPT-4V and some light prompting to identify each cluster's core concept. This is but one way to approach an open-ended problem. Try using image captioning and topic modeling, or create your own technique!
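As one possible starting point for a captioning-based alternative, the sketch below picks the most distinctive words from each cluster's captions. It is a much simpler stand-in for topic modeling, not the plugin's GPT-4V approach. The captions, the stopword list, and the helper `cluster_keywords` are all made up for illustration; in practice the captions would come from an image captioning model.

```python
import math
from collections import Counter

# Small illustrative stopword list (an assumption, not a standard list)
STOPWORDS = {"a", "an", "the", "of", "on", "in", "and", "with"}


def cluster_keywords(captions_by_cluster, top_k=1):
    """Rank words by in-cluster frequency, weighted toward words rare in other clusters."""
    counts = {
        cluster: Counter(
            word
            for caption in captions
            for word in caption.lower().split()
            if word not in STOPWORDS
        )
        for cluster, captions in captions_by_cluster.items()
    }
    n_clusters = len(counts)
    # Number of clusters in which each word appears at least once
    cluster_freq = Counter(word for cnt in counts.values() for word in cnt)

    keywords = {}
    for cluster, cnt in counts.items():
        scores = {
            word: tf * math.log(1 + n_clusters / cluster_freq[word])
            for word, tf in cnt.items()
        }
        # Sort by descending score, breaking ties alphabetically
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        keywords[cluster] = ranked[:top_k]
    return keywords


# Hypothetical captions for two clusters
captions = {
    0: ["a dog running on grass", "a dog catching a ball", "a brown dog on a beach"],
    1: ["a plate of pasta", "a bowl of pasta", "pasta with salad"],
}
concepts = cluster_keywords(captions, top_k=1)
print(concepts)
```

A real topic model or an LLM prompt would capture multi-word concepts that single keywords miss, so treat this as a baseline to compare against.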