Exploring 12M of the 2.3B Images Used to Train Stable Diffusion

laion-aesthetic-datasette

Mentions: 1 | Stars: 57 | Activity: 4.5 | Language: Python

Use Datasette to explore LAION improved_aesthetics_6plus training data used by Stable Diffusion

If anyone is interested in the technical details, the database itself is a 4GB SQLite file which we are hosting with Datasette running on Fly.
More details in our repo: https://github.com/simonw/laion-aesthetic-datasette
Search is provided by SQLite FTS5.
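
As a rough illustration of what FTS5-backed search looks like, here is a minimal sketch using Python's built-in sqlite3 module. The table name `images_fts`, its columns, and the sample rows are hypothetical and do not reflect the real schema of the hosted database; the sketch also assumes the local SQLite build includes the FTS5 extension.

```python
import sqlite3

# In-memory database standing in for the 4GB SQLite file.
# The table and column names are hypothetical, not the real schema.
conn = sqlite3.connect(":memory:")

# url is stored but not indexed; only caption text is searchable.
conn.execute("CREATE VIRTUAL TABLE images_fts USING fts5(url UNINDEXED, caption)")
conn.executemany(
    "INSERT INTO images_fts (url, caption) VALUES (?, ?)",
    [
        ("https://example.com/1.jpg", "oil painting of a lighthouse at sunset"),
        ("https://example.com/2.jpg", "portrait photo of a tabby cat"),
        ("https://example.com/3.jpg", "watercolor painting of a mountain lake"),
    ],
)


def search_captions(conn, query, limit=10):
    """Return (url, caption) rows matching an FTS5 query, best match first."""
    sql = (
        "SELECT url, caption FROM images_fts "
        "WHERE images_fts MATCH ? ORDER BY rank LIMIT ?"
    )
    return conn.execute(sql, (query, limit)).fetchall()


results = search_captions(conn, "painting")
for url, caption in results:
    print(url, caption)
```

Datasette can expose a search box for tables that have an associated FTS index, so a query like this is roughly what runs when a search term is entered.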

stable-diffusion

Mentions: 111 | Stars: 1,749 | Activity: 10.0 | Language: Jupyter Notebook

stable-diffusion

Mentions: 382 | Stars: 65,389 | Activity: 0.0 | Language: Jupyter Notebook

A latent text-to-image diffusion model

I recommend looking into "transfer learning".
That's where you start with an existing large model, and train a new model on top of it by feeding in new images.
What's fascinating about transfer learning is that you don't need to give it a lot of new images at all. Just a few hundred extra images can produce a model that's frighteningly accurate for tasks like image labeling.
This is pretty much how all AI models work today. Take a look at the Stable Diffusion model card: https://github.com/CompVis/stable-diffusion/blob/main/Stable...
They ran multiple training sessions on progressively smaller (and higher-quality) sets of images to get the final result.
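
To make the idea concrete, here is a toy sketch of transfer learning in plain Python. It is not how Stable Diffusion was trained: the "pretrained model" is just a fixed random projection standing in for a real network, the data is synthetic, and every name (`frozen_features`, `train_head`, and so on) is made up for illustration. The point it shows is that the existing layers stay frozen and only a small new head is trained on a few hundred examples.

```python
import math
import random

random.seed(0)

# Stand-in for a large pretrained model: a fixed ("frozen") feature extractor.
# In real transfer learning these would be the pretrained network's layers.
FROZEN_WEIGHTS = [[random.uniform(-0.5, 0.5) for _ in range(4)] for _ in range(8)]


def frozen_features(x):
    """Map a 4-value input to 8 features; these weights are never updated."""
    return [math.tanh(sum(w * v for w, v in zip(row, x))) for row in FROZEN_WEIGHTS]


def predict(head, bias, feats):
    """Logistic-regression head: the only part that gets trained."""
    z = sum(h * f for h, f in zip(head, feats)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def train_head(samples, labels, epochs=200, lr=0.5):
    """Fit the new head with plain stochastic gradient descent."""
    feats = [frozen_features(x) for x in samples]  # computed once, backbone is fixed
    head, bias = [0.0] * len(FROZEN_WEIGHTS), 0.0
    for _ in range(epochs):
        for f, y in zip(feats, labels):
            err = predict(head, bias, f) - y  # gradient of the log loss w.r.t. z
            head = [h - lr * err * fi for h, fi in zip(head, f)]
            bias -= lr * err
    return head, bias


# A few hundred synthetic examples: the label is 1 when the first value is positive.
samples = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(200)]
labels = [1 if x[0] > 0 else 0 for x in samples]

head, bias = train_head(samples, labels)
accuracy = sum(
    (predict(head, bias, frozen_features(x)) > 0.5) == bool(y)
    for x, y in zip(samples, labels)
) / len(samples)
print(f"training accuracy: {accuracy:.2f}")
```

In practice the frozen part would be a real pretrained network loaded from a deep learning framework, and the head would be evaluated on held-out data rather than on the training set.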

clip-retrieval

Mentions: 11 | Stars: 2,124 | Activity: 7.9 | Language: Jupyter Notebook

Easily compute clip embeddings and build a clip retrieval system with them

Done https://github.com/rom1504/clip-retrieval/commit/53e3383f58b...
Using CLIP for search is better than direct text indexing for a variety of reasons; here, for example, because it better matches what Stable Diffusion sees.
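
The difference from text indexing is that search happens in embedding space: the query and every image are turned into vectors, and the closest vectors win. Below is a minimal sketch of that ranking step only. It does not use the clip-retrieval API; the three-dimensional vectors are made-up stand-ins for real CLIP embeddings, and the `search` helper is hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy 3-dimensional vectors standing in for real CLIP image embeddings
# (hypothetical values; real embeddings have hundreds of dimensions).
image_index = {
    "lighthouse.jpg": [0.9, 0.1, 0.0],
    "cat.jpg": [0.0, 0.95, 0.2],
    "mountain_lake.jpg": [0.7, 0.0, 0.6],
}


def search(query_embedding, index, top_k=2):
    """Rank indexed images by similarity to the query embedding."""
    scored = [
        (cosine_similarity(query_embedding, vec), name)
        for name, vec in index.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:top_k]]


# Stand-in for the embedding a CLIP text encoder would produce for a query.
query = [0.8, 0.05, 0.1]
top_matches = search(query, image_index)
print(top_matches)
```

A real system would compute the embeddings with a CLIP model and use an approximate nearest-neighbor index instead of scoring every image, since a brute-force scan does not scale to billions of vectors.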

gradio

Mentions: 115 | Stars: 28,730 | Activity: 9.9 | Language: Python

Build and share delightful machine learning apps, all in Python. 🌟 Star to support our work!

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Related posts

Are we at peak vector database?
8 projects | news.ycombinator.com | 25 Jan 2024
Thoughts on a "Text Generation CivitAI"
1 project | /r/Oobabooga | 19 Mar 2023
[D] Is BERT going to be obsolete by ChatGPT?
2 projects | /r/MachineLearning | 10 Mar 2023
Fox Fairy @ Diffusion Forest: Unreal Engine + Stable Diffusion
1 project | /r/sdforall | 1 Nov 2022
I trained an AI to generate Éric Duhaime as a clown!
2 projects | /r/Quebec | 27 Oct 2022

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com
Deep Learning Machine Learning semantic-search datasette Models
Post date: 30 Aug 2022
