Exploring 12M of the 2.3B Images Used to Train Stable Diffusion

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • laion-aesthetic-datasette

    Use Datasette to explore LAION improved_aesthetics_6plus training data used by Stable Diffusion

  • If anyone is interested in the technical details, the database itself is a 4GB SQLite file which we are hosting with Datasette running on Fly.

    More details in our repo: https://github.com/simonw/laion-aesthetic-datasette

    Search is provided by SQLite FTS5.
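
As a rough illustration of what FTS5-backed search over captions can look like, here is a minimal sketch using Python's built-in sqlite3 module. It uses a tiny in-memory table rather than the real 4GB file. The table name `images_fts`, its columns, and the sample rows are assumptions made for this example and are not taken from the hosted database's schema. It also assumes the local SQLite build has FTS5 compiled in, which is common but not guaranteed.

```python
import sqlite3

# In-memory stand-in for the real database. The table name, column names,
# and rows below are hypothetical, chosen only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE images_fts USING fts5(url, text)")
conn.executemany(
    "INSERT INTO images_fts (url, text) VALUES (?, ?)",
    [
        ("https://example.com/1.jpg", "portrait of a woman in a red dress"),
        ("https://example.com/2.jpg", "landscape painting of mountains at sunset"),
        ("https://example.com/3.jpg", "red sports car on a mountain road"),
    ],
)


def search(query, limit=10):
    """Return (url, text) rows matching an FTS5 query, best match first."""
    return conn.execute(
        "SELECT url, text FROM images_fts "
        "WHERE images_fts MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    ).fetchall()


for url, text in search("red"):
    print(url, "-", text)
```

Note that the default FTS5 tokenizer does not stem words, so in this sketch a query for "mountain" would not match a caption that only contains "mountains".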

  • stable-diffusion

    A latent text-to-image diffusion model

  • I recommend looking into "transfer learning".

    That's where you start with an existing large model, and train a new model on top of it by feeding in new images.

    What's fascinating about transfer learning is that you don't need to give it many new images at all. Just a few hundred extras can create a model that's frighteningly accurate for tasks like image labeling.

    This is pretty much how all AI models work today. Take a look at the Stable Diffusion model card: https://github.com/CompVis/stable-diffusion/blob/main/Stable...

    They ran multiple training sessions with progressively smaller (and higher quality) images to get the final result.
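
The idea in the comment above (keep an existing model fixed and train only a small new part on a handful of examples) can be sketched in a few lines. This is a toy, standard-library-only illustration, not how Stable Diffusion or any real vision model is trained. `frozen_backbone` is a hypothetical stand-in for a large pretrained network, and the "images" are synthetic lists of pixel brightness values invented for the example.

```python
import math
import random


def frozen_backbone(pixels):
    """Hypothetical stand-in for a pretrained model: turns raw pixels into
    features. It has no trainable parameters, so it stays fixed throughout."""
    mean = sum(pixels) / len(pixels)
    spread = max(pixels) - min(pixels)
    return [mean, spread]


def train_head(examples, epochs=200, lr=0.5):
    """Fit a small logistic-regression head on top of the frozen features."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for pixels, label in examples:
            f = frozen_backbone(pixels)
            z = sum(wi * fi for wi, fi in zip(w, f)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            err = p - label  # gradient of the log loss with respect to z
            w = [wi - lr * err * fi for wi, fi in zip(w, f)]
            b -= lr * err
    return w, b


def predict(head, pixels):
    """Classify one image with the frozen backbone plus the trained head."""
    w, b = head
    f = frozen_backbone(pixels)
    z = sum(wi * fi for wi, fi in zip(w, f)) + b
    return 1 if z > 0 else 0


# Synthetic data: 20 "bright" images (label 1) and 20 "dark" images (label 0).
rng = random.Random(0)
bright = [([rng.uniform(0.6, 1.0) for _ in range(16)], 1) for _ in range(20)]
dark = [([rng.uniform(0.0, 0.4) for _ in range(16)], 0) for _ in range(20)]
examples = bright + dark
rng.shuffle(examples)

head = train_head(examples)
print(predict(head, [0.9, 0.8, 0.7, 0.95]))
print(predict(head, [0.1, 0.2, 0.05, 0.3]))
```

Only the head's three numbers are learned here; in a real setting the frozen part would be a large pretrained network and the head a new final layer.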

  • clip-retrieval

    Easily compute clip embeddings and build a clip retrieval system with them

  • Done https://github.com/rom1504/clip-retrieval/commit/53e3383f58b...

    Using CLIP for search is better than direct text indexing for a variety of reasons; here, for example, because it better matches what Stable Diffusion sees.
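
To make the contrast with text indexing concrete, here is a minimal sketch of embedding-based retrieval: images and queries are both represented as vectors, and search ranks images by cosine similarity to the query. The three-dimensional vectors and file names below are made up for illustration. In a real system the embeddings would come from a CLIP model, and this sketch does not use or reproduce the clip-retrieval API.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical hand-made "embeddings"; real CLIP vectors have hundreds of
# dimensions and are produced by the model's image encoder.
image_index = {
    "cat_photo.jpg": [0.9, 0.1, 0.0],
    "dog_photo.jpg": [0.1, 0.9, 0.0],
    "city_skyline.jpg": [0.0, 0.1, 0.9],
}


def search(query_embedding, k=2):
    """Return the names of the k images most similar to the query vector."""
    scored = sorted(
        image_index.items(),
        key=lambda item: cosine(query_embedding, item[1]),
        reverse=True,
    )
    return [name for name, _ in scored[:k]]


# Stand-in for the text encoder's output for a query such as "a kitten".
kitten_query = [0.8, 0.2, 0.0]
print(search(kitten_query))
```

Because matching happens in the embedding space, a query can retrieve an image whose caption shares no words with it, which plain text indexing cannot do.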

  • gradio

    Build and share delightful machine learning apps, all in Python. 🌟 Star to support our work!

NOTE: The number of mentions on this list indicates mentions on common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.
