Crawling@Home: Help Build The World's Largest Image-Text Pair Dataset!

This page summarizes the projects mentioned and recommended in the original post on /r/DataHoarder.

  1. DALLE-pytorch

    Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in PyTorch

    Since then, several efforts have been organized to replicate DALL-E. People initially organized around this excellent DALL-E replication repository, https://github.com/lucidrains/DALLE-pytorch, which shows some nice results in its README. More recently, as part of a Hugging Face event, new results have been achieved (see https://wandb.ai/dalle-mini/dalle-mini/reports/DALL-E-mini--Vmlldzo4NjIxODA ), and an online demo is now available at https://huggingface.co/spaces/flax-community/dalle-mini

  2. crawlingathome-worker

  3. DALLE-datasets

    This is a summary of easily available datasets for generalized DALLE-pytorch training.

    A large part of the results that can be achieved with such models is thanks to data: large amounts of data. Today the largest open datasets of (image, text) pairs are on the order of 10M pairs (see https://github.com/robvanvolt/DALLE-datasets ), which is enough to train decent models but not enough to reach the best performance. A public dataset with hundreds of millions of pairs could help a lot in building these image+text models.
