The idea maze for AI startups (2015)

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • coursera-startup

    Startup Engineering. Lecture Slides (June 2013)

  • stanford_alpaca

    Code and documentation to train Stanford's Alpaca models, and generate the data.

  • I think there's a new approach for “How do you get the data?” that wasn't available when this article was written in 2015. The new text and image generative models can now be used to synthesize training datasets.

    I was working on a typing autocorrect project and needed a corpus of "text messages". Most of the traditional NLP corpora, like those available through NLTK [0], aren't suitable. But it was easy to script ChatGPT to generate thousands of believable text messages by throwing random topics at it.

    Similarly, you can synthesize a training dataset by giving GPT the outputs/labels and asking it to generate a variety of inputs. For sentiment analysis, for example: "Give me 1000 negative movie reviews", then "Now give me 1000 positive movie reviews".

    The Alpaca folks used OpenAI's text-davinci-003 to generate an instruction-following dataset [1] from a small set of human-written seed examples.

    Etc.

    [0] https://www.nltk.org/nltk_data/

    [1] https://crfm.stanford.edu/2023/03/13/alpaca.html
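
The label-conditioned approach described in the comment above can be sketched in Python. This is a minimal sketch, assuming a hypothetical `complete(prompt)` function that wraps whatever LLM API you use. The prompt wording, the `synthesize` helper, and the topic list are illustrative assumptions and do not come from the comment or from any library. Each prompt fixes the label and picks a random topic for variety, and the reply is split into one example per line.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical LLM hook (assumption): any function mapping a prompt string to
# generated text, e.g. a thin wrapper around a chat-completion API call.
Complete = Callable[[str], str]


def make_prompt(label: str, kind: str, topic: str, n: int) -> str:
    """Ask for n examples carrying the desired label, one per line."""
    return (
        f"Write {n} short {label} {kind} about {topic}. "
        "Put each one on its own line, with no numbering."
    )


def synthesize(
    complete: Complete,
    labels: List[str],
    kind: str,
    topics: List[str],
    n_per_prompt: int,
    rounds: int,
    seed: int = 0,
) -> List[Tuple[str, str]]:
    """Build (text, label) pairs by conditioning each prompt on a label and a random topic."""
    rng = random.Random(seed)
    seen = set()
    dataset: List[Tuple[str, str]] = []
    for _ in range(rounds):
        for label in labels:
            topic = rng.choice(topics)  # a random topic keeps the outputs varied
            reply = complete(make_prompt(label, kind, topic, n_per_prompt))
            for line in reply.splitlines():
                text = line.strip()
                if text and text not in seen:  # skip blank lines and exact duplicates
                    seen.add(text)
                    dataset.append((text, label))
    return dataset


# Usage with a canned stand-in instead of a real model call:
def fake_complete(prompt: str) -> str:
    if "positive" in prompt:
        return "Great fun from start to finish.\nLoved the soundtrack."
    return "Far too long.\n\nThe plot made no sense."


data = synthesize(fake_complete, ["positive", "negative"], "movie reviews",
                  ["acting", "pacing", "special effects"], n_per_prompt=2, rounds=2)
print(data)
```

Swapping `fake_complete` for a real model call and passing a different `kind` (e.g. "text messages") covers the autocorrect case as well. Exact-match deduplication is only a cheap first filter, because generated examples often repeat with small variations.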
