GPT Neo: open-source GPT-3-like model with pretrained weights available

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • kiri

    Backprop makes it simple to use, finetune, and deploy state-of-the-art ML models. (by kiri-ai)

    You might get some really promising results with finetuning.

    If anything, you could build writing assistance that almost automates responses.

    I've been co-authoring a library that lets you finetune such models in a single line of code.

    https://github.com/backprop-ai/backprop

    Specifically, the text generation finetuning example should be what you are looking for: https://github.com/backprop-ai/backprop/blob/main/examples/F...

    Hope this helps, happy to chat more about it. Pretty curious about the results.

  • gpt-neo

    Discontinued. An implementation of model parallel GPT-2 and GPT-3-style models using the mesh-tensorflow library.

  • haystack

    LLM orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.

    Also, for a quick and simple Q&A system, Haystack (https://github.com/deepset-ai/haystack), which is essentially dense vector similarity on Elasticsearch, looks pretty promising and supports the whole pipeline.
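
    To make "dense vector similarity" concrete, here is a minimal sketch of the idea in plain Python. It is not Haystack's API: the names `cosine_similarity`, `retrieve`, and `TOY_INDEX` are hypothetical, and the three-dimensional vectors are made-up stand-ins for embeddings that a real system would get from a neural encoder and store in a search backend.

```python
import math

# Hypothetical toy index: document id -> embedding vector.
# In a real system these vectors would come from an encoder model.
TOY_INDEX = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.0, 1.0, 0.0],
    "doc_c": [0.7, 0.7, 0.0],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def retrieve(query_vec, index, top_k=2):
    """Return the ids of the top_k documents most similar to the query."""
    scored = [
        (cosine_similarity(query_vec, vec), doc_id)
        for doc_id, vec in index.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:top_k]]


if __name__ == "__main__":
    # A query vector that points mostly along the first axis.
    print(retrieve([0.9, 0.1, 0.0], TOY_INDEX))
```

    A Q&A pipeline would typically pass the retrieved documents on to a reader or generator model; this sketch covers only the retrieval step.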

  • gpt-neox

    An implementation of model parallel autoregressive transformers on GPUs, based on the DeepSpeed library.

    GPT-NeoX, which is a model from the same group but using GPUs instead of TPUs, uses techniques from DeepSpeed:

    https://github.com/EleutherAI/gpt-neox/

  • lm-evaluation-harness

    A framework for few-shot evaluation of language models.

    We’ve added a table with some evaluation scores to the GitHub repo, and you can see a comparison between our scores, GPT-2, and GPT-3 here: https://twitter.com/BlancheMinerva/status/137399189661642752...

    tl;dr we are doing pretty much exactly as well as we expected on LAMBADA and WikiText. Results on more sophisticated tasks will take some time, but HuggingFace is currently working on implementing our model in the transformers library and when they do so we can easily run a lot of analyses very quickly.

    We actually built an evaluation suite that integrates with HF, but interfacing with the MTF code that GPT-Neo was written in was too much of a pain in the ass because Mesh TensorFlow is the worst. https://github.com/EleutherAI/lm-evaluation-harness
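
    For readers unfamiliar with the two benchmarks mentioned above: LAMBADA is usually reported as last-word prediction accuracy, and WikiText as perplexity. The sketch below shows how those two numbers are computed from model outputs. It is an illustration only, not the lm-evaluation-harness API; the function names are hypothetical, and the perplexity here is normalized per token, whereas published WikiText numbers may use a different normalization.

```python
import math


def perplexity(token_logprobs):
    """exp of the negative mean natural-log probability per token."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def last_word_accuracy(predictions, targets):
    """Fraction of passages whose predicted final word matches the target."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        raise ValueError("need at least one example")
    correct = sum(p == t for p, t in zip(predictions, targets))
    return correct / len(targets)


if __name__ == "__main__":
    # A model that assigns probability 0.25 to every token has perplexity 4.
    print(perplexity([math.log(0.25)] * 4))
    # Two of three final words predicted correctly.
    print(last_word_accuracy(["a", "b", "c"], ["a", "x", "c"]))
```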
