Stuff we figured out about AI in 2023

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • llama2.c

    Inference Llama 2 in one file of pure C

  • For inference, less than 1 KLOC of pure, dependency-free C is enough (if you include the tokenizer and command-line parsing)[1]. This was a non-obvious fact for me: in principle, you could have run a modern LLM 20 years ago with just 1,000 lines of code, assuming you were fine with things potentially taking days to run, of course.

    Training wouldn't be that much harder. Micrograd[2] is 200 LOC of pure Python, so 1,000 lines would probably be enough for training an (extremely slow) LLM. By "extremely slow", I mean that a training run that normally takes hours could probably take dozens of years, but the results would, in principle, be the same.

    If you were writing in C instead of Python and used something like llama.cpp's optimization tricks, you could probably get somewhat acceptable training performance in 2 or 3 KLOC. You'd still be off by one or two orders of magnitude compared to a GPU cluster, but it would be a lot better than naive, loopy Python.

    [1] https://github.com/karpathy/llama2.c

    [2] https://github.com/karpathy/micrograd
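
To make the "training wouldn't be that much harder" point concrete, here is a minimal sketch of a scalar autograd engine in the spirit of micrograd. It is not micrograd's actual code: the `Value` class below supports only addition and multiplication, and `fit_slope` with its toy data set is a hypothetical example invented for illustration. It shows that reverse-mode differentiation plus gradient descent fits in a few dozen lines of plain Python.

```python
class Value:
    """A scalar that remembers how it was computed so gradients can flow back."""

    def __init__(self, data, parents=()):
        self.data = float(data)
        self.grad = 0.0
        self._parents = parents
        self._backward = lambda: None  # leaf nodes have nothing to propagate

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))

        def _backward():
            # d(a + b)/da = 1 and d(a + b)/db = 1
            self.grad += out.grad
            other.grad += out.grad

        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))

        def _backward():
            # d(a * b)/da = b and d(a * b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward = _backward
        return out

    __radd__ = __add__
    __rmul__ = __mul__

    def backward(self):
        # Visit nodes in reverse topological order so each node's gradient
        # is complete before it is pushed to its parents.
        order, seen = [], set()

        def visit(node):
            if node not in seen:
                seen.add(node)
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            node._backward()


def fit_slope(steps=200, lr=0.05):
    """Hypothetical toy task: learn w in y = w * x from points on y = 3x."""
    data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]
    w = Value(0.0)
    for _ in range(steps):
        loss = Value(0.0)
        for x, y in data:
            err = w * x + (-y)
            loss = loss + err * err  # sum of squared errors
        w.grad = 0.0                 # reset before each backward pass
        loss.backward()
        w.data -= lr * w.grad        # plain gradient descent step
    return w.data
```

Calling `fit_slope()` should drive `w` toward 3. Every arithmetic operation allocates a Python object and a closure, which is exactly the kind of overhead that makes this approach "extremely slow" at LLM scale.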

  • llama

    Inference code for Llama models

  • > Instead, it turns out a few hundred lines of Python is genuinely enough to train a basic version!

    Actually, it's not just a basic version. Llama 1/2's model.py is 500 lines: https://github.com/facebookresearch/llama/blob/main/llama/mo...

    Mistral (rumored to have forked Llama) is 369 lines: https://github.com/mistralai/mistral-src/blob/main/mistral/m...

    Both of these are SOTA open source models.
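
One reason a model definition can be this short is that the forward pass is built from a handful of small primitives. The sketch below is not taken from either repository: the function names, the list-of-lists data layout, and the single-head simplification are all assumptions made for illustration. It shows RMS normalization, softmax, a matrix-vector product, and causal self-attention in dependency-free Python.

```python
import math


def rmsnorm(x, weight, eps=1e-5):
    """Scale x to roughly unit root-mean-square, then apply a per-dimension gain."""
    mean_square = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_square + eps)
    return [w * v * scale for w, v in zip(weight, x)]


def softmax(xs):
    """Numerically stable softmax: subtract the max before exponentiating."""
    peak = max(xs)
    exps = [math.exp(v - peak) for v in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(W, x):
    """Matrix-vector product, with W stored as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def causal_attention(queries, keys, values):
    """Single-head attention where position t only sees positions 0..t."""
    dim = len(queries[0])
    out = []
    for t, q in enumerate(queries):
        # Scaled dot-product scores against the current and earlier positions.
        scores = [
            sum(a * b for a, b in zip(q, keys[s])) / math.sqrt(dim)
            for s in range(t + 1)
        ]
        weights = softmax(scores)
        # Weighted average of the visible value vectors.
        out.append([
            sum(w * values[s][d] for s, w in enumerate(weights))
            for d in range(len(values[0]))
        ])
    return out
```

A real implementation adds multiple heads, positional encoding, feed-forward layers, and batching, but the loops above are the core of what those few hundred lines compute.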

  • mistral-src

    Reference implementation of Mistral AI 7B v0.1 model.

  • micrograd

    A tiny scalar-valued autograd engine and a neural net library on top of it with PyTorch-like API
