Meta AI releases Code Llama 70B

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • codellama

    Inference code for CodeLlama models

  • The GitHub repo [0] hasn't been fully updated, but it links to a paper [1] that describes how the smaller Code Llama models were trained. It's a reasonable guess that this model was trained similarly.

    [0] https://github.com/facebookresearch/codellama

    [1] https://arxiv.org/abs/2308.12950

  • can-ai-code

    Self-evaluating interview for AI coders

  • This is a completely fair but open question. Not to be a typical HN user, but when you say SOTA local, the real question is which benchmarks you actually care about when evaluating: size, operability, complexity, explainability, etc.

    Working out which copilot models perform best has been a deep exercise for me, and it has really made me examine my own coding style: what I find important, and what I look for both when investigating models and when evaluating interview candidates.

    I think the three benchmarks & leaderboards most people go to are:

    https://huggingface.co/spaces/bigcode/bigcode-models-leaderb... - the best-understood, broad-language-capability leaderboard, built on well-understood evaluations and benchmarks.

    https://huggingface.co/spaces/mike-ravkine/can-ai-code-resul... - Also comprehensive, but primarily assesses Python and JavaScript.

https://evalplus.github.io/leaderboard.html - which I think is a better take for comparing models you intend to run locally, as you can evaluate performance, operability and size in one visualisation.

Best of luck, and I would love to know which models & benchmarks you choose and why. (A sketch of the pass@k metric these leaderboards report is below.)
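
    Most of these leaderboards report pass@k, the probability that at least one of k sampled completions passes the unit tests (usually pass@1). A minimal sketch of the unbiased estimator from the HumanEval paper (Chen et al., 2021); the sample counts below are purely illustrative:

    ```python
    from math import comb

    def pass_at_k(n: int, c: int, k: int) -> float:
        """Unbiased pass@k: chance that at least one of k draws (without
        replacement) from n samples is one of the c correct ones."""
        if n - c < k:
            return 1.0  # fewer than k failing samples: some draw must pass
        return 1.0 - comb(n - c, k) / comb(n, k)

    # Illustrative: 200 completions sampled per problem, 37 passed the tests.
    print(pass_at_k(200, 37, 1))   # 0.185
    print(pass_at_k(200, 37, 10))  # ~0.88
    ```

    Note that can-ai-code uses its own interview-style scoring rather than pass@k, which is part of why the leaderboards disagree.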

  • continue

    ⏩ Open-source VS Code and JetBrains extensions that enable you to easily create your own modular AI software development system

  • Continue doesn’t support tab completion like Copilot yet.

A pull request adding it is in progress: https://github.com/continuedev/continue/pull/758

  • text-generation-webui

    A Gradio web UI for Large Language Models. Supports transformers, GPTQ, AWQ, EXL2, llama.cpp (GGUF), Llama models.

  • You can download it and run it with [this](https://github.com/oobabooga/text-generation-webui). There's an API mode that you could leverage from your VS Code extension.
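
    For example, text-generation-webui exposes an OpenAI-compatible API when launched with the `--api` flag. A minimal sketch of calling it; the port and route below are the defaults in recent builds, so adjust them to match your installation:

    ```python
    import requests  # pip install requests

    # Assumes the server was started with `--api`; defaults assumed here.
    API_URL = "http://127.0.0.1:5000/v1/completions"

    payload = {
        "prompt": "def fizzbuzz(n):",
        "max_tokens": 200,
        "temperature": 0.2,
        "stop": ["\ndef ", "\nclass "],  # stop before the next top-level block
    }

    resp = requests.post(API_URL, json=payload, timeout=120)
    resp.raise_for_status()
    print(resp.json()["choices"][0]["text"])
    ```

    A VS Code extension could hit the same endpoint from its command handlers.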

  • llama.cpp

    LLM inference in C/C++

  • The M3 Max is actually less than ideal because it peaks at 400 GB/s of memory bandwidth. What you really want is an M1 or M2 Ultra, which offers up to 800 GB/s (for comparison, an RTX 3090 runs at 936 GB/s). A Mac Studio suitable for running 70B models at speeds fast enough for realtime chat can be had for ~$3K.

    The downside of Apple's hardware at the moment is that the training ecosystem is very much focused on CUDA; llama.cpp has an open issue about Metal-accelerated training: https://github.com/ggerganov/llama.cpp/issues/3799 - but no work on it so far. This is likely because training at any significant sizes requires enough juice that it's pretty much always better to do it in the cloud currently, where, again, CUDA is the well-established ecosystem, and it's cheaper and easier for datacenter operators to scale. But, in principle, much faster training on Apple hardware should be possible, and eventually someone will get it done.
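
    As a back-of-the-envelope check on why bandwidth is the number that matters: single-stream generation is roughly memory-bandwidth-bound, because every generated token streams all model weights through the memory bus once, so tokens/s ≈ bandwidth / model size. A rough sketch (ceilings only; KV-cache traffic and kernel efficiency push real numbers lower, and the 3090 row is bandwidth-only since a 70B model doesn't fit in its 24 GB):

    ```python
    def max_tokens_per_sec(bandwidth_gb_s: float, model_size_gb: float) -> float:
        """Crude upper bound: each generated token streams all weights once."""
        return bandwidth_gb_s / model_size_gb

    MODEL_GB = 70 * 0.5  # 70B params at ~4-bit quantization ~= 35 GB

    for name, bw in [("M3 Max", 400), ("M1/M2 Ultra", 800), ("RTX 3090", 936)]:
        print(f"{name}: <= {max_tokens_per_sec(bw, MODEL_GB):.0f} tok/s")
    # M3 Max <= ~11 tok/s, Ultra <= ~23 tok/s -- consistent with
    # "fast enough for realtime chat" on a Mac Studio.
    ```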

  • llama

    Inference code for Llama models

  • https://github.com/facebookresearch/llama/pull/947/

