Gemma: New Open Models

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • gemma_pytorch

    The official PyTorch implementation of Google's Gemma models

  • https://github.com/google/gemma_pytorch/blob/main/tokenizer/...

    I decoded this model protobuf in Python and here is the diff with the Llama 2 tokenizer:
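
    The diff itself isn't reproduced on this page, but for anyone who wants to replicate it, here is a minimal sketch of decoding such a tokenizer protobuf with the sentencepiece package (the file paths and comparison logic are illustrative, not from the original comment):

      # Decode SentencePiece tokenizer.model protobufs and compare vocabularies.
      # Assumes `pip install sentencepiece` and locally downloaded tokenizer
      # files for Gemma and Llama 2 (the paths below are placeholders).
      from sentencepiece import sentencepiece_model_pb2 as sp_pb2

      def load_vocab(path):
          proto = sp_pb2.ModelProto()
          with open(path, "rb") as f:
              proto.ParseFromString(f.read())
          return [piece.piece for piece in proto.pieces]

      gemma = load_vocab("gemma/tokenizer.model")
      llama2 = load_vocab("llama-2/tokenizer.model")

      # A rough "diff": vocab sizes plus a sample of pieces unique to each side.
      print(len(gemma), len(llama2))
      print(sorted(set(gemma) - set(llama2))[:20])
      print(sorted(set(llama2) - set(gemma))[:20])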

  • ollama

    Get up and running with Llama 3, Mistral, Gemma, and other large language models.

  • Already available in the Ollama v0.1.26 preview release, if you'd like to start playing with it locally:

    - https://github.com/ollama/ollama/releases/tag/v0.1.26
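
    Once the preview is installed, a quick way to exercise it is the official ollama Python client; a minimal sketch (the gemma:7b tag is an assumption, check `ollama list` for the tags your install actually pulled):

      # Minimal chat round-trip against a local Ollama server.
      # Assumes `pip install ollama` and `ollama pull gemma:7b`
      # (the model tag is illustrative).
      import ollama

      response = ollama.chat(
          model="gemma:7b",
          messages=[{"role": "user", "content": "In one sentence, what is Gemma?"}],
      )
      print(response["message"]["content"])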

  • gemma.cpp

    Lightweight, standalone C++ inference engine for Google's Gemma models.

  • They have also implemented the model in their own C++ inference engine: https://github.com/google/gemma.cpp

  • gemma

    Open weights LLM from Google DeepMind. (by google-deepmind)

  • We've documented the architecture (including key differences) in our technical report here (https://goo.gle/GemmaReport), and you can see the architecture implementation in our Git Repo (https://github.com/google-deepmind/gemma).

  • llama.cpp

    LLM inference in C/C++

  • It should be possible to run it via llama.cpp[0] now.

    [0] https://github.com/ggerganov/llama.cpp/pull/5631
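
    Once the weights are converted to GGUF, one way to drive llama.cpp from Python is the llama-cpp-python bindings; a minimal sketch (the model path is a placeholder for whatever your conversion produced):

      # Run a GGUF-converted checkpoint through llama.cpp's Python bindings.
      # Assumes `pip install llama-cpp-python`; the file name is illustrative.
      from llama_cpp import Llama

      llm = Llama(model_path="gemma-7b-it.gguf", n_ctx=2048)
      out = llm("Question: What is Gemma? Answer:", max_tokens=64)
      print(out["choices"][0]["text"])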

  • text-to-text-transfer-transformer

    Code for the paper "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer"

  • Google released the T5 paper about 5 years ago:

    https://arxiv.org/abs/1910.10683

    This included full model weights along with a detailed description of the dataset, training process, and ablations that led them to that architecture. T5 was state-of-the-art on many benchmarks when it was released, but it was of course quickly eclipsed by GPT-3.

    Following GPT-3, it became much more common for labs not to release full details or model weights. Prior to that, it was common practice for Google (BERT, T5), Meta (BART), OpenAI (GPT-1, GPT-2), and others to release full training details and model weights.
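
    Those released weights are still usable today; as a minimal sketch, one of the public T5 checkpoints can be loaded with Hugging Face transformers (t5-small is one of the released sizes):

      # Load a released T5 checkpoint and run a quick text-to-text task.
      # Assumes `pip install transformers sentencepiece torch`.
      from transformers import T5ForConditionalGeneration, T5Tokenizer

      tokenizer = T5Tokenizer.from_pretrained("t5-small")
      model = T5ForConditionalGeneration.from_pretrained("t5-small")

      inputs = tokenizer("translate English to German: Open weights matter.",
                         return_tensors="pt")
      outputs = model.generate(**inputs, max_new_tokens=32)
      print(tokenizer.decode(outputs[0], skip_special_tokens=True))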

  • ai-on-gke

  • There is a lot of work to make the actual infrastructure and lower-level management of lots and lots of GPUs/TPUs open as well - my team focuses on making the infrastructure bit at least a bit more approachable on GKE and Kubernetes.

    https://github.com/GoogleCloudPlatform/ai-on-gke/tree/main

    and

    https://github.com/google/xpk (a bit more focused on HPC, but includes AI)

    and

    https://github.com/stas00/ml-engineering (not associated with GKE, but describes training with SLURM)

    The actual training is still done by a fairly small pool of very experienced people, but it's getting better. And serving models gets faster every day - you can often simply build on Triton and TensorRT-LLM or vLLM and see significant wins month to month.
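
    On the serving side, a minimal offline-inference sketch with vLLM (the google/gemma-7b model id is an assumption; it also requires accepting the Gemma license on Hugging Face):

      # Offline batch inference with vLLM. Assumes `pip install vllm`;
      # the model id below is illustrative.
      from vllm import LLM, SamplingParams

      llm = LLM(model="google/gemma-7b")
      params = SamplingParams(temperature=0.8, max_tokens=64)
      for out in llm.generate(["What is Gemma?"], params):
          print(out.outputs[0].text)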

  • xpk

    xpk (Accelerated Processing Kit, pronounced x-p-k) is a software tool that helps Cloud developers orchestrate training jobs on accelerators such as TPUs and GPUs on GKE.

  • ml-engineering

    Machine Learning Engineering Open Book

