Gemma: New Open Models

gemma_pytorch

Mentions: 6 | Stars: 5,024 | Activity: 7.6 | Language: Python

The official PyTorch implementation of Google's Gemma models

https://github.com/google/gemma_pytorch/blob/main/tokenizer/...
I decoded this model protobuf in Python and diffed it against the Llama 2 tokenizer.
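
The resulting diff is not reproduced here. As a rough sketch of how such a comparison could be made, assume both tokenizers ship as SentencePiece `.model` files (serialized `ModelProto` messages) and that the third-party `sentencepiece` package is installed. The function names, file paths, and toy vocabularies below are hypothetical and not taken from the original comment:

```python
from typing import Dict, List


def load_pieces(model_path: str) -> List[str]:
    """Read the vocabulary pieces from a SentencePiece .model file.

    Assumption: the file is a serialized SentencePiece ModelProto and the
    third-party `sentencepiece` package is installed.
    """
    # Imported lazily so diff_vocab() below works without the package.
    from sentencepiece import sentencepiece_model_pb2 as sp_pb2

    proto = sp_pb2.ModelProto()
    with open(model_path, "rb") as f:
        proto.ParseFromString(f.read())
    return [p.piece for p in proto.pieces]


def diff_vocab(a: List[str], b: List[str]) -> Dict[str, List[str]]:
    """Return pieces unique to each vocabulary and those shared by both."""
    set_a, set_b = set(a), set(b)
    return {
        "only_a": sorted(set_a - set_b),
        "only_b": sorted(set_b - set_a),
        "shared": sorted(set_a & set_b),
    }


if __name__ == "__main__":
    # Toy vocabularies standing in for the real tokenizer files.
    vocab_a = ["a", "b", "c"]
    vocab_b = ["b", "c", "d"]
    result = diff_vocab(vocab_a, vocab_b)
    print("only in A:", result["only_a"])
    print("only in B:", result["only_b"])
```

With real files, one would call `diff_vocab(load_pieces(path_a), load_pieces(path_b))`, where the two paths point at locally downloaded tokenizer models. Comparing piece sets ignores ordering and scores, so it only shows which tokens differ, not how IDs are assigned.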

ollama

Mentions: 192 | Stars: 58,943 | Activity: 9.9 | Language: Go

Get up and running with Llama 3, Mistral, Gemma, and other large language models.

Already available in Ollama v0.1.26 preview release, if you'd like to start playing with it locally:
- https://github.com/ollama/ollama/releases/tag/v0.1.26
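
Once a model has been pulled, a locally running Ollama server can be queried over its HTTP API. The sketch below assumes the server listens on its default address (`localhost:11434`) and that the model is available under the tag `gemma`; both are assumptions about the local setup, and the helper names are illustrative:

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "gemma") -> urllib.request.Request:
    """Build a non-streaming generate request for a local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "gemma") -> str:
    """Send the prompt and return the generated text.

    Requires a running Ollama server with the given model already pulled.
    """
    with urllib.request.urlopen(build_request(prompt, model)) as resp:
        return json.loads(resp.read())["response"]
```

Calling `generate("Why is the sky blue?")` returns the completion as a single string. Setting `"stream": False` keeps the example simple by asking for one JSON object instead of a stream of partial responses.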

gemma.cpp

Mentions: 8 | Stars: 5,476 | Activity: 9.0 | Language: C++

Lightweight, standalone C++ inference engine for Google's Gemma models.

They have also implemented the model in their own C++ inference engine: https://github.com/google/gemma.cpp

gemma

Mentions: 2 | Stars: 2,028 | Activity: 5.8 | Language: Jupyter Notebook

Open weights LLM from Google DeepMind. (by google-deepmind)

We've documented the architecture (including key differences) in our technical report here (https://goo.gle/GemmaReport), and you can see the architecture implementation in our Git Repo (https://github.com/google-deepmind/gemma).

llama.cpp

Mentions: 769 | Stars: 56,891 | Activity: 10.0 | Language: C++

LLM inference in C/C++

It should be possible to run it via llama.cpp[0] now.
[0] https://github.com/ggerganov/llama.cpp/pull/5631

text-to-text-transfer-transformer

Mentions: 29 | Stars: 5,899 | Activity: 5.0 | Language: Python

Code for the paper "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer"

Google released the T5 paper about 5 years ago:
https://arxiv.org/abs/1910.10683
This included full model weights along with a detailed description of the dataset, training process, and ablations that led them to that architecture. T5 was state-of-the-art on many benchmarks when it was released, but it was of course quickly eclipsed by GPT-3.
Following GPT-3, it became much more common for labs not to release full details or model weights. Prior to that, it was common practice for Google (BERT, T5), Meta (BART), OpenAI (GPT-1, GPT-2) and others to release full training details and model weights.

ai-on-gke

Mentions: 2 | Stars: 141 | Activity: 9.8 | Language: Jupyter Notebook

There is a lot of work going into making the actual infrastructure and lower-level management of large numbers of GPUs/TPUs open as well. My team focuses on making the infrastructure side at least a bit more approachable on GKE and Kubernetes.
https://github.com/GoogleCloudPlatform/ai-on-gke/tree/main
and
https://github.com/google/xpk (a bit more focused on HPC, but includes AI)
and
https://github.com/stas00/ml-engineering (not associated with GKE, but describes training with SLURM)
Actual training is still handled by a small pool of very experienced people, but it's getting better. And every day, serving models gets that much faster: you can often simply draft on Triton and TensorRT-LLM or vLLM and see significant wins month to month.

xpk

Mentions: 1 | Stars: 52 | Activity: 8.6 | Language: Python

xpk (Accelerated Processing Kit, pronounced x-p-k) is a software tool that helps Cloud developers orchestrate training jobs on accelerators such as TPUs and GPUs on GKE.

ml-engineering

Mentions: 9 | Stars: 9,753 | Activity: 9.7 | Language: Python

Machine Learning Engineering Open Book

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.
