https://github.com/oobabooga/text-generation-webui/blob/main...
Consider a finetune - finetunes are faster and relatively cheap (think under $30 of rented compute time). The link above lists them, but the basic steps are to gather a dataset, run the training, and evaluate your results. LLM work revolves around instruction and evaluation, so it's easy to show results: measure perplexity and compare against the base model.
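To make the evaluation step concrete, here's a minimal sketch of the perplexity comparison. Perplexity is just the exponential of the average per-token negative log-likelihood, so once your eval loop gives you a loss, the comparison is one line. The loss values below are hypothetical placeholders, not real measurements.

```python
import math

def perplexity(avg_nll: float) -> float:
    # Perplexity = exp(average negative log-likelihood per token).
    return math.exp(avg_nll)

# Hypothetical eval losses on the same held-out set; lower is better.
base_loss = 3.2
finetuned_loss = 2.6

print(f"base: {perplexity(base_loss):.1f}, finetuned: {perplexity(finetuned_loss):.1f}")
```

If the finetuned model's perplexity on in-domain text is clearly lower than the base model's, the finetune learned something from your dataset.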
If you're interested in building a limited dataset, fun ideas might be quotes or conversations from your classmates, lessons or syllabi from your program, or other specific, local, testable information. Datasets aren't plug and play, and they're the most important part of a model.
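In case the dataset format is unclear: instruction-tuning data is commonly stored as one JSON object per line (JSONL) with an instruction and a response. The examples below are made-up placeholders in the spirit of "local, testable information"; the exact field names vary by training framework.

```python
import json

# Hypothetical instruction/response pairs built from local, verifiable facts.
examples = [
    {"instruction": "What does Ms. Rivera always say before labs?",
     "response": "Measure twice, cut once."},
    {"instruction": "Which room hosts the TOK seminar?",
     "response": "Room 204 in the main hall."},
]

# One JSON object per line is the usual convention for training data.
with open("dataset.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```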
However, even the same dataset can yield different results depending on training parameters. I'd keep it simple: either make the experiment about the impact of training parameters on a single dataset, or pick two existing datasets and train with identical parameters for comparison.
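The first option (one dataset, varied parameters) amounts to a small hyperparameter sweep. A sketch of how you might enumerate the runs, with hypothetical values - the point is that only the parameters change, so any difference in eval perplexity is attributable to them:

```python
from itertools import product

# Hypothetical sweep values; real choices depend on your model and budget.
learning_rates = [1e-4, 2e-4]
epochs = [1, 3]

# Same dataset for every run so the parameters are the only variable.
runs = [
    {"lr": lr, "epochs": ep, "dataset": "dataset.jsonl"}
    for lr, ep in product(learning_rates, epochs)
]

for run in runs:
    print(run)
```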
Good luck in IB! I was in it until I moved cities, and it was a blast.
For training from scratch, consider a small model like https://github.com/karpathy/nanoGPT or TinyLlama, perhaps with quantization.
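If quantization comes up in the write-up, the core idea fits in a few lines. This is a toy sketch of symmetric per-tensor int8 quantization (not what nanoGPT or llama.cpp do exactly, just the underlying principle): scale the weights so the largest magnitude maps to 127, round to integers, and rescale on the way back.

```python
def quantize_int8(weights):
    # Scale so the largest |weight| maps to the int8 extreme, 127.
    scale = max(abs(w) for w in weights) / 127.0
    quantized = [round(w / scale) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    # Round-trip error is at most half a quantization step (scale / 2).
    return [q * scale for q in quantized]

weights = [0.5, -1.27, 0.01]
q, s = quantize_int8(weights)
print(q, dequantize(q, s))
```

Real quantization schemes (per-channel scales, GPTQ, AWQ, the GGUF formats mentioned above) are refinements of this same map-to-integers idea.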