Ask HN: Is it feasible to train my own LLM?

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • simonwillisonblog

    The source code behind my blog

  • text-generation-webui

    A Gradio web UI for Large Language Models. Supports transformers, GPTQ, AWQ, EXL2, llama.cpp (GGUF), Llama models.

  • https://github.com/oobabooga/text-generation-webui/blob/main...

    Consider a finetune: they're faster and relatively cheap (like, under $30 of rented compute time). The link above walks through the process, but the steps are to gather a dataset, run the training, and evaluate your results. Instruction-tuned LLMs pair prompts with expected responses, so it's easy to show results, measure perplexity, and compare against the base model.

    If you're interested in building a limited dataset, fun ideas might be quotes or conversations from your classmates, lessons or syllabi from your program, or other specific, local, testable information. Datasets aren't plug-and-play, and they're the most important part of a model.

    However, even the same dataset can yield different results depending on training parameters. I'd keep it simple: either test the impact of different training parameters on a single dataset, or pick two existing datasets and train with identical parameters for comparison.

    Good luck in IB! I was in it until I moved cities, and it was a blast.
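    The perplexity comparison suggested above is straightforward to compute: perplexity is just the exponential of the average negative log-likelihood per token, so a lower score means the model finds the held-out text less surprising. A minimal sketch (the per-token log-probabilities here are hypothetical; in practice you'd collect them from each model's output on the same held-out set):

    ```python
    import math

    def perplexity(token_logprobs):
        """exp of the mean negative log-likelihood per token (lower is better)."""
        nll = -sum(token_logprobs) / len(token_logprobs)
        return math.exp(nll)

    # Hypothetical per-token log-probs for the same held-out text:
    base_lps  = [-2.3, -1.9, -2.7, -2.1]   # base model
    tuned_lps = [-1.2, -0.9, -1.5, -1.1]   # after finetuning

    print(f"base:  {perplexity(base_lps):.2f}")
    print(f"tuned: {perplexity(tuned_lps):.2f}")
    ```

    Evaluating both models on the same held-out text makes the comparison fair; the finetune "wins" on its domain if its perplexity drops relative to the base model.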

  • nanoGPT

    The simplest, fastest repository for training/finetuning medium-sized GPTs.

  • For training from scratch, maybe a small model like https://github.com/karpathy/nanoGPT or tinyllama. Perhaps with quantization.
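    In that from-scratch spirit, the core of a language-model training loop (estimate next-token probabilities, score text by log-likelihood) can be shown at toy scale with a character-level bigram model; counts stand in for learned weights, and everything else a real GPT does is deliberately omitted:

    ```python
    from collections import Counter, defaultdict
    import math

    def train_bigram(text):
        """Count character bigrams; P(next|prev) = count(prev, next) / count(prev)."""
        pair_counts = defaultdict(Counter)
        for prev, nxt in zip(text, text[1:]):
            pair_counts[prev][nxt] += 1
        return pair_counts

    def logprob(model, text, vocab_size, alpha=1.0):
        """Add-alpha smoothed log-likelihood of text under the bigram model."""
        total = 0.0
        for prev, nxt in zip(text, text[1:]):
            counts = model[prev]
            total += math.log((counts[nxt] + alpha) /
                              (sum(counts.values()) + alpha * vocab_size))
        return total

    corpus = "hello hello hello world"   # stand-in for a real training corpus
    model = train_bigram(corpus)
    vocab = len(set(corpus))
    # In-distribution text should score higher than scrambled noise:
    print(logprob(model, "hello", vocab) > logprob(model, "olleh", vocab))
    ```

    nanoGPT replaces the counts with a small transformer trained by gradient descent, but the objective (maximize next-token log-likelihood) is the same, which is why it stays readable enough to learn from.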

