Using LLaMA with M1 Mac and Python 3.11

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • text-generation-webui

    A Gradio web UI for Large Language Models. Supports transformers, GPTQ, AWQ, EXL2, llama.cpp (GGUF), Llama models.

  • My Ubuntu desktop has 64 GB of RAM and a 12 GB RTX 3060 card. I currently have the 4-bit, 13B-parameter LLaMA running on it, following these instructions - https://github.com/oobabooga/text-generation-webui/wiki/LLaM... . They don't have 30B or 65B ready yet.

    Might try other methods to do 30B, or switch to my M1 MacBook if that's useful (as suggested here). Don't have an immediate need for it, just futzing with it for now. (A rough memory estimate for the 4-bit sizes is sketched after this list.)

  • llama-dl

    Discontinued: high-speed download of LLaMA, Facebook's 65B-parameter GPT model [UnavailableForLegalReasons - repository access blocked]

  • Sure. You can get the models via magnet link from here: https://github.com/shawwn/llama-dl/

    To get it running, just follow these steps: https://github.com/ggerganov/llama.cpp/#usage (a minimal Python alternative is sketched after this list).

  • dalai

    The simplest way to run LLaMA on your local machine

  • I'm pretty sure there's a mistake here: https://github.com/cocktailpeanut/dalai/blob/main/index.js#L... - there's a ${suffix} missing.

    It causes the quantization process to always use the first part of the model when using a size larger than 7B. I don't even know what this stuff does, but I see the ggml-model-f16.bin files also have ggml-model-f16.bin.X alongside them in the folder, so I'm pretty sure this is a mistake (see the per-part sketch after this list).

  • llama.cpp

    LLM inference in C/C++

  • See https://github.com/ggerganov/llama.cpp/issues/62 (the related repo was originally posted on 4chan, is all, but the code is on GitHub)
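
As a rough sanity check on the 4-bit sizes mentioned in the text-generation-webui comment above, here is a back-of-the-envelope VRAM estimate in Python; the 20% runtime overhead factor is an assumption, not a measurement:

```python
# Rough VRAM estimate for 4-bit quantized LLaMA weights.
# The ~20% overhead for activations and KV cache is an assumption, not a measurement.

def vram_estimate_gib(n_params_billion: float, bits_per_weight: int = 4,
                      overhead: float = 0.20) -> float:
    weights_gib = n_params_billion * 1e9 * bits_per_weight / 8 / 1024**3
    return weights_gib * (1 + overhead)

for size in (7, 13, 30, 65):
    print(f"LLaMA-{size}B @ 4-bit: ~{vram_estimate_gib(size):.1f} GiB")
```

By this estimate, 13B at 4 bits is roughly 7 GiB and fits on a 12 GB RTX 3060, while 30B is roughly 17 GiB and does not.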
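
The llama.cpp usage steps linked above are shell commands (convert, quantize, run). As a minimal sketch for the Python 3.11 setup in the title, the separate llama-cpp-python bindings can drive the same models; the model path below is a placeholder and assumes you have already produced a quantized model file by following the linked steps:

```python
# Minimal sketch using the llama-cpp-python bindings (pip install llama-cpp-python),
# a separate project that wraps llama.cpp; not the C/C++ CLI from the linked repo.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/13B/ggml-model-q4_0.bin",  # placeholder path
    n_ctx=512,                                      # context window size
)

output = llm(
    "Building a website can be done in 10 simple steps:",
    max_tokens=128,
    stop=["\n\n"],
    echo=True,
)
print(output["choices"][0]["text"])
```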
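
To illustrate the dalai comment above about the missing ${suffix}: models larger than 7B are split into several f16 parts, and each part has to be quantized separately. The loop below is a hypothetical Python sketch of that idea, not dalai's actual JavaScript; the quantize invocation and paths are assumptions.

```python
# Hypothetical sketch: larger models ship as multiple f16 parts
# (ggml-model-f16.bin, ggml-model-f16.bin.1, ...), and each part must be
# quantized separately. Without a per-part suffix, every iteration would
# point at the first file and the remaining parts would never be quantized.
import glob
import subprocess

model_dir = "./models/13B"  # placeholder path
parts = sorted(glob.glob(f"{model_dir}/ggml-model-f16.bin*"))

for part in parts:
    out_path = part.replace("f16", "q4_0")
    # Invocation is an assumption modeled on llama.cpp's quantize tool.
    subprocess.run(["./quantize", part, out_path, "2"], check=True)
```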

NOTE: The number of mentions on this list indicates mentions on common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.


Related posts

  • More Agents Is All You Need: LLMs performance scales with the number of agents

    2 projects | news.ycombinator.com | 6 Apr 2024
  • Show HN: macOS GUI for running LLMs locally

    1 project | news.ycombinator.com | 18 Sep 2023
  • Ask HN: What are the capabilities of consumer grade hardware to work with LLMs?

    1 project | news.ycombinator.com | 3 Aug 2023
  • Meta to release open-source commercial AI model

    3 projects | news.ycombinator.com | 14 Jul 2023
  • How can I run a large language model locally?

    1 project | /r/learnprogramming | 11 Jul 2023