Llama2.c L2E LLM – Multi OS Binary and Unikernel Release

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • llama2.c

    Llama 2 Everywhere (L2E) (by trholding)

  • It's just the beginning; there is still optimization and some figuring out to do.

    This fork is based on karpathy's llama2.c. We try to mirror upstream progress and add our patches on top to improve performance, provide binary portability, and run as a unikernel. There is a catch, however: this doesn't currently infer the 7B or bigger Meta Llama 2 models at a usable speed yet; it's too slow and memory-consuming.

    My plan is to get to a stage where we can actually infer larger models at a comfortable speed, as llama.cpp / ggml does, adding GPU acceleration along the way.

    Also, this doesn't have a web API yet; I'll be adding one in the next update, after which it would actually make sense to deploy it on a server to test it out.

    Right now you have to manually spawn VM instances with QEMU, like this:

    qemu-system-x86_64 -m 256m -accel kvm -kernel L2E_qemu-x86_64

    or

    qemu-system-x86_64 -m 256m -accel kvm -kernel L2E_qemu-x86_64 -nographic

    That's not very practical, especially as there is no web API yet. So see this more as a tech preview, in the release-early, release-often spirit.

    As I get time, I'll add a Firecracker build and also write instructions for spawning hundreds of baby Llama 2 KVM QEMU / Firecracker instances on a powerful server.

    Thank you for your interest. As per your suggestion, a comprehensive howto is planned. Feel free to file any issues / wants / suggestions at https://github.com/trholding/llama2.c ; I'll address them as I get time.

    I'm stuck with bigger IRL projects, but if there is deep interest from the community I'll be sure to spend more time on this.
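The "spawn instances manually with QEMU" step above can be sketched as a small script. This is a hypothetical dry-run sketch, assuming the L2E_qemu-x86_64 unikernel image from the post sits in the current directory; the instance count and `-name` values are illustrative. It prints one QEMU/KVM command line per guest; pipe the output to `sh` to actually launch them.

```shell
#!/bin/sh
# Sketch: generate one QEMU/KVM invocation per L2E unikernel guest.
# Assumptions: L2E_qemu-x86_64 image is in the current directory;
# guest count and names are illustrative.
spawn_cmds() {
    n="$1"
    i=1
    while [ "$i" -le "$n" ]; do
        # Same flags as the manual invocation in the post, backgrounded.
        printf 'qemu-system-x86_64 -m 256m -accel kvm -kernel L2E_qemu-x86_64 -nographic -name l2e-%d &\n' "$i"
        i=$((i + 1))
    done
}

spawn_cmds 4
```

Launching for real would be `spawn_cmds 4 | sh`, though without a web API there is not much to talk to yet.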

  • llama2.c

    Inference Llama 2 in one file of pure C

  • This is a fork of https://github.com/karpathy/llama2.c

    karpathy's llama2.c is like llama.cpp, but it is written in C and the Python training code is available in the same repo. llama2.c's goal is to be an elegant single-file C implementation of inference and an elegant Python implementation of training.

    His goal is for people to understand how Llama 2 and LLMs work, so he keeps it simple and sweet. As the project progresses, features and performance improvements will be added.

    Currently it can infer the baby (small) story models trained by karpathy at a fast pace. It can also infer Meta Llama 2 7B models, but at a very slow rate, around 1 token per second.

    So currently this can be used for learning or as a tech preview.

    Our friendly fork tries to make it portable, performant, and more usable (bells and whistles) over time. Since we mirror upstream closely, the inference capabilities of our fork are similar, but slightly faster if compiled with acceleration. What we try to do differently is make it bootable (not there yet) and portable. Right now you get binary portability: use the same run.com on any x86_64 machine running any OS and it will work (possible thanks to the cosmopolitan toolchain). The other part that works is unikernels: boot this as a unikernel in VMs (possible thanks to the unikraft unikernel and toolchain).

    See our fork currently as a release-early, release-often toy tech demo. We plan to build it out into a useful product.
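As background on the binary-portability claim: cosmopolitan's "Actually Portable Executable" (APE) format gives files like run.com a header beginning with the magic bytes MZqFpD, which is simultaneously a valid DOS/PE stub and a shell script, letting one file execute on several x86_64 operating systems. A minimal sketch of checking for that magic (the helper name is made up, not from the repo):

```shell
#!/bin/sh
# Sketch: detect a cosmopolitan APE binary by its 6-byte magic "MZqFpD".
# is_ape is a hypothetical helper name, not part of llama2.c / L2E.
is_ape() {
    [ "$(head -c 6 "$1" 2>/dev/null)" = "MZqFpD" ]
}

# Example usage: is_ape run.com && echo "portable APE binary"
```

This is only a format check, of course; the portability itself comes from building with the cosmopolitan toolchain rather than from anything at run time.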

NOTE: The number of mentions on this list indicates mentions on common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.


Related posts

  • The Second Batch of the Open Source AI Grants

    1 project | news.ycombinator.com | 15 Dec 2023
  • Llama 2 Everywhere (L2E): Standalone, Binary Portable, Bootable Llama 2

    1 project | /r/Boiling_Steam | 9 Oct 2023
  • Llama 2 Everywhere (L2E): Standalone, Binary Portable, Bootable Llama 2

    1 project | /r/hackernews | 8 Oct 2023
  • Llama 2 Everywhere (L2E): Standalone, Binary Portable, Bootable Llama 2

    1 project | news.ycombinator.com | 5 Oct 2023
  • Play a hidden framebuffer Doom on TempleDOS

    1 project | news.ycombinator.com | 3 Oct 2023