C++ language-model Projects
- nnl: a low-latency, high-performance inference engine for large models on low-memory GPU platforms.
There's https://github.com/saharNooby/rwkv.cpp, which is related-ish[0] to ggml/llama.cpp.
[0]: https://github.com/ggerganov/llama.cpp/issues/846
Project mention: Run 70B LLM Inference on a Single 4GB GPU with This New Technique | news.ycombinator.com | 2023-12-03
I did roughly the same thing in one of my hobby projects, https://github.com/fengwang/nnl. But instead of using an SSD, I load all the weights into host memory, and while running inference through the model layer by layer, I asynchronously copy memory from global to shared memory in the hope of better performance. However, my approach is bounded by PCIe bandwidth.
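The scheme the comment describes can be sketched in CUDA: keep every layer's weights in pinned host RAM, double-buffer them onto the GPU, and overlap the host-to-device copy of the next layer with the compute of the current one. This is a minimal illustration of that overlap, not nnl's actual API; the Layer struct and run_layer() are hypothetical placeholders, and throughput remains capped by PCIe bandwidth as the comment notes.

```cpp
#include <cuda_runtime.h>
#include <algorithm>
#include <vector>

struct Layer {
    const float* host_w;  // layer weights in page-locked host memory (cudaHostAlloc)
    size_t bytes;
};

// Hypothetical stand-in for the real per-layer kernels (matmul, attention, ...).
void run_layer(const float* /*dev_w*/, cudaStream_t /*s*/) { /* launch kernels here */ }

void infer(const std::vector<Layer>& layers) {
    cudaStream_t compute, copy;
    cudaStreamCreate(&compute);
    cudaStreamCreate(&copy);

    // Two device buffers sized for the largest layer (double buffering).
    size_t max_bytes = 0;
    for (const Layer& l : layers) max_bytes = std::max(max_bytes, l.bytes);
    float* dev[2];
    cudaEvent_t copied[2], freed[2];
    for (int b = 0; b < 2; ++b) {
        cudaMalloc(&dev[b], max_bytes);
        cudaEventCreate(&copied[b]);
        cudaEventCreate(&freed[b]);
        cudaEventRecord(freed[b], compute);  // both buffers start out free
    }

    for (size_t i = 0; i < layers.size(); ++i) {
        int buf = static_cast<int>(i & 1);
        // Don't overwrite a buffer the compute stream is still reading from.
        cudaStreamWaitEvent(copy, freed[buf], 0);
        cudaMemcpyAsync(dev[buf], layers[i].host_w, layers[i].bytes,
                        cudaMemcpyHostToDevice, copy);
        cudaEventRecord(copied[buf], copy);

        // Compute waits only for its own weights, so the next copy overlaps with it.
        cudaStreamWaitEvent(compute, copied[buf], 0);
        run_layer(dev[buf], compute);
        cudaEventRecord(freed[buf], compute);
    }
    cudaStreamSynchronize(compute);

    for (int b = 0; b < 2; ++b) cudaFree(dev[b]);
    cudaStreamDestroy(compute);
    cudaStreamDestroy(copy);
}
```

The host loop only enqueues work, so while layer i runs on the compute stream the copy stream is already transferring layer i+1; the events keep the two buffers from being overwritten while still in use.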
C++ language-model related posts
- Haystack DB – 10x faster than FAISS with binary embeddings by default
- WyGPT: Minimal mature GPT model in C++
- [D] SentencePiece, WordPiece, BPE... Which tokenizer is the best one?
- [P] wyGPT: improved small GPT model in C++ from scratch
- wangyi-fudan/wyGPT
- [P] wyGPT: Improved Small GPT In C++ From Scratch
- WyGPT: C++ GPT Language Model from Scratch