Mixtral of Experts

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • t5x

  • > Are you using a normal training script i.e. "continued pretraining" on ALL parameters with just document fragments rather than input output pairs?

    Yes, this one.

    > do you make a custom dataset that has qa pairs about that particular knowledgebase?

    This one. Once you have a checkpoint with the knowledge, it makes sense to finetune. You can use LoRA or another PEFT method; we decide depending on the case (some orgs have millions of tokens, and I'm not that confident PEFT holds up at that scale).

    LoRA with raw document text may not work; I haven't tried it. Google has good example training scripts here: https://github.com/google-research/t5x (under training, and then finetuning). I like this one. Facebook Research also has a few in their repos.

    If you are just looking to scrape by, I would suggest just doing what they tell you to do. You can offer suggestions, but it's better to let them make the call. There is a lot of fluff and a lot of chatter online, so everyone is still figuring this stuff out.

    One note about pretraining: it is costly, so most OSS devs skip it and do direct finetuning/LoRA. That works for them because their datasets come from the open internet. Orgs aren't finding much value in that approach, and yet many communities are full of these tactics.
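The LoRA idea mentioned above can be sketched in miniature. This is a hypothetical, dependency-free illustration (not the poster's code and not a real training script): instead of updating a full weight matrix W, you train two small low-rank matrices A and B and use W + BA, so far fewer parameters are trainable.

```python
# Minimal sketch of the LoRA idea (illustrative only).
# Instead of updating a full weight matrix W (d_out x d_in), train two
# small matrices A (r x d_in) and B (d_out x r); the effective weight
# is W + B @ A, so only r * (d_in + d_out) parameters are trainable.

def matmul(X, Y):
    # naive matrix product for small lists-of-lists
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def madd(X, Y):
    # elementwise matrix sum
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

d_out, d_in, r = 4, 4, 1
W = [[1.0 if i == j else 0.0 for j in range(d_in)] for i in range(d_out)]  # frozen base weight
A = [[0.1] * d_in]                      # r x d_in, trainable
B = [[0.2] for _ in range(d_out)]       # d_out x r, trainable

W_eff = madd(W, matmul(B, A))           # W + B A, used at inference

full = d_out * d_in                     # parameters a full update would touch
lora = r * (d_in + d_out)               # parameters LoRA actually trains
print(lora, full)                       # far fewer trainables at realistic sizes
```

In practice you would do this with a library such as Hugging Face's peft on top of the pretrained checkpoint; the point here is only where the savings come from.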

  • llama.cpp

    LLM inference in C/C++

  • https://github.com/ggerganov/llama.cpp/pull/4406

    The GGUF handling for Mistral's mixture of experts hasn't been finalized yet; TheBloke, ggerganov, and friends are still figuring out what works best.

    The Q5_K_M GGUF model is about 32 GB. That's not going to fit on any consumer-grade GPU, but it should be possible to run on a reasonably powerful workstation or gaming rig. Maybe not fast enough to be useful for everyday productivity, but it should run well enough to get a sense of what's possible. Sort of a glimpse into the future.
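The 32 GB figure is consistent with rough arithmetic. A back-of-the-envelope sketch, assuming Mixtral 8x7B's often-quoted ~46.7B total parameters and roughly 5.5 bits per weight on average for Q5_K_M (both approximations of mine, not numbers from the thread):

```python
# Rough size estimate for a quantized model file (assumed figures):
params = 46.7e9          # approximate total parameter count for Mixtral 8x7B
bits_per_weight = 5.5    # rough average for Q5_K_M quantization
size_gb = params * bits_per_weight / 8 / 1e9
print(round(size_gb))    # roughly 32, matching the quoted ~32 GB
```

The same arithmetic explains why Q4 variants land several gigabytes smaller and why full 16-bit weights would be far out of reach for a single workstation.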

  • nn-2003

    2003 Neural Networks experiments -- when it was not mainstream ;-)

  • I agree with you, stavros. There is no transfer between C coding and ML topics. However, the original question is a bit more on the business side, IMHO. Anyway, I have some experience with machine learning: 20 years ago I wrote [my first neural network](https://github.com/antirez/nn-2003), and since then I have always stayed in the loop. Not for work, as I specialized in system programming, but for personal research I played with NN image compression, NLP tasks, and convnets. More recently I have used PyTorch for my own projects and LLM fine-tuning, and I'm a "local LLMs" enthusiast. I have speculated a lot about AI and wrote a novel on the topic. So while the question was more on the business side, I have some competence in the general field of ML.

  • vllm

    A high-throughput and memory-efficient inference and serving engine for LLMs

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.


Related posts