| | llama3 | promptfoo |
|---|---|---|
| Mentions | 21 | 5 |
| Stars | 21,694 | 328 |
| Growth | 20.5% | - |
| Activity | 9.0 | 10.0 |
| Last commit | 8 days ago | 11 months ago |
| Language | Python | TypeScript |
| License | GNU General Public License v3.0 or later | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
llama3
-
How Meta trains large language models at scale
and deceptive if not inaccurate. Meta's Model Cards specifically call out that they were trained on publicly available datasets and NOT any Meta user data.
For example: https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md
- Reproduce GPT-2 (124M) in llm.c in 90 minutes for $20
-
Hugging Face is sharing $10M worth of compute to help beat the big AI companies
I was curious so I tried to answer this question
---
Training Llama 3 models emitted 2290 tons CO2e (https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md), and took 7.7 million GPU hours. Those GPU hours are for H100s, which consume 700W. So the conversion is approximately 2290 / (7.7e6 * 3600 * 700 / 1e9) ~= 0.12 tons CO2e per GPU-gigajoule.
A100s (what Hugging Face offers) consume 400W (https://www.nvidia.com/content/dam/en-zz/Solutions/Data-Cent...) and cost $2.21/hour (e.g. on CoreWeave, https://www.coreweave.com/gpu-cloud-pricing). So $10 million in A100s buys you ($10e6 / $2.21/h * 3600 s/h) * 400W ~= 6516 gigajoules of GPU energy.
So Hugging Face's offering will emit ~781 tons CO2e. Less if they've inflated the value of the compute they provide, which they have an incentive to do, but let's round to 800 tons.
---
According to https://www.carbonindependent.org/22.html, one Boeing 737-400 flying 926 km emits 3.61 tons fuel/flight * 3.15 g CO2e/g fuel = 11.37 tons CO2e.
So $10 million in compute is like ~70 Boeing 737-400 international flights.
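The back-of-the-envelope math in the comment above can be checked in a few lines of Python. All figures come from the comment itself (the comment rounds the emission rate to 0.12 tCO2e/GJ, so its totals come out slightly higher than the unrounded values below):

```python
# Reproduce the carbon arithmetic from the comment above.

# Llama 3 training: 2290 tCO2e over 7.7M H100-hours at 700 W per GPU.
tons_co2e = 2290
gpu_seconds = 7.7e6 * 3600
h100_watts = 700
training_gj = gpu_seconds * h100_watts / 1e9   # GPU energy in gigajoules
tons_per_gj = tons_co2e / training_gj          # ~0.12 tCO2e per GJ

# $10M of A100 time at $2.21/hour, 400 W per GPU.
a100_hours = 10e6 / 2.21
grant_gj = a100_hours * 3600 * 400 / 1e9       # ~6516 GJ

grant_tons = grant_gj * tons_per_gj            # ~770 tCO2e (unrounded rate)

# One Boeing 737-400 flight: 3.61 t fuel * 3.15 g CO2e per g fuel.
flight_tons = 3.61 * 3.15                      # ~11.37 tCO2e
flights = grant_tons / flight_tons             # roughly 70 flights

print(round(tons_per_gj, 3), round(grant_gj), round(grant_tons), round(flights))
```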
-
International Scientific Report on the Safety of Advanced AI [pdf]
> It takes years to become competent at the math needed for AI
(Assuming that "AI" refers to large language models)
The best open source LLM fits in less than 300 lines of code and consists mostly of matrix multiplications. https://github.com/meta-llama/llama3/blob/main/llama/model.p...
Anyone with a basic grasp of linear algebra can probably learn to understand it in a week.
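To make the "mostly matrix multiplications" point concrete, here is a minimal single-head self-attention in NumPy. This is an illustrative sketch, not Meta's code; the dimensions and weight names are made up:

```python
import numpy as np

def attention(x, wq, wk, wv):
    """Single-head self-attention: three projections, two matmuls, one softmax."""
    q, k, v = x @ wq, x @ wk, x @ wv              # project tokens to queries/keys/values
    scores = q @ k.T / np.sqrt(k.shape[-1])       # scaled dot-product similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over key positions
    return weights @ v                            # weighted sum of values

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16))                  # 8 tokens, 16-dim embeddings
wq, wk, wv = (rng.standard_normal((16, 16)) for _ in range(3))
out = attention(x, wq, wk, wv)
print(out.shape)                                  # (8, 16)
```

The real model file adds multi-head reshaping, rotary embeddings, RMSNorm, and feed-forward layers, but each of those is again a handful of matmuls and elementwise ops.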
-
Llama3.np: pure NumPy implementation of Llama3
From the readme [0]:
> All models support sequence length up to 8192 tokens, but we pre-allocate the cache according to max_seq_len and max_batch_size values. So set those according to your hardware.
[0] https://github.com/meta-llama/llama3/tree/14aab0428d3ec3a959...
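The pre-allocation the readme describes looks roughly like this in miniature. The real repo uses torch tensors; this is a hedged NumPy sketch where only max_seq_len and max_batch_size mirror the readme, and the head dimensions are illustrative:

```python
import numpy as np

max_batch_size = 2      # set according to your hardware
max_seq_len = 8192      # Llama 3 supports sequences up to 8192 tokens
n_heads, head_dim = 8, 64

# Allocate the full KV cache up front; generation then writes into slices of it.
cache_k = np.zeros((max_batch_size, max_seq_len, n_heads, head_dim), dtype=np.float32)
cache_v = np.zeros_like(cache_k)

def write_step(cache, batch, pos, kv):
    """Store one decoding step's keys or values at position `pos`."""
    cache[batch, pos] = kv

write_step(cache_k, 0, 0, np.ones((n_heads, head_dim), dtype=np.float32))
print(cache_k.nbytes / 1e6)   # cache size in MB: why you size these to your hardware
```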
-
Hindi-Language AI Chatbot for Enterprises Using Qdrant, MLFlow, and LangChain
Now, let's start building the next part of the chatbot. In this part, we will use the LLM from Ollama and integrate it with the chatbot. More specifically, we will use the Llama-3 model. Llama 3 is Meta's latest and most advanced open-source large language model (LLM). It is the successor to Llama 2 and represents a significant improvement in performance across a variety of benchmarks and tasks. Llama 3 comes in two main versions: an 8-billion-parameter model and a 70-billion-parameter model. Llama 3 supports context lengths of up to 8,192 tokens.
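As a sketch of that integration, here is a dependency-free call against Ollama's /api/generate HTTP endpoint rather than through LangChain (it assumes an Ollama server on its default local port; the helper names are my own):

```python
import json
import urllib.request

def build_request(prompt, model="llama3"):
    """Build the JSON body for Ollama's /api/generate endpoint."""
    return {"model": model, "prompt": prompt, "stream": False}

def ask_llama3(prompt, host="http://localhost:11434"):
    """Send the prompt to a locally running Ollama server and return its text."""
    body = json.dumps(build_request(prompt)).encode()
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

payload = build_request("What is Qdrant?")
print(payload["model"])   # llama3
```

In the actual chatbot this call sits behind LangChain's Ollama integration, which adds prompt templating and retrieval on top of the same endpoint.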
- FLaNK AI-April 22, 2024
- Meta Llama 3 GitHub
- Mark Zuckerberg himself appears in the list of direct contributors to Llama 3
- Mark Zuckerberg: Llama 3, $10B Models, Caesar Augustus, Bioweapons [video]
promptfoo
-
Ollama v0.1.33 with Llama 3, Phi 3, and Qwen 110B
Jumping in because I'm a big believer in (1) local LLMs, and (2) evals specific to individual use cases.
[0] https://github.com/typpo/promptfoo
- Meta Llama 3
-
Launch HN: Talc AI (YC S23) – Test Sets for AI
Congrats on the launch!
I've been interested in automatic testset generation because I find that the chore of writing tests is one of the reasons people shy away from evals. Recently landed eval testset generation for promptfoo (https://github.com/typpo/promptfoo), but it is non-RAG so more simplistic than your implementation.
Was also eyeballing this paper https://arxiv.org/abs/2401.03038, which outlines a method for generating asserts from prompt version history that may also be useful for these eval tools.
-
GPT-Prompt-Engineer
Thanks for the promptfoo mention. For anyone else who might prefer deterministic, programmatic evaluation of LLM outputs, I've been building promptfoo: https://github.com/typpo/promptfoo
Example asserts include basic string checks, regex, is-json, cosine similarity, etc.
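For example, a minimal promptfoo config exercising those assert types might look like this (a sketch; the provider id and prompt are illustrative, and promptfoo's docs have the current syntax):

```yaml
prompts:
  - "Return a JSON object describing the city {{city}}."
providers:
  - openai:gpt-4o-mini
tests:
  - vars:
      city: Paris
    assert:
      - type: is-json          # output must parse as JSON
      - type: contains         # basic string check
        value: Paris
      - type: regex
        value: "\"country\"\\s*:"
      - type: similar          # cosine similarity against a reference answer
        value: Paris is the capital of France.
        threshold: 0.75
```

Because these checks are deterministic, the same config can run in CI without an LLM grading the outputs.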
What are some alternatives?
llm - Access large language models from the command-line
rebuff - LLM Prompt Injection Detector
text-generation-inference - Large Language Model Text Generation Inference
gpt-engineer - Specify what you want it to build, the AI asks for clarification, and then builds it.
DeepSeek-Coder - DeepSeek Coder: Let the Code Write Itself
ChainForge - An open-source visual programming environment for battle-testing prompts to LLMs.
llama - Inference code for Llama models
plandex - AI driven development in your terminal. Designed for large, real-world tasks.
incubator-xtable - Apache XTable (incubating) is a cross-table converter for lakehouse table formats that facilitates interoperability across data processing systems and query engines.
shap-e - Generate 3D objects conditioned on text or images
FLiPStackWeekly - FLaNK AI Weekly covering Apache NiFi, Apache Flink, Apache Kafka, Apache Spark, Apache Iceberg, Apache Ozone, Apache Pulsar, and more...
gateway - A Blazing Fast AI Gateway. Route to 200+ LLMs with 1 fast & friendly API.