chatdocs vs LLMsPracticalGuide

| | chatdocs | LLMsPracticalGuide |
|---|---|---|
| Mentions | 11 | 11 |
| Stars | 650 | 8,714 |
| Growth | - | - |
| Activity | 6.9 | 4.5 |
| Last commit | 8 months ago | 29 days ago |
| Language | Python | - |
| License | MIT License | - |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
Mentions of chatdocs
-
Local LLMs GPUs
https://github.com/marella/chatdocs , this one, right? Takes close to a minute to answer
-
Struggling with Local LLMs
https://github.com/marella/chatdocs , this one.
-
Best commercially viable method to ask questions against a set of ~30 PDFs?
See here: https://github.com/marella/chatdocs#configuration (chatdocs.yml file, context_length)
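For reference, that option lives in a chatdocs.yml file created in the directory the chatdocs commands are run from. A minimal sketch based on the configuration section linked above (the value is illustrative; check the README for the exact keys your backend supports):

```yaml
# chatdocs.yml - placed in the working directory; values here are
# merged over the package's default configuration.
ctransformers:
  config:
    context_length: 1024   # raise this if questions or answers get cut off
```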
-
Document digest & oobabooga
What about chatdocs? I asked about it here and the author seems to be open to the idea.
-
What is the best way to create a knowledge-base specific LLM chatbot?
https://github.com/marella/chatdocs is a fork of privateGPT with many added features, such as GPU support and a chat UI. There is a Reddit thread about it: https://www.reddit.com/r/LocalLLaMA/comments/14174f4/chatdocs_privategpt_web_ui_gpu_support_more/
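For anyone evaluating it from this thread: the project's README quickstart is short. A sketch of the basic flow (the documents path is a placeholder):

```sh
pip install chatdocs              # install the package
chatdocs download                 # download the default models
chatdocs add /path/to/documents   # index documents into a local vector store
chatdocs ui                       # start the local web UI
```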
-
Need help finding local LLM
and: https://github.com/marella/chatdocs
-
Chatdocs with mixed language source documents
I played around a bit with chatdocs (https://github.com/marella/chatdocs), but unfortunately my source documents are in mixed languages, German and English to be specific. The result heavily depends on the language the question is asked in, which totally makes sense to me.
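A common workaround for mixed-language corpora is a multilingual embedding model, which maps semantically similar text from different languages into the same vector space, so a German question can retrieve an English passage. This is not something chatdocs does out of the box; the sketch below assumes the sentence-transformers package and an illustrative model choice:

```python
from sentence_transformers import SentenceTransformer, util

# A multilingual model places semantically similar text from different
# languages near each other in embedding space (model choice is illustrative).
model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

passages = [
    "The warranty covers repairs for two years after purchase.",    # English
    "Die Garantie deckt Reparaturen zwei Jahre nach dem Kauf ab.",  # German
]
query = "Wie lange gilt die Garantie?"  # German: "How long is the warranty valid?"

# Cosine similarity between the query and each passage; both passages
# should score highly despite the language mismatch.
scores = util.cos_sim(model.encode(query), model.encode(passages))
print(scores)
```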
-
Creating an Org Knowledge Management System
https://github.com/PromtEngineer/localGPT or https://github.com/marella/chatdocs
- FLaNK Stack Weekly for 12 June 2023
-
You can now chat with your documents privately!
Sounds like a great project and I really like the YouTube tutorials. I haven't been able to get it to work inside WSL. I tried another project with a UI and it works: https://github.com/marella/chatdocs
Mentions of LLMsPracticalGuide
- Ask HN: Daily practices for building AI/ML skills?
-
XGen-7B, a new 7B foundational model trained on up to 8K length for 1.5T tokens
Here are some high-level answers:
"7B" refers to the number of parameters (weights) in a model. Within a model family, the versions with more parameters take more compute to train and generally perform better.
A foundational model is the part of an ML model that is "pretrained" on a massive data set (usually the bulk of the compute cost). This is considered the "raw" model, which is then fine-tuned for specific tasks (e.g., turned into a chatbot).
"8K length" refers to the context window length (in tokens). This is basically an LLM's short-term memory; you can think of it as its attention span and the amount of text it can generate reasonable output for.
"1.5T tokens" refers to the size of the training corpus.
In general, Wikipedia (or, I suppose, ChatGPT 4/Bing Chat with web browsing) is a decent enough place to start reading and asking basic questions. I'd recommend starting here: https://en.wikipedia.org/wiki/Large_language_model and finding the related concepts.
For those going deeper, there are a lot of general resource lists like https://github.com/Hannibal046/Awesome-LLM or https://github.com/Mooler0410/LLMsPracticalGuide or one I like, https://sebastianraschka.com/blog/2023/llm-reading-list.html (there are a bajillion of these and you'll find more once you get a grasp on the terms you want to surf for). Almost everything is published on arXiv, and most of it is fairly readable even as a layman.
For non-ML programmers looking to get up to speed, I feel like Karpathy's Zero to Hero/nanoGPT or Jay Mody's picoGPT https://jaykmody.com/blog/gpt-from-scratch/ are an alternative, and maybe better, way to understand the basic concepts on a practical level.
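In the spirit of those from-scratch resources, the core operation they all build up to is scaled dot-product attention, which fits in a few lines of numpy (a toy illustration, not code taken from either project):

```python
import numpy as np

def softmax(x):
    # Subtract the row max for numerical stability.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(q, k, v):
    # Scaled dot-product attention: each position mixes the value
    # vectors v, weighted by how well its query matches every key.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

# Toy example: a sequence of 4 positions with 8-dimensional embeddings.
rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(4, 8)) for _ in range(3))
print(attention(q, k, v).shape)  # (4, 8)
```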
-
Need help finding local LLM
Checked, e.g.:
- https://medium.com/geekculture/list-of-open-sourced-fine-tuned-large-language-models-llm-8d95a2e0dc76
- https://github.com/Mooler0410/LLMsPracticalGuide
- https://www.reddit.com/r/LocalLLaMA/comments/12r552r/creating_an_ai_agent_with_vicuna_7b_and_langchain/
- https://www.youtube.com/watch?v=9ISVjh8mdlA
-
1-Jun-2023
The Practical Guides for Large Language Models (https://github.com/Mooler0410/LLMsPracticalGuide)
- [D] LLM Evolutionary Tree from "The Practical Guides for Large Language Models"
- Comprehensive Table of LLM Usage Restrictions
- Check out this Comprehensive and Practical Guide for Practitioners Working with Large Language Models
- The Practical Guides for Large Language Models
- Practical Guide for LLMs
What are some alternatives?
localGPT - Chat with your documents on your local device using GPT models. No data leaves your device and 100% private.
basaran - Basaran is an open-source alternative to the OpenAI text completion API. It provides a compatible streaming API for your Hugging Face Transformers-based text generation models.
Documize - Modern Confluence alternative designed for internal & external docs, built with Go + EmberJS
Awesome-LLM - Awesome-LLM: a curated list of Large Language Model resources
Olive - Olive is an easy-to-use hardware-aware model optimization tool that composes industry-leading techniques across model compression, optimization, and compilation.
roop - one-click face swap
text-generation-webui - A Gradio web UI for Large Language Models. Supports transformers, GPTQ, AWQ, EXL2, llama.cpp (GGUF), Llama models.
lance - Modern columnar data format for ML and LLMs implemented in Rust. Convert from parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, Pyarrow, with more integrations coming.
open_llama - OpenLLaMA, a permissively licensed open source reproduction of Meta AI’s LLaMA 7B trained on the RedPajama dataset
documenso - The Open Source DocuSign Alternative.
EmbedAI - An app to interact privately with your documents using the power of GPT, 100% privately, no data leaks