Build Personal ChatGPT Using Your Data

Our great sponsors

WorkOS - The modern identity platform for B2B SaaS

InfluxDB - Power Real-Time Data Analytics at Scale

SaaSHub - Software Alternatives and Reviews

Our great sponsors

private-gpt

131 51,732 9.2 Python

Interact with your documents using the power of GPT, 100% privately, no data leaks

When running a Mac with Intel hardware (not M1), you may run into clang: error: the clang compiler does not support '-march=native' during pip install.
If so set your archflags during pip install. eg: ARCHFLAGS="-arch x86_64" pip3 install -r requirements.txt
https://github.com/imartinez/privateGPT#mac-running-intel

PdfGptIndexer

4 637 4.8 Python

An efficient tool for indexing and searching PDF text data using OpenAI API and FAISS (Facebook AI Similarity Search) index, designed for rapid information retrieval and superior search accuracy.
WorkOS

workos.com sponsored

The modern identity platform for B2B SaaS. The APIs are flexible and easy-to-use, supporting authentication, user identity, and complex enterprise features like SSO and SCIM provisioning.
gpt4all

139 64,046 9.8 C++

gpt4all: run open-source LLMs anywhere

I assume this is the link: https://github.com/nomic-ai/gpt4all ?

gpt-2

63 21,111 2.5 Python

Code for the paper "Language Models are Unsupervised Multitask Learners"
txtai

354 6,953 9.3 Python

💡 All-in-one open-source embeddings database for semantic search, LLM orchestration and language model workflows
paperai

19 1,194 5.9 Python

📄 🤖 Semantic search and workflows for medical/scientific papers

https://github.com/neuml/paperai
Disclaimer: I am the author of both

instructor-embedding

4 1,695 6.1 Python

[ACL 2023] One Embedder, Any Task: Instruction-Finetuned Text Embeddings

If you look at a embeddings leaderboard [1], one of the top competitors called InstructorXL [2] is just a pip install away. It's neck and neck with Ada v2 except for a shorter input length and half the dimensions, with the added benefit that you'll always have the model available.
Most of the other options just work with the transformers library.
[1] https://huggingface.co/spaces/mteb/leaderboard
[2] https://github.com/HKUNLP/instructor-embedding

InfluxDB

www.influxdata.com sponsored

Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.
text-generation-webui

876 35,862 9.9 Python

A Gradio web UI for Large Language Models. Supports transformers, GPTQ, AWQ, EXL2, llama.cpp (GGUF), Llama models.
OpenLLM

25 8,733 9.9 Python

Run any open-source LLMs, such as Llama 2, Mistral, as OpenAI compatible API endpoint, locally and in the cloud.
vlite

7 686 6.2 Python

fast vector database made in numpy

I am working on a simple vector db just with numpy: https://github.com/sdan/vlite
I think milvus, quickwit, and pinecone are geared more towards enterprise and are hard to use.

easydiffusion

16 9,088 9.4 JavaScript

Easiest 1-click way to create beautiful artwork on your PC using AI, with no tech knowledge. Provides a browser UI for generating images from text prompts and images. Just enter your text prompt, and see the generated image.

Easiest 1-click way to install and use Stable Diffusion on your computer."
https://github.com/easydiffusion/easydiffusion
And while Whisper is OpenAI, it is trivial to use locally and extremely usefull
https://github.com/chidiwilliams/buzz

buzz

21 9,869 8.5 Python

Buzz transcribes and translates audio offline on your personal computer. Powered by OpenAI's Whisper.

Easiest 1-click way to install and use Stable Diffusion on your computer."
https://github.com/easydiffusion/easydiffusion
And while Whisper is OpenAI, it is trivial to use locally and extremely usefull
https://github.com/chidiwilliams/buzz

openai-cookbook

214 55,805 9.5 MDX

Examples and guides for using the OpenAI API

Please provide this reference in your readme / blog as it is the original source for your work... and provides the background for the tradeoff between fine-tuning vs ask-search.
https://github.com/openai/openai-cookbook/blob/main/examples...

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project

Build knowledge graphs with LLM-driven entity extraction
1 project | dev.to | 21 Feb 2024
Bootstrap or VC?
1 project | news.ycombinator.com | 5 Feb 2024
txtai: An embeddings database for semantic search, graph networks and RAG
1 project | news.ycombinator.com | 3 Feb 2024
Txtai: An all-in-one embeddings database for semantic search and LLM workflows
1 project | news.ycombinator.com | 24 Jan 2024
Generate knowledge with Semantic Graphs and RAG
1 project | dev.to | 23 Jan 2024

Build Personal ChatGPT Using Your Data

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com
Python Search Machine Learning NLP document-search
Post date: 8 Jul 2023

private-gpt

PdfGptIndexer

WorkOS

gpt4all

gpt-2

txtai

paperai

instructor-embedding

InfluxDB

text-generation-webui

OpenLLM

vlite

easydiffusion

buzz

openai-cookbook

Related posts

Build Personal ChatGPT Using Your Data

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com Python Search Machine Learning NLP document-search Post date: 8 Jul 2023

Related posts

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com
Python Search Machine Learning NLP document-search
Post date: 8 Jul 2023