nlm-ingestor
This repo provides the server-side code for the llmsherpa API to connect to. It includes parsers for various file formats.
Sorry if I'm completely missing it, but I noticed there is something in the code around chat:
https://github.com/tembo-io/pg_vectorize/blob/main/src/chat....
This would lead me to believe there is some way to invoke not just embeddings but also to query an LLM... which would be crazy powerful. Are there any examples of how to do this?
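For what it's worth, here's a speculative sketch of what calling that chat path might look like from Python. The function name and named arguments (vectorize.rag, agent_name, query) are assumptions based on the chat source file and Tembo's docs, so verify them against the repo:

    # Speculative: vectorize.rag and its named arguments are assumptions,
    # not verified against the repo.
    import psycopg  # psycopg 3

    with psycopg.connect("postgresql://postgres:postgres@localhost:5432/postgres") as conn:
        answer = conn.execute(
            "SELECT vectorize.rag(agent_name => %s, query => %s);",
            ("my_agent", "Which docs mention SSO?"),
        ).fetchone()[0]
        print(answer)  # the LLM's response, grounded in the embedded rows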
> tree-based approach to organize and summarize text data, capturing both high-level and low-level details.
https://twitter.com/parthsarthi03/status/1753199233241674040
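For anyone skimming, a rough sketch of that idea: embed leaf chunks, cluster them, summarize each cluster with an LLM, then recurse on the summaries. The embed() and summarize() helpers here are hypothetical stand-ins, and KMeans replaces the paper's Gaussian-mixture clustering to keep the sketch short:

    # Hedged sketch of the tree-building idea; embed() and summarize() are
    # hypothetical stand-ins for a real embedding model and LLM call, and
    # KMeans replaces the paper's GMM-based clustering.
    import numpy as np
    from sklearn.cluster import KMeans

    def embed(texts):
        # hypothetical embedding model; random vectors keep the sketch runnable
        rng = np.random.default_rng(0)
        return rng.normal(size=(len(texts), 8))

    def summarize(texts):
        # hypothetical LLM summarization; naive truncation stands in
        return " ".join(texts)[:200]

    def build_tree(chunks, levels=2, k=2):
        layers = [chunks]
        for _ in range(levels):
            if len(layers[-1]) <= k:
                break
            labels = KMeans(n_clusters=k, n_init=10).fit_predict(embed(layers[-1]))
            layers.append([
                summarize([c for c, lab in zip(layers[-1], labels) if lab == cluster])
                for cluster in range(k)
            ])
        return layers  # retrieval then searches across every layer, low- and high-level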
nlm-ingestor processes documents, organizing content and improving readability: it handles sections, paragraphs, links, tables, lists, and page continuations; removes redundancies and watermarks; and applies OCR, with additional support for HTML and other formats through Apache Tika:
https://github.com/nlmatics/nlm-ingestor
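If it helps to see the pieces together: run the nlm-ingestor server (e.g., its Docker image), then point llmsherpa's LayoutPDFReader at it. A minimal sketch; the local endpoint URL and method names follow the two projects' READMEs, so double-check them against current versions:

    # Minimal sketch: parse a PDF through a locally running nlm-ingestor
    # server via the llmsherpa client. Endpoint and method names are taken
    # from the projects' READMEs; verify against current versions.
    from llmsherpa.readers import LayoutPDFReader

    # nlm-ingestor's parser endpoint when its Docker image runs locally
    llmsherpa_api_url = "http://localhost:5010/api/parseDocument?renderFormat=all"

    pdf_reader = LayoutPDFReader(llmsherpa_api_url)
    doc = pdf_reader.read_pdf("sample.pdf")  # hypothetical local file

    # iterate the layout-aware chunks (sections, paragraphs, tables, lists)
    for chunk in doc.chunks():
        print(chunk.to_context_text())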
There's an issue in the pgvector repo from someone with several ~10-20 million-row tables who got acceptable performance with the right hardware and some performance tuning: https://github.com/pgvector/pgvector/issues/455
I'm in the early stages of evaluating pgvector myself, but having used Pinecone, I currently like pgvector better because it is open source. The indexing algorithm is clear, and one can understand and modify the parameters. Furthermore, the database is PostgreSQL, not a proprietary document store; when the other data in the problem is stored relationally, it is very convenient to have the vectors stored that way as well. PostgreSQL also has good observability and metrics. When it comes to flexibility for specialized applications, pgvector seems like the clear winner. But I can definitely see Pinecone's appeal if vector search is not a core component of the problem/business, as it is very easy to use and scales easily.
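To make the "understand and modify the parameters" point concrete, here's a minimal sketch using pgvector's documented HNSW knobs (m and ef_construction at build time, hnsw.ef_search at query time); the table and column names are made up for illustration:

    # Minimal sketch of pgvector HNSW tuning; the items table and embedding
    # column are illustrative. m and ef_construction trade build time and
    # memory for recall; hnsw.ef_search trades query latency for recall.
    import psycopg

    with psycopg.connect("dbname=app") as conn:
        conn.execute(
            "CREATE INDEX IF NOT EXISTS items_embedding_idx "
            "ON items USING hnsw (embedding vector_cosine_ops) "
            "WITH (m = 16, ef_construction = 64);"
        )
        conn.execute("SET hnsw.ef_search = 100;")  # per-session recall/latency knob
        rows = conn.execute(
            "SELECT id FROM items ORDER BY embedding <=> %s::vector LIMIT 10;",
            ("[0.1, 0.2, 0.3]",),  # query vector as a pgvector text literal
        ).fetchall()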
I wrote a C# library to do this, similar to other common chunking approaches like the way LangChain does it: https://github.com/drittich/SemanticSlicer
Given a list of separators (regexes), it goes through them in order and keeps splitting the text by them until each chunk fits within the desired size. By putting the higher-level separators first (e.g., for HTML, splitting by heading tags before paragraph tags), it's a pretty good proxy for maintaining context.
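As a rough illustration of that strategy in Python (not the SemanticSlicer code itself):

    # Rough illustration of priority-ordered separator splitting; not the
    # SemanticSlicer implementation. Real chunkers also re-merge adjacent
    # small pieces up to the size budget, which is omitted here.
    import re

    # higher-level separators first; for HTML you might put tag-based
    # regexes such as r"(?=<h1)" ahead of these
    SEPARATORS = [r"\n\n", r"(?<=[.!?])\s+", r"\s+"]

    def slice_text(text, max_len, seps=SEPARATORS):
        if len(text) <= max_len:
            return [text]
        if not seps:
            # no separators left: hard-cut as a last resort
            return [text[i:i + max_len] for i in range(0, len(text), max_len)]
        head, *rest = seps
        chunks = []
        for piece in re.split(head, text):
            chunks.extend(slice_text(piece, max_len, rest))
        return [c for c in chunks if c]

    print(slice_text("First sentence. Second one.\n\nNew paragraph.", max_len=20))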