txtai vs pgvector

txtai

💡 All-in-one open-source embeddings database for semantic search, LLM orchestration and language model workflows (by neuml)

neuml.github.io

pgvector

Open-source vector similarity search for Postgres (by pgvector)

Tags: nearest-neighbor-search, approximate-nearest-neighbor-search

Project         txtai                 pgvector
Mentions        356                   78
Stars           7,033                 9,349
Growth          2.6%                  7.0%
Activity        9.3                   9.9
Latest Commit   3 days ago            2 days ago
Language        Python                C
License         Apache License 2.0    PostgreSQL License

Mentions: the total number of mentions we have tracked plus the number of user-suggested alternatives.
Stars: the number of stars a project has on GitHub.
Growth: month-over-month growth in stars.
Activity: a relative number indicating how actively a project is being developed. Recent commits carry more weight than older ones. For example, an activity of 9.0 indicates that a project is among the top 10% of the most actively developed projects being tracked.

txtai

Posts with mentions or reviews of txtai. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2024-05-01.

Show HN: FileKitty – Combine and label text files for LLM prompt contexts
5 projects | news.ycombinator.com | 1 May 2024

What contributing to Open-source is, and what it isn't
1 project | news.ycombinator.com | 27 Apr 2024

I tend to agree with this sentiment. Many junior devs and/or those in college want to contribute. Then they feel entitled to merge a PR that they worked hard on, often without guidance. I'm all for working with people, but projects have standards and not all ideas make sense. In many cases, especially with commercial open source, the project is the base of a company's identity. So it's not just for drive-by ideas to pad a resume or finish a school project.
For those who do want to do this, I'd recommend writing an issue and/or reaching out to the developers to engage in a dialogue. This takes work, but it will increase the likelihood of a PR being merged.
Disclaimer: I'm the primary developer of txtai (https://github.com/neuml/txtai), an open-source vector database + RAG framework.

Build knowledge graphs with LLM-driven entity extraction
1 project | dev.to | 21 Feb 2024

txtai is an all-in-one embeddings database for semantic search, LLM orchestration and language model workflows.

Bootstrap or VC?
1 project | news.ycombinator.com | 5 Feb 2024

Bootstrapping only works if you have the runway to do it and you don't feel the need to grow fast.
With NeuML (https://neuml.com), I've gone the bootstrapping route. I've been able to build a fairly successful open source project (txtai 6K stars https://github.com/neuml/txtai) and a revenue-positive company. It's a "live within your means" strategy.
VC funding can have a snowball effect where you need more and more. Then you're in the loop of needing funding rounds to survive. The hope is someday you're acquired or start turning a profit.
I would say both have their pros and cons. Not all ideas have the luxury of time.

txtai: An embeddings database for semantic search, graph networks and RAG
1 project | news.ycombinator.com | 3 Feb 2024

Ask HN: What happened to startups, why is everything so polished?
2 projects | news.ycombinator.com | 27 Jan 2024

I agree that in many cases people are puffing their feathers to try to be something they're not (at least not yet). Some believe in the "fake it until you make it" mentality.
With NeuML (https://neuml.com), the website is a simple HTML page. On social media, I'm honest about what NeuML is, that I'm in my 40s with a family and not striving to be the next Steve Jobs. I've been able to build a fairly successful open source project (txtai 6K stars https://github.com/neuml/txtai) and a revenue-positive company. For me, authenticity and being genuine are most important. I would say that being genuine has been way more of an asset than a liability.

Are we at peak vector database?
8 projects | news.ycombinator.com | 25 Jan 2024

I'll add txtai (https://github.com/neuml/txtai) to the list.
There is still plenty of room for innovation in this space. Just need to focus on the right projects that are innovating and not the ones (re)working on problems solved in 2020/2021.

Txtai: An all-in-one embeddings database for semantic search and LLM workflows
1 project | news.ycombinator.com | 24 Jan 2024

Generate knowledge with Semantic Graphs and RAG
1 project | dev.to | 23 Jan 2024

txtai is an all-in-one embeddings database for semantic search, LLM orchestration and language model workflows.

Show HN: Open-source Rule-based PDF parser for RAG
9 projects | news.ycombinator.com | 23 Jan 2024

Nice project! I've long used Tika for document parsing given its maturity and the wide range of formats it supports. The XHTML output helps with chunking documents for RAG.
Here are a couple of examples:
- https://neuml.hashnode.dev/build-rag-pipelines-with-txtai
- https://neuml.hashnode.dev/extract-text-from-documents
Disclaimer: I'm the primary author of txtai (https://github.com/neuml/txtai).
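
The chunking idea mentioned in the post above can be sketched with the Python standard library alone. This is a minimal illustration, not Tika's or txtai's API: it assumes Tika-style XHTML in which each paragraph sits in a <p> element, and the ParagraphChunker class, the chunk_xhtml function, the max_chars limit and the sample document are all hypothetical.

```python
from html.parser import HTMLParser


class ParagraphChunker(HTMLParser):
    """Collects the text of each <p> element (hypothetical helper, not a Tika or txtai API)."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._buffer = None  # None while the parser is outside a <p> element

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "p" and self._buffer is not None:
            # Collapse runs of whitespace inside the paragraph.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.paragraphs.append(text)
            self._buffer = None


def chunk_xhtml(xhtml, max_chars=200):
    """Group consecutive paragraphs into chunks of at most max_chars characters.

    A single paragraph longer than max_chars becomes its own chunk.
    """
    parser = ParagraphChunker()
    parser.feed(xhtml)
    parser.close()

    chunks, current = [], ""
    for paragraph in parser.paragraphs:
        candidate = f"{current} {paragraph}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = paragraph
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


# Assumed sample input; real Tika output would contain more markup.
SAMPLE = """<html><body>
<p>txtai is an embeddings database.</p>
<p>It supports semantic search.</p>
<p>Chunks are what gets embedded and indexed for RAG.</p>
</body></html>"""

chunks = chunk_xhtml(SAMPLE, max_chars=70)
for chunk in chunks:
    print(chunk)
```

With the 70-character limit, the first two paragraphs share a chunk and the third starts a new one. Splitting on paragraph boundaries keeps each chunk a coherent unit of text, which is usually what an embedding step wants.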

pgvector

Posts with mentions or reviews of pgvector. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2024-04-25.

Integrate txtai with Postgres
2 projects | dev.to | 25 Apr 2024

# Install Postgres and pgvector
!apt-get update && apt install postgresql postgresql-server-dev-14
!git clone --branch v0.6.2 https://github.com/pgvector/pgvector.git
!cd pgvector && make && make install
# Start database
!service postgresql start
!sudo -u postgres psql -U postgres -c "ALTER USER postgres PASSWORD 'pass';"

Vector Database solutions on AWS
1 project | dev.to | 28 Mar 2024

Vector databases on the market fall into two groups: specialized ones and multi-model ones. Most major database providers, such as Oracle, PostgreSQL and MongoDB, to name a few, have integrated a specific solution for retrieving vector data.

Using pgvector To Locate Similarities In Enterprise Data
2 projects | dev.to | 21 Mar 2024

For this example, I wanted to focus on how pgvector – an open-source vector similarity search for Postgres – can be used to identify data similarities that exist in enterprise data.
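
As a rough illustration of how similarity is expressed with pgvector, the sketch below builds a nearest-neighbor query in Python. The three distance operators are the ones documented in the pgvector README; the table name items, the column name embedding and the helper functions are assumptions made for this example, and the pure-Python cosine_distance only shows what the cosine operator computes.

```python
import math

# pgvector distance operators (as documented in the pgvector README).
OPERATORS = {
    "l2": "<->",             # Euclidean distance
    "inner_product": "<#>",  # negative inner product
    "cosine": "<=>",         # cosine distance
}


def to_vector_literal(values):
    """Format a Python sequence as a pgvector text literal, e.g. '[1.0,2.0,3.0]'."""
    return "[" + ",".join(str(float(v)) for v in values) + "]"


def nearest_neighbors_sql(table, column, metric="cosine", limit=5):
    """Build a parameterized nearest-neighbor query.

    table and column are treated as trusted identifiers (hypothetical names);
    the query vector is passed separately as a bound parameter (%s).
    """
    operator = OPERATORS[metric]
    return (
        f"SELECT id, {column} {operator} %s::vector AS distance "
        f"FROM {table} ORDER BY distance LIMIT {int(limit)}"
    )


def cosine_distance(a, b):
    """What the cosine operator computes: 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


query = nearest_neighbors_sql("items", "embedding", metric="cosine", limit=3)
params = (to_vector_literal([1, 0, 0]),)
print(query)
print(params)
```

To run the query, query and params would be passed to cursor.execute of a Postgres driver that uses %s placeholders (psycopg is one such driver, assumed here rather than required).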

pgvector vs. pgvecto.rs in 2024: A Comprehensive Comparison for Vector Search in PostgreSQL
1 project | dev.to | 19 Mar 2024

pgvector supports dense vector search well, but it has no plans to support sparse vectors.

Pg_vectorize: The simplest way to do vector search and RAG on Postgres
6 projects | news.ycombinator.com | 6 Mar 2024

There's an issue in the pgvector repo about someone having several ~10-20 million row tables and getting acceptable performance with the right hardware and some performance tuning: https://github.com/pgvector/pgvector/issues/455
I'm in the early stages of evaluating pgvector myself, but having used Pinecone, I currently like pgvector better because it is open source. The indexing algorithm is clear, and one can understand and modify the parameters. Furthermore, the database is PostgreSQL, not a proprietary document store. When the other data in the problem is stored relationally, it is very convenient to have the vectors stored like this as well. And PostgreSQL has good observability and metrics. When it comes to flexibility for specialized applications, pgvector seems like the clear winner. But I can definitely see Pinecone's appeal if vector search is not a core component of the problem/business, as it is very easy to use and scales very easily.
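
The tunable index parameters the post refers to can be sketched as follows. The index types (hnsw, ivfflat), operator classes and option names (m, ef_construction, lists, hnsw.ef_search, ivfflat.probes) are those documented in the pgvector README; the helper functions and the items/embedding names are hypothetical, and the parameter values are illustrative rather than recommendations.

```python
# Operator classes pgvector provides for each distance metric.
OPCLASSES = {
    "l2": "vector_l2_ops",
    "inner_product": "vector_ip_ops",
    "cosine": "vector_cosine_ops",
}


def hnsw_index_sql(table, column, metric="l2", m=16, ef_construction=64):
    """Build a CREATE INDEX statement for an HNSW index.

    m is the maximum number of connections per graph layer and
    ef_construction is the candidate list size used while building.
    """
    opclass = OPCLASSES[metric]
    return (
        f"CREATE INDEX ON {table} USING hnsw ({column} {opclass}) "
        f"WITH (m = {int(m)}, ef_construction = {int(ef_construction)})"
    )


def ivfflat_index_sql(table, column, metric="l2", lists=100):
    """Build a CREATE INDEX statement for an IVFFlat index with the given number of lists."""
    opclass = OPCLASSES[metric]
    return (
        f"CREATE INDEX ON {table} USING ivfflat ({column} {opclass}) "
        f"WITH (lists = {int(lists)})"
    )


def search_settings_sql(ef_search=None, probes=None):
    """Build the session settings that trade query speed for recall."""
    statements = []
    if ef_search is not None:
        statements.append(f"SET hnsw.ef_search = {int(ef_search)}")
    if probes is not None:
        statements.append(f"SET ivfflat.probes = {int(probes)}")
    return statements


print(hnsw_index_sql("items", "embedding", metric="cosine", m=16, ef_construction=64))
print(ivfflat_index_sql("items", "embedding", metric="l2", lists=100))
print(search_settings_sql(ef_search=100, probes=10))
```

The build-time options are fixed when the index is created, while the two SET statements can be changed per session, which is one way to experiment with recall without rebuilding the index.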

FLaNK 04 March 2024
26 projects | dev.to | 4 Mar 2024

Vector Database and Spring IA
2 projects | dev.to | 11 Feb 2024

The Spring AI project aims to streamline the development of applications that incorporate artificial intelligence functionality without unnecessary complexity. In this example we use features such as embeddings, prompts and ETL, and save all embeddings in pgvector (a Postgres vector database).

Use pgvector for searching images on Azure Cosmos DB for PostgreSQL
2 projects | dev.to | 7 Feb 2024

Official GitHub repository of the pgvector extension

pgvector 0.6.0: 30x faster with parallel index builds
1 project | dev.to | 31 Jan 2024

pgvector 0.6.0 was just released and will be available on Supabase projects soon. Again, a special shout out to Andrew Kane and everyone else who worked on parallel index builds.

Store embeddings in Azure Cosmos DB for PostgreSQL with pgvector
2 projects | dev.to | 29 Jan 2024

The pgvector extension adds vector similarity search capabilities to your PostgreSQL database. To use the extension, you have to first create it in your database. You can install the extension, by connecting to your database and running the CREATE EXTENSION command from the psql command prompt:
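
In psql the command is CREATE EXTENSION vector; (the extension is registered under the name vector). The same step can be sketched from Python. The enable_pgvector helper below is hypothetical and works with any DB-API 2.0 connection; it adds IF NOT EXISTS, a standard PostgreSQL clause, so that running it twice does not raise an error.

```python
def enable_pgvector(connection):
    """Enable the pgvector extension on a DB-API 2.0 connection (hypothetical helper).

    The extension must already be installed on the database server;
    this only registers it in the current database.
    """
    cursor = connection.cursor()
    try:
        cursor.execute("CREATE EXTENSION IF NOT EXISTS vector")
    finally:
        cursor.close()
    connection.commit()


# Usage sketch, assuming the psycopg driver and a reachable database
# (the connection string is a placeholder, not a real server):
#
#   import psycopg
#   with psycopg.connect("postgresql://user:password@localhost/dbname") as conn:
#       enable_pgvector(conn)
```

Creating an extension normally requires sufficient database privileges, so on managed services this step may have to go through the provider's own mechanism instead.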

What are some alternatives?

When comparing txtai and pgvector you can also consider the following projects:

sentence-transformers - Multilingual Sentence & Image Embeddings with BERT

Milvus - A cloud-native vector database, storage for next generation AI applications

tika-python - Tika-Python is a Python binding to the Apache Tika™ REST services allowing Tika to be called natively in the Python community.

faiss - A library for efficient similarity search and clustering of dense vectors.

transformers - 🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.

Weaviate - Weaviate is an open-source vector database that stores both objects and vectors, allowing for the combination of vector search with structured filtering with the fault tolerance and scalability of a cloud-native database.

Elasticsearch - Free and Open, Distributed, RESTful Search Engine

CLIP - CLIP (Contrastive Language-Image Pretraining), Predict the most relevant text snippet given an image

qdrant - Qdrant - High-performance, massive-scale Vector Database for the next generation of AI. Also available in the cloud https://cloud.qdrant.io/

paperai - 📄 🤖 Semantic search and workflows for medical/scientific papers

ann-benchmarks - Benchmarks of approximate nearest neighbor libraries in Python
