txtai Alternatives
Similar projects and alternatives to txtai
-
sentence-transformers
Multilingual Sentence & Image Embeddings with BERT
-
tika-python
Tika-Python is a Python binding to the Apache Tika™ REST services allowing Tika to be called natively in the Python community.
-
ann-benchmarks
Benchmarks of approximate nearest neighbor libraries in Python
-
transformers
🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
-
qdrant
Qdrant - Vector Database for the next generation of AI applications. Also available in the cloud https://cloud.qdrant.io/
-
Milvus
A cloud-native vector database, storage for next generation AI applications
-
CLIP
CLIP (Contrastive Language-Image Pretraining), Predict the most relevant text snippet given an image
-
EdgeChains
EdgeChains is a new Language & Grammar for production-friendly Generative AI. Based on Jsonnet & works everywhere (java, python, js,etc). Prompts live declaratively & "outside code". Easy to Reason/Test/Deploy.
-
annoy
Approximate Nearest Neighbors in C++/Python optimized for memory usage and loading/saving to disk
txtai reviews and mentions
-
Do we think about vector dbs wrong?
The focus on the top 10 in vector search is a product of wanting to prove value over keyword search. Keyword search is going to miss some conceptual matches. You can try to work around that with tokenization and complex queries with all variations but it's not easy.
Vector search isn't all that new a concept. For example, the annoy library (https://github.com/spotify/annoy) has been around since 2014. It was one of the first open source approximate nearest neighbor libraries. Recommendations have always been a good use case for vector similarity.
Recommendations are a natural extension of search, and transformer models made building vectors for natural language possible. To prove the worth of vector search over keyword search, the focus was always on showing how the top N matches include results not possible with keyword search.
In 2023, there has been a shift towards acknowledging keyword search also has value and that a combination of vector + keyword search (aka hybrid search) operates in the sweet spot. Once again this is validated through the same benchmarks which focus on the top 10.
On top of all this, there is also the reality that the vector database space is very crowded and some want to use their performance benchmarks for marketing.
Disclaimer: I am the author of txtai (https://github.com/neuml/txtai), an open source embeddings database
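The hybrid search idea mentioned above (combining vector and keyword results) can be illustrated with a small sketch. This is only one possible approach, a weighted blend of min-max normalized scores, and it is not claimed to be how txtai or any particular vector database does it. The function names, the `weight` parameter and the sample scores are all hypothetical.

```python
def normalize(scores):
    """Min-max scale a {doc_id: score} mapping into the range [0, 1]."""
    if not scores:
        return {}
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        return {doc: 1.0 for doc in scores}
    return {doc: (score - low) / (high - low) for doc, score in scores.items()}


def hybrid(keyword_scores, vector_scores, weight=0.5, limit=10):
    """Blend keyword and vector scores. `weight` is the share given to vector scores."""
    keyword, vector = normalize(keyword_scores), normalize(vector_scores)
    combined = {
        doc: weight * vector.get(doc, 0.0) + (1 - weight) * keyword.get(doc, 0.0)
        for doc in set(keyword) | set(vector)
    }
    # Highest blended score first, truncated to the top `limit` results
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)[:limit]


# Hypothetical scores for illustration only (e.g. BM25 and cosine similarity)
keyword_scores = {"a": 12.0, "b": 7.5, "c": 3.0}
vector_scores = {"b": 0.91, "c": 0.88, "d": 0.42}

results = hybrid(keyword_scores, vector_scores)
for doc, score in results:
    print(doc, round(score, 3))
```

A document that ranks well in both lists ("b" here) rises to the top, while a document found by only one method can still appear in the results. Normalizing first matters because keyword and vector scores usually live on different scales.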
-
Vector Search with OpenAI Embeddings: Lucene Is All You Need
In terms of "All You Need" for Vector Search, ANN Benchmarks (https://ann-benchmarks.com/) is a good site to review when deciding what you need. As with anything complex, there often isn't a universal solution.
txtai (https://github.com/neuml/txtai) can build indexes with Faiss, Hnswlib and Annoy. All 3 libraries have been around at least 4 years and are mature. txtai also supports storing metadata in SQLite and DuckDB, and the next release will support any JSON-capable database supported by SQLAlchemy (Postgres, MariaDB/MySQL, etc).
-
Vector databases: analyzing the trade-offs
Adding txtai to the list: https://github.com/neuml/txtai
txtai is an all-in-one embeddings database for semantic search, LLM orchestration and language model workflows.
Embeddings databases are a union of vector indexes (sparse and dense), graph networks and relational databases. This enables vector search with SQL, topic modeling and retrieval augmented generation.
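As a rough illustration of what "vector search with SQL" can mean, the sketch below pairs a relational filter with a similarity ranking using only the Python standard library. It is an assumption-laden toy: the vectors are hand-made, the `similarity` SQL function is registered just for this example, and the ranking is a brute-force scan rather than a real vector index. It does not show txtai's actual implementation.

```python
import json
import math
import sqlite3


def cosine(a, b):
    """Cosine similarity between two equal-length lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similarity(stored, query):
    # Both arguments arrive from SQL as JSON-encoded lists of floats
    return cosine(json.loads(stored), json.loads(query))


connection = sqlite3.connect(":memory:")
# Expose the Python function to SQL under a hypothetical name
connection.create_function("similarity", 2, similarity)
connection.execute(
    "CREATE TABLE docs (id INTEGER PRIMARY KEY, text TEXT, category TEXT, vector TEXT)"
)

# Hand-made 3-dimensional vectors, for illustration only
rows = [
    (1, "feline care tips", "pets", [0.9, 0.1, 0.0]),
    (2, "dog training basics", "pets", [0.7, 0.3, 0.1]),
    (3, "stock market update", "finance", [0.8, 0.2, 0.0]),
]
connection.executemany(
    "INSERT INTO docs VALUES (?, ?, ?, ?)",
    [(i, text, category, json.dumps(vector)) for i, text, category, vector in rows],
)

# Relational filter (category) and similarity ranking in a single SQL statement
query = json.dumps([1.0, 0.0, 0.0])
results = connection.execute(
    "SELECT id, text, similarity(vector, ?) AS score FROM docs "
    "WHERE category = ? ORDER BY score DESC LIMIT 2",
    (query, "pets"),
).fetchall()

for row in results:
    print(row)
```

The finance row is excluded by the `WHERE` clause before ranking, which is the kind of metadata filtering a relational layer adds on top of similarity search.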
Another one mentioned below to try is txtai: https://github.com/neuml/txtai
It can run in a single Python instance, and it can be deployed to production that way.
-
Building an efficient sparse keyword index in Python
Read the full implementation on GitHub to learn more.
-
Do we need a specialized vector database?
There isn't a best universal choice for all situations. If you're already using Postgres and all you want is to add vector search, pgvector might be good enough.
txtai (https://github.com/neuml/txtai) sets out to be an all-in-one embeddings database. This is more than just being a vector database with semantic search. It can embed text into vectors, run LLM workflows, has components for sparse/keyword indexing and graph based search. It also has a relational layer built-in for metadata filtering.
txtai currently supports SQLite/DuckDB for relational data but can be extended. For example, relational data could be stored in Postgres, sparse/dense vectors in Elasticsearch/Opensearch and graph data in Neo4j.
I believe modular solutions like this, where internal components can be swapped in and out, are the best option, but given I'm the author of txtai, I'm a bit biased. This setup enables the scaling and reliability of existing solutions, balanced with being able to get started quickly with a POC to evaluate the use case.
-
txtai 6.0 - the all-in-one embeddings database
GitHub: https://github.com/neuml/txtai
-
SQLite Functions for Working with JSON
The built-in JSON functionality is very powerful. txtai (https://github.com/neuml/txtai) takes full advantage of it and stores all relational data as JSON in SQLite.
The ability to build indexes on these JSON functions is important. Found this article to be a good reference: https://www.delphitools.info/2021/06/17/sqlite-as-a-no-sql-d...
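A minimal sketch of indexing a JSON expression in SQLite, using Python's built-in `sqlite3` module. It assumes the bundled SQLite build includes the JSON functions and supports indexes on expressions, which is typical for recent Python releases. The table name, index name and sample records are hypothetical, and this is not txtai's schema.

```python
import json
import sqlite3

connection = sqlite3.connect(":memory:")
# Each row stores its relational data as a JSON document in a TEXT column
connection.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, data TEXT)")

# Illustrative records only
records = [
    {"title": "txtai 6.0", "year": 2023},
    {"title": "annoy", "year": 2014},
]
connection.executemany(
    "INSERT INTO documents (data) VALUES (?)",
    [(json.dumps(record),) for record in records],
)

# Index on an expression that extracts a field from the JSON document
connection.execute(
    "CREATE INDEX idx_year ON documents (json_extract(data, '$.year'))"
)

# Query with the same expression so the planner is able to consider the index
titles = [
    row[0]
    for row in connection.execute(
        "SELECT json_extract(data, '$.title') FROM documents "
        "WHERE json_extract(data, '$.year') = ?",
        (2014,),
    )
]
print(titles)

# The query plan can be inspected to see whether the index is chosen
for step in connection.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM documents "
    "WHERE json_extract(data, '$.year') = 2014"
):
    print(step)
```

The expression in the `WHERE` clause has to match the indexed expression for SQLite to consider the index, so it helps to keep the JSON path strings consistent across queries.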
-
💡 What's new in txtai 6.0
6.0 Release on GitHub
Stats
neuml/txtai is an open source project licensed under Apache License 2.0 which is an OSI approved license.
The primary programming language of txtai is Python.