💡 All-in-one open-source embeddings database for semantic search, LLM orchestration and language model workflows (by neuml)

Txtai Alternatives

Similar projects and alternatives to txtai

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a better txtai alternative or higher similarity.

txtai reviews and mentions

Posts with mentions or reviews of txtai. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-09-05.
  • Do we think about vector dbs wrong?
    7 projects | news.ycombinator.com | 5 Sep 2023
    The focus on the top 10 in vector search is a product of wanting to prove value over keyword search. Keyword search is going to miss some conceptual matches. You can try to work around that with tokenization and complex queries with all variations but it's not easy.

    Vector search isn't all that new a concept. For example, the annoy library (https://github.com/spotify/annoy) has been around since 2014. It was one of the first open source approximate nearest neighbor libraries. Recommendations have always been a good use case for vector similarity.

    Recommendations are a natural extension of search, and transformer models made building vectors for natural language possible. To prove the worth of vector search over keyword search, the focus was always on showing how the top N matches include results not possible with keyword search.

    In 2023, there has been a shift towards acknowledging keyword search also has value and that a combination of vector + keyword search (aka hybrid search) operates in the sweet spot. Once again this is validated through the same benchmarks which focus on the top 10.

    On top of all this, there is also the reality that the vector database space is very crowded and some want to use their performance benchmarks for marketing.

    Disclaimer: I am the author of txtai (https://github.com/neuml/txtai), an open source embeddings database
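
The hybrid search idea described in this post (combining vector and keyword scores) can be sketched in a few lines. This is a minimal illustration only, not how txtai or any particular vector database implements it: the toy 2-d vectors, the term-overlap keyword score, and the `weight` parameter are all assumptions made for the example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_score(query, text):
    """Fraction of query terms that appear in the text (a stand-in for BM25)."""
    terms = query.lower().split()
    words = set(text.lower().split())
    return sum(term in words for term in terms) / len(terms) if terms else 0.0


def hybrid_search(query, query_vector, documents, weight=0.5, limit=10):
    """Rank documents by a weighted sum of vector and keyword scores.

    weight=1.0 is pure vector search, weight=0.0 is pure keyword search.
    """
    results = []
    for doc_id, (text, vector) in documents.items():
        dense = cosine(query_vector, vector)
        sparse = keyword_score(query, text)
        results.append((doc_id, weight * dense + (1 - weight) * sparse))
    return sorted(results, key=lambda r: r[1], reverse=True)[:limit]


# Hypothetical data: hand-made vectors stand in for real embeddings
documents = {
    "a": ("automobile maintenance guide", [0.9, 0.1]),
    "b": ("car repair manual", [0.8, 0.2]),
    "c": ("cooking recipes", [0.0, 1.0]),
}

vector_only = hybrid_search("car repair", [1.0, 0.0], documents, weight=1.0)
keyword_only = hybrid_search("car repair", [1.0, 0.0], documents, weight=0.0)
hybrid = hybrid_search("car repair", [1.0, 0.0], documents, weight=0.5)
```

In this toy data, document "a" shares no terms with the query, so keyword search alone scores it zero, while the vector score still ranks it highly. The weighted sum keeps both signals in the final ranking.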

  • Vector Search with OpenAI Embeddings: Lucene Is All You Need
    2 projects | news.ycombinator.com | 3 Sep 2023
    In terms of "All You Need" for Vector Search, ANN Benchmarks (https://ann-benchmarks.com/) is a good site to review when deciding what you need. As with anything complex, there often isn't a universal solution.

    txtai (https://github.com/neuml/txtai) can build indexes with Faiss, Hnswlib and Annoy. All 3 libraries have been around at least 4 years and are mature. txtai also supports storing metadata in SQLite and DuckDB, and the next release will support any JSON-capable database supported by SQLAlchemy (Postgres, MariaDB/MySQL, etc).

  • Vector databases: analyzing the trade-offs
    5 projects | news.ycombinator.com | 20 Aug 2023
    Adding txtai to the list: https://github.com/neuml/txtai

    txtai is an all-in-one embeddings database for semantic search, LLM orchestration and language model workflows.

    Embeddings databases are a union of vector indexes (sparse and dense), graph networks and relational databases. This enables vector search with SQL, topic modeling and retrieval augmented generation.
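
As a rough illustration of pairing a relational layer with a vector index, the sketch below filters candidates with SQL and then ranks them by cosine similarity. It uses only the Python standard library and does not reflect txtai's actual API or internals: the `docs` table, the in-memory `vectors` dict, and the `search` helper are hypothetical names chosen for the example.

```python
import math
import sqlite3


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Relational side: document metadata lives in SQLite
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, text TEXT, year INTEGER)")
db.executemany(
    "INSERT INTO docs VALUES (?, ?, ?)",
    [
        (1, "intro to vector search", 2021),
        (2, "keyword indexing basics", 2023),
        (3, "hybrid search in practice", 2023),
    ],
)

# Vector side: toy 2-d vectors keyed by document id (stand-ins for embeddings)
vectors = {1: [0.9, 0.1], 2: [0.1, 0.9], 3: [0.7, 0.3]}


def search(query_vector, where, params=(), limit=3):
    """Filter with SQL, then rank the surviving rows by vector similarity.

    `where` is assumed to be a trusted SQL fragment; values go in `params`.
    """
    rows = db.execute(f"SELECT id, text FROM docs WHERE {where}", params).fetchall()
    scored = [(doc_id, text, cosine(query_vector, vectors[doc_id])) for doc_id, text in rows]
    return sorted(scored, key=lambda r: r[2], reverse=True)[:limit]


results = search([1.0, 0.0], "year = ?", (2023,))
```

Here the SQL filter removes the 2021 document before any similarity is computed, and the remaining rows come back ordered by similarity to the query vector.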

    5 projects | news.ycombinator.com | 20 Aug 2023
    Another one mentioned below to try is txtai: https://github.com/neuml/txtai

    It can run within a single Python instance and be used in production that way.

  • Building an efficient sparse keyword index in Python
    5 projects | dev.to | 17 Aug 2023
    Read the full implementation on GitHub to learn more.
  • Do we need a specialized vector database?
    2 projects | news.ycombinator.com | 12 Aug 2023
    There isn't a best universal choice for all situations. If you're already using Postgres and all you want is to add vector search, pgvector might be good enough.

    txtai (https://github.com/neuml/txtai) sets out to be an all-in-one embeddings database. This is more than just being a vector database with semantic search. It can embed text into vectors, run LLM workflows, has components for sparse/keyword indexing and graph based search. It also has a relational layer built-in for metadata filtering.

    txtai currently supports SQLite/DuckDB for relational data but can be extended. For example, relational data could be stored in Postgres, sparse/dense vectors in Elasticsearch/Opensearch and graph data in Neo4j.

    I believe modular solutions like this, where internal components can be swapped in and out, are the best option, but given I'm the author of txtai, I'm a bit biased. This setup balances the scaling and reliability of existing solutions with the ability to get started quickly with a POC to evaluate the use case.

  • txtai 6.0 - the all-in-one embeddings database
    2 projects | /r/LanguageTechnology | 12 Aug 2023
    GitHub: https://github.com/neuml/txtai
  • SQLite Functions for Working with JSON
    10 projects | news.ycombinator.com | 10 Aug 2023
    The built-in JSON functionality is very powerful. txtai (https://github.com/neuml/txtai) takes full advantage of it and stores all relational data as JSON in SQLite.

    The ability to build indexes on these JSON functions is important. Found this article to be a good reference: https://www.delphitools.info/2021/06/17/sqlite-as-a-no-sql-d...
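
A small sketch of the pattern described in this post: storing records as JSON text in SQLite and indexing a `json_extract` expression. It assumes a SQLite build with the JSON functions available (the default in recent versions). The `documents` table, the `idx_author` index, and the sample records are made up for illustration and are not txtai's schema.

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")

# Each row stores a whole record as a JSON string
db.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, data TEXT)")

records = [
    {"title": "Intro", "author": "alice"},
    {"title": "Basics", "author": "bob"},
    {"title": "Advanced", "author": "alice"},
]
db.executemany(
    "INSERT INTO documents (id, data) VALUES (?, ?)",
    [(i, json.dumps(record)) for i, record in enumerate(records, start=1)],
)

# Expression index on a JSON field, so lookups by author can avoid a full scan
db.execute("CREATE INDEX idx_author ON documents (json_extract(data, '$.author'))")

# Query a JSON field directly with SQL
rows = db.execute(
    "SELECT id, json_extract(data, '$.title') FROM documents "
    "WHERE json_extract(data, '$.author') = ? ORDER BY id",
    ("alice",),
).fetchall()

# Ask SQLite how it would run the lookup; the last column describes the plan
plan = db.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM documents "
    "WHERE json_extract(data, '$.author') = ?",
    ("alice",),
).fetchall()
```

The index only helps when the query uses the same expression as the index definition, so the `json_extract(data, '$.author')` text in the `WHERE` clause needs to match the one in `CREATE INDEX`.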

  • 💡 What's new in txtai 6.0
    2 projects | dev.to | 10 Aug 2023
    6.0 Release on GitHub

