[P] Library for end-to-end neural search pipelines

This page summarizes the projects mentioned and recommended in the original post on /r/MachineLearning

  • cherche

    Neural Search


  • faiss

    A library for efficient similarity search and clustering of dense vectors.

    Cherche is compatible with large corpora such as Wikipedia and provides decent response times (notebook). I will benchmark Jina and Haystack under the same conditions, but there shouldn't be much difference, since the heavy lifting falls to Elasticsearch or to Faiss via retrieve.Encoder. The strength of Cherche is its fancy pipelines, composed of union and intersection operations between TfIdf, BM25, Flash, multiple rankers... That strength is less suited to Wikipedia and better suited to industrial or personal corpora (<= 100,000 documents).

  • flashtext

    Extract Keywords from sentence or Replace keywords in sentences.

    I started developing this tool after using Haystack. Pipelines are easier to build with Cherche because of its operators. Cherche also offers FlashText and Lunr.py retrievers, which are not available in Haystack and which I needed for the project I was working on. Haystack is clearly more complete, but I think it is also more complex to use.

  • lunr.py

    A Python implementation of Lunr.js 🌖

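The comments above describe pipelines built from union and intersection operations between retrievers. The sketch below illustrates that idea in plain Python only. It is not Cherche's API: the `Retriever` class, the `keyword_retriever` helper, the use of `|` and `&` as operators, and the sample documents are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Doc = Dict[str, str]


@dataclass
class Retriever:
    """Hypothetical wrapper: maps a query string to a list of documents."""

    search: Callable[[str], List[Doc]]

    def __call__(self, q: str) -> List[Doc]:
        return self.search(q)

    def __or__(self, other: "Retriever") -> "Retriever":
        # Union: keep every document found by either side, left results first.
        def union(q: str) -> List[Doc]:
            seen, out = set(), []
            for doc in self(q) + other(q):
                if doc["id"] not in seen:
                    seen.add(doc["id"])
                    out.append(doc)
            return out

        return Retriever(union)

    def __and__(self, other: "Retriever") -> "Retriever":
        # Intersection: keep only documents found by both sides, in left order.
        def intersection(q: str) -> List[Doc]:
            right_ids = {doc["id"] for doc in other(q)}
            return [doc for doc in self(q) if doc["id"] in right_ids]

        return Retriever(intersection)


def keyword_retriever(documents: List[Doc], field: str) -> Retriever:
    """Toy retriever: matches documents whose `field` shares a token with the query."""

    def search(q: str) -> List[Doc]:
        terms = set(q.lower().split())
        return [d for d in documents if terms & set(d[field].lower().split())]

    return Retriever(search)


# Made-up sample corpus, for illustration only.
documents = [
    {"id": "0", "title": "Paris", "article": "Paris is the capital of France"},
    {"id": "1", "title": "Lyon", "article": "Lyon is a city in France"},
    {"id": "2", "title": "Paris Saint-Germain", "article": "A football club based in Paris"},
]

on_title = keyword_retriever(documents, "title")
on_article = keyword_retriever(documents, "article")

union_ids = [d["id"] for d in (on_title | on_article)("paris france")]
intersection_ids = [d["id"] for d in (on_title & on_article)("paris france")]
print(union_ids, intersection_ids)
```

Because each combination returns another `Retriever`, the results can be chained further, for example `(a | b) & c`. A real library would typically also merge or re-rank similarity scores, which this sketch leaves out.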
