Python information-retrieval

Open-source Python projects categorized as information-retrieval

Top 21 Python information-retrieval Projects

  • EasyOCR

    Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.

    Project mention: I made a website for a friend who owns a restaurant. He's wondering if there's a way to upload a picture of his menu daily. What is the best way to do this? | reddit.com/r/learnprogramming | 2023-01-15
  • gensim

    Topic Modelling for Humans

    Project mention: Understanding How Dynamic node2vec Works on Streaming Data | dev.to | 2022-12-23

    This is our optimization problem. Now, we hope that you have an idea of what our goal is. Luckily for us, this is already implemented in a Python module called gensim. Yes, these guys are brilliant in natural language processing and we will make use of it. 🤝

  • haystack

    Haystack is an open source NLP framework that leverages pre-trained Transformer models. It enables developers to quickly implement production-ready semantic search, question answering, summarization and document ranking for a wide range of NLP applications.

    Project mention: New free tool that uses fine-tuned BERT model to surface answers from research papers | reddit.com/r/LanguageTechnology | 2022-10-28

    Some cool tools like Haystack would be useful in putting some of these together.

  • ranking

    Learning to Rank in TensorFlow

  • marqo

    Tensor search for humans.

    Project mention: Neural Network vs AI for predicting trends based on market info | reddit.com/r/ArtificialInteligence | 2023-01-01

    Side addition: Marqo can help with better semantics through an external knowledge base. It can help avoid ambiguities and produce better and factually grounded responses. At Marqo (the startup I work for), we created a demo where GPT provides up-to-date news summarisation through the use of Marqo as a knowledge base: https://medium.com/creator-fund/building-search-engines-that-think-like-humans-e019e6fb6389

  • InvoiceNet

    Deep neural network to extract intelligent information from invoice documents.

    Project mention: How would you annotate resumes for object detection? | reddit.com/r/computervision | 2022-03-11

    You can also possibly look at invoice extraction tools such as https://github.com/naiveHobo/InvoiceNet. They solve a similar issue and are researched fairly well, since there is a big market for that.

  • pke

    Python Keyphrase Extraction module

  • beir

    A Heterogeneous Benchmark for Information Retrieval. Easy to use, evaluate your models across 15+ diverse IR datasets.

    Project mention: An alternative to Elasticsearch that runs on a few MBs of RAM | news.ycombinator.com | 2022-10-24

    There are actually benchmarks that allow measuring search relevancy objectively, e.g. BEIR[1]. The Manticore Search team made an effort to submit a PR to include it in the list. The results are here [2]. Unfortunately, the BEIR team seems to be too busy to review a whole pile of PRs, including one about Vespa. Nevertheless, it would be nice to have both Meilisearch and Typesense there too, since it would be interesting to see how those non-tf-idf-based search engines perform compared to BM25-based and vector search engines.

    [1] https://github.com/beir-cellar/beir

  • cherche

    📑 Neural Search

    Project mention: [D] is it time to investigate retrieval language models? | reddit.com/r/MachineLearning | 2023-01-19

    Here is a tool I made to create a retriever-reader pipeline in a minute: Cherche. I would also recommend Haystack on GitHub!

  • forte

    Forte is a flexible and powerful ML workflow builder. This is part of the CASL project: http://casl-project.ai/

    Project mention: [P] How Forte Transforms the Building of ML Solutions with PyTorch into Assembly Lines | reddit.com/r/MachineLearning | 2022-04-30

    GitHub: https://github.com/asyml/forte | Documentation: https://asyml-forte.readthedocs.io/en/latest | Technical Report: https://aclanthology.org/2020.emnlp-demos.26/

  • webdork

    A Python tool to automate some dorking stuff to find information disclosures.

    Project mention: Fast-Google-Dorks-Scan | reddit.com/r/OSINT | 2022-05-08
  • PatZilla

    PatZilla is a modular patent information research platform and data integration toolkit with a modern user interface and access to multiple data sources.

  • FreeDiscovery

    Web Service for E-Discovery Analytics

  • FinBERT-QA

    Financial Domain Question Answering with pre-trained BERT Language Model

  • IP-Tracker

    Track any IP address with IP-Tracker. IP-Tracker is developed for Linux and Termux. You can retrieve information about any IP address using IP-Tracker.

  • nalcos

    Search Git commits in natural language

  • rakun2

    RaKUn 2.0 - A fast keyword detection algorithm

    Project mention: Very fast graph-based keyword extraction | reddit.com/r/LanguageTechnology | 2022-10-30
  • BERT-QE

    Code and resources for the paper "BERT-QE: Contextualized Query Expansion for Document Re-ranking".

  • kgsearch

    Query and visualize knowledge graphs

    Project mention: Python client for knowledge graph exploration | news.ycombinator.com | 2022-08-11
  • CyberSecurityAuditScript

    Security audit script that cuts information gathering from an average of 5 minutes to 20 seconds and writes everything to a text file.

    Project mention: A Cyber Security Audit Script I made | reddit.com/r/Python | 2022-11-27
  • website_stats

    A Python library that generates website reports

    Project mention: I just made a Python package that returns website stats (rating, user experience, marketing ...) | reddit.com/r/webscraping | 2022-09-02

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2023-01-19.

Index

What are some of the best open-source information-retrieval projects in Python? This list will help you:

Rank Project Stars
1 EasyOCR 16,848
2 gensim 13,893
3 haystack 6,515
4 ranking 2,586
5 marqo 2,176
6 InvoiceNet 2,110
7 pke 1,345
8 beir 747
9 cherche 227
10 forte 212
11 webdork 125
12 PatZilla 70
13 FreeDiscovery 64
14 FinBERT-QA 62
15 IP-Tracker 59
16 nalcos 50
17 rakun2 47
18 BERT-QE 43
19 kgsearch 19
20 CyberSecurityAuditScript 7
21 website_stats 3