InfluxDB is the Time Series Platform where developers build real-time applications for analytics, IoT and cloud-native services. Easy to start, it is available in the cloud or on-premises.
Top 21 Python information-retrieval Projects
-
EasyOCR
Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, and Cyrillic.
Project mention: I made a website for a friend who owns a restaurant. He's wondering if there's a way to upload a picture of his menu daily. What is the best way to do this? | reddit.com/r/learnprogramming | 2023-01-15
-
gensim
This is our optimization problem. Now, we hope that you have an idea of what our goal is. Luckily for us, this is already implemented in a Python module called gensim. Yes, these guys are brilliant in natural language processing and we will make use of it. 🤝
-
haystack
Haystack is an open source NLP framework that leverages pre-trained Transformer models. It enables developers to quickly implement production-ready semantic search, question answering, summarization and document ranking for a wide range of NLP applications.
Project mention: New free tool that uses fine-tuned BERT model to surface answers from research papers | reddit.com/r/LanguageTechnology | 2022-10-28
Some cool tools like Haystack that would be useful in putting some of these together.
-
ranking
-
marqo
Project mention: Neural Network vs AI for predicting trends based on market info | reddit.com/r/ArtificialInteligence | 2023-01-01
Side addition: Marqo can help with better semantics through an external knowledge base. It can help avoid ambiguities and produce better and factually grounded responses. At Marqo (the startup I work for), we created a demo where GPT provides up-to-date news summarisation through the use of Marqo as a knowledge base: https://medium.com/creator-fund/building-search-engines-that-think-like-humans-e019e6fb6389
-
InvoiceNet
Project mention: How would you annotate resumes for object detection? | reddit.com/r/computervision | 2022-03-11
You can also possibly look at invoice extraction tools such as https://github.com/naiveHobo/InvoiceNet. They solve a similar issue and are researched fairly well, since there is a big market for that.
-
pke
-
beir
A Heterogeneous Benchmark for Information Retrieval. Easy to use, evaluate your models across 15+ diverse IR datasets.
Project mention: An alternative to Elasticsearch that runs on a few MBs of RAM | news.ycombinator.com | 2022-10-24
There are actually benchmarks that allow measuring search relevancy objectively, e.g. BEIR[1]. The Manticore Search team made an effort to submit a PR to include it in the list. The results are here [2]. Unfortunately, the BEIR team seems to be too busy to review a whole pile of PRs, including one about Vespa. Nevertheless, it would be nice to have both Meilisearch and Typesense there too, since it would be interesting to see how those non-tf-idf-based search engines perform compared to BM25-based and vector search engines.
-
cherche
Project mention: [D] is it time to investigate retrieval language models? | reddit.com/r/MachineLearning | 2023-01-19
Here is a tool I made to create a retriever-reader pipeline in a minute: Cherche. I would also recommend Haystack on GitHub!
-
forte
Forte is a flexible and powerful ML workflow builder. This is part of the CASL project: http://casl-project.ai/
Project mention: [P] How Forte Transforms the Building of ML Solutions with PyTorch into Assembly Lines | reddit.com/r/MachineLearning | 2022-04-30
GitHub: https://github.com/asyml/forte
Documentation: https://asyml-forte.readthedocs.io/en/latest
Technical Report: https://aclanthology.org/2020.emnlp-demos.26/
-
webdork
-
PatZilla
PatZilla is a modular patent information research platform and data integration toolkit with a modern user interface and access to multiple data sources.
-
FreeDiscovery
-
FinBERT-QA
-
IP-Tracker
Track any IP address with IP-Tracker. IP-Tracker is developed for Linux and Termux. You can retrieve information about any IP address using IP-Tracker.
-
nalcos
-
rakun2
Project mention: Very fast graph-based keyword extraction | reddit.com/r/LanguageTechnology | 2022-10-30
-
BERT-QE
Code and resources for the paper "BERT-QE: Contextualized Query Expansion for Document Re-ranking".
-
kgsearch
-
CyberSecurityAuditScript
A security audit script that cuts information gathering from an average of 5 minutes to 20 seconds and writes all results to a text file.
-
website_stats
Project mention: I just made a python package that returns website stats ( rating , user experience, marketing ...) | reddit.com/r/webscraping | 2022-09-02
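The beir mention above says that search relevancy can be measured objectively. As a rough illustration of what such a measurement looks like, here is a minimal sketch of nDCG@k, a ranking metric commonly reported by IR benchmarks. The function names, the toy document ids, and the linear-gain formulation are assumptions made for this example; this is not BEIR's own code or API.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k graded relevance labels.

    Uses linear gain (rel / log2(rank + 1)); other formulations exist.
    """
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_doc_ids, qrels, k=10):
    """nDCG@k for a single query.

    ranked_doc_ids: document ids in the order a search engine returned them.
    qrels: dict mapping document id -> graded relevance (0 = not relevant).
    """
    gains = [qrels.get(doc_id, 0) for doc_id in ranked_doc_ids]
    ideal_gains = sorted(qrels.values(), reverse=True)
    ideal_dcg = dcg_at_k(ideal_gains, k)
    # With no relevant documents the metric is undefined; return 0.0 by convention.
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical judgments and two hypothetical engine rankings for one query.
qrels = {"d1": 2, "d2": 1, "d3": 0}
score_a = ndcg_at_k(["d1", "d2", "d3"], qrels)  # best documents first
score_b = ndcg_at_k(["d3", "d2", "d1"], qrels)  # best documents last
print(f"engine A: {score_a:.3f}, engine B: {score_b:.3f}")
```

Averaging this score over many queries and datasets is one way two engines, for example a BM25-based one and a vector-based one, can be compared on the same footing.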
Python information-retrieval related posts
- Neural Network vs AI for predicting trends based on market info
- Deep learning is a growing trend in healthcare artificial intelligence, but what are the use cases for the various types of deep learning?
- Making a Dialogue Summarizer
- Q&A Model Custom Training for an NLP newbie
- Sarcasm Detection model [R].
- Marqo-YOLO: How to bring highlighting to image search (Article in comments).
- Image search with localization and open-vocabulary reranking using Marqo, yolox, CLIP and OWL-ViT
Index
What are some of the best open-source information-retrieval projects in Python? This list will help you:
# | Project | Stars |
---|---|---|
1 | EasyOCR | 16,848 |
2 | gensim | 13,893 |
3 | haystack | 6,515 |
4 | ranking | 2,586 |
5 | marqo | 2,176 |
6 | InvoiceNet | 2,110 |
7 | pke | 1,345 |
8 | beir | 747 |
9 | cherche | 227 |
10 | forte | 212 |
11 | webdork | 125 |
12 | PatZilla | 70 |
13 | FreeDiscovery | 64 |
14 | FinBERT-QA | 62 |
15 | IP-Tracker | 59 |
16 | nalcos | 50 |
17 | rakun2 | 47 |
18 | BERT-QE | 43 |
19 | kgsearch | 19 |
20 | CyberSecurityAuditScript | 7 |
21 | website_stats | 3 |