Top 21 Python information-retrieval Projects
Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, and Cyrillic.
Project mention: I made a website for a friend who owns a restaurant. He's wondering if there's a way to upload a picture of his menu daily. What is the best way to do this? | reddit.com/r/learnprogramming | 2023-01-15
Topic Modelling for Humans
Project mention: Understanding How Dynamic node2vec Works on Streaming Data | dev.to | 2022-12-23
This is our optimization problem. Now, we hope that you have an idea of what our goal is. Luckily for us, this is already implemented in a Python module called gensim. Yes, these guys are brilliant in natural language processing and we will make use of it. 🤝
Haystack is an open source NLP framework that leverages pre-trained Transformer models. It enables developers to quickly implement production-ready semantic search, question answering, summarization and document ranking for a wide range of NLP applications.
Project mention: New free tool that uses fine-tuned BERT model to surface answers from research papers | reddit.com/r/LanguageTechnology | 2022-10-28
There are some cool tools like Haystack that would be useful in putting some of these together.
Learning to Rank in TensorFlow
Tensor search for humans.
Project mention: Neural Network vs AI for predicting trends based on market info | reddit.com/r/ArtificialInteligence | 2023-01-01
Side addition: Marqo can help with better semantics through an external knowledge base. It can help avoid ambiguities and produce better and factually grounded responses. At Marqo (the startup I work for), we created a demo where GPT provides up-to-date news summarisation through the use of Marqo as a knowledge base: https://medium.com/creator-fund/building-search-engines-that-think-like-humans-e019e6fb6389
Deep neural network to extract intelligent information from invoice documents.
Project mention: How would you annotate resumes for object detection? | reddit.com/r/computervision | 2022-03-11
You can also possibly look at invoice extraction tools such as https://github.com/naiveHobo/InvoiceNet. They solve a similar issue and are researched fairly well, since there is a big market for that.
Python Keyphrase Extraction module
A Heterogeneous Benchmark for Information Retrieval. Easy to use, evaluate your models across 15+ diverse IR datasets.
Project mention: An alternative to Elasticsearch that runs on a few MBs of RAM | news.ycombinator.com | 2022-10-24
There are actually benchmarks that allow measuring search relevancy objectively, e.g. BEIR. The Manticore Search team made an effort to submit a PR to be included in the list. The results are here. Unfortunately, the BEIR team seems to be too busy to review a whole pile of PRs, including one about Vespa. Nevertheless, it would be nice to have both Meilisearch and Typesense there too, since it would be interesting to see how those non-tf-idf-based search engines perform compared to BM25-based and vector search engines.
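For readers unfamiliar with the BM25 ranking mentioned in the comment above, here is a minimal sketch of Okapi BM25 scoring in plain Python. It is not the code of BEIR or of any search engine named here; the function names, the whitespace tokenizer, the sample documents, and the parameter defaults (k1=1.5, b=0.75) are all assumptions chosen for illustration, and the IDF uses the common smoothed "+1" variant.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Return one Okapi BM25 score per document for the given query."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for toks in tokenized:
        term_freq = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in term_freq:
                continue
            n_t = doc_freq[term]
            # Smoothed IDF; the "+ 1" inside the log keeps it non-negative.
            idf = math.log((n_docs - n_t + 0.5) / (n_t + 0.5) + 1.0)
            tf = term_freq[term]
            # Longer-than-average documents are penalised via b.
            length_norm = 1.0 - b + b * len(toks) / avg_len
            score += idf * tf * (k1 + 1.0) / (tf + k1 * length_norm)
        scores.append(score)
    return scores


# Hypothetical sample corpus, for illustration only.
docs = [
    "the cat sat on the mat",
    "dogs chase cats in the park",
    "search engines rank documents by relevance",
]
scores = bm25_scores("rank documents", docs)
best = max(range(len(docs)), key=scores.__getitem__)
print(docs[best])
```

Documents that share no term with the query score zero, so only the third sample document receives a positive score here. A benchmark such as BEIR would compare rankings like this one against human relevance judgments.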
📑 Neural Search
Project mention: [D] is it time to investigate retrieval language models? | reddit.com/r/MachineLearning | 2023-01-19
Here is a tool I made to create a retriever-reader pipeline in a minute: Cherche. I would also recommend Haystack on GitHub!
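To illustrate what a retriever-reader pipeline does, here is a toy sketch in plain Python. It does not use the Cherche or Haystack APIs; the function names and sample documents are hypothetical, the retriever ranks by simple token overlap, and the "reader" only picks the best-matching sentence, whereas real pipelines typically use a neural model to extract an answer span.

```python
def tokens(text):
    """Return the set of lowercase words, ignoring '?' and '.'."""
    return set(text.lower().replace("?", " ").replace(".", " ").split())


def retrieve(question, docs, k=2):
    """Retriever stage: keep the k documents sharing the most words with the question."""
    q = tokens(question)
    ranked = sorted(docs, key=lambda d: len(q & tokens(d)), reverse=True)
    return ranked[:k]


def read(question, passages):
    """Reader stage (toy stand-in): return the sentence that best overlaps the question."""
    q = tokens(question)
    sentences = [s.strip() for p in passages for s in p.split(".") if s.strip()]
    return max(sentences, key=lambda s: len(q & tokens(s)))


# Hypothetical sample corpus, for illustration only.
docs = [
    "Haystack is an NLP framework. It supports question answering.",
    "BM25 is a ranking function. It is used by search engines.",
    "Gensim is a topic modelling library.",
]
question = "What is BM25?"
passages = retrieve(question, docs)
answer = read(question, passages)
print(answer)
```

The split into two stages is the point of the design: the cheap retriever narrows the corpus to a few candidates so that the more expensive reader only has to examine those.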
Forte is a flexible and powerful ML workflow builder. This is part of the CASL project: http://casl-project.ai/
Project mention: [P] How Forte Transforms the Building of ML Solutions with PyTorch into Assembly Lines | reddit.com/r/MachineLearning | 2022-04-30
GitHub: https://github.com/asyml/forte
Documentation: https://asyml-forte.readthedocs.io/en/latest
Technical Report: https://aclanthology.org/2020.emnlp-demos.26/
A Python tool to automate some dorking stuff to find information disclosures.
Project mention: Fast-Google-Dorks-Scan | reddit.com/r/OSINT | 2022-05-08
PatZilla is a modular patent information research platform and data integration toolkit with a modern user interface and access to multiple data sources.
Web Service for E-Discovery Analytics
Financial Domain Question Answering with pre-trained BERT Language Model
Track any IP address with IP-Tracker. IP-Tracker is developed for Linux and Termux. You can retrieve information about any IP address using IP-Tracker.
Search Git commits in natural language
RaKUn 2.0 - A fast keyword detection algorithm
Project mention: Very fast graph-based keyword extraction | reddit.com/r/LanguageTechnology | 2022-10-30
Code and resources for the paper "BERT-QE: Contextualized Query Expansion for Document Re-ranking".
Query and visualize knowledge graphs
Project mention: Python client for knowledge graph exploration | news.ycombinator.com | 2022-08-11
Security audit script that cuts information gathering from an average of 5 minutes to 20 seconds and writes everything to a text file.
Project mention: A Cyber Security Audit Script I made | reddit.com/r/Python | 2022-11-27
A Python library that generates website reports
Project mention: I just made a python package that returns website stats ( rating , user experience, marketing ...) | reddit.com/r/webscraping | 2022-09-02
Python information-retrieval related posts
Neural Network vs AI for predicting trends based on market info
1 project | reddit.com/r/ArtificialInteligence | 1 Jan 2023
Deep learning is a growing trend in healthcare artificial intelligence, but what are the use cases for the various types of deep learning?
1 project | reddit.com/r/deeplearning | 29 Dec 2022
Making a Dialogue Summarizer
1 project | reddit.com/r/deeplearning | 28 Dec 2022
Q&A Model Custom Training for an NLP newbie
2 projects | reddit.com/r/LanguageTechnology | 27 Dec 2022
Sarcasm Detection model [R].
2 projects | reddit.com/r/MachineLearning | 20 Dec 2022
Marqo-YOLO: How to bring highlighting to image search (Article in comments).
1 project | reddit.com/r/Python | 18 Dec 2022
Image search with localization and open-vocabulary reranking using Marqo, yolox, CLIP and OWL-ViT
1 project | dev.to | 15 Dec 2022
What are some of the best open-source information-retrieval projects in Python? This list will help you: