Top 23 Python search-engine Projects
- Searx
Project mention: Little Tricky: Fetching Youtube URLs of List of Song Titles WITHOUT Youtube API? | reddit.com/r/webscraping | 2023-01-29
Scrape the search page? Check Searx
- Mailpile
Project mention: My slow progression towards and away from NextCloud | reddit.com/r/selfhosted | 2022-11-12
Have a look at mailpile if you are after a web interface; or, the ever-dependable Thunderbird if you are fine with a desktop application.
- PaddleNLP
👑 Easy-to-use and powerful NLP library with 🤗 Awesome model zoo, supporting a wide range of NLP tasks from research to industrial applications, including 🗂Text Classification, 🔍 Neural Search, ❓ Question Answering, ℹ️ Information Extraction, 📄 Document Intelligence, 💌 Sentiment Analysis and 🖼 Diffusion AIGC system etc.
- whoogle-search
Project mention: Tell us about the most underrated & useful FOSS Apps you are using!! | reddit.com/r/fossdroid | 2023-01-27
- haystack
Haystack is an open source NLP framework that leverages pre-trained Transformer models. It enables developers to quickly implement production-ready semantic search, question answering, summarization and document ranking for a wide range of NLP applications.
Project mention: New free tool that uses fine-tuned BERT model to surface answers from research papers | reddit.com/r/LanguageTechnology | 2022-10-28
Some cool tools like HayStack that would be useful in putting some of these together.
- search-plugins
Project mention: RARBG website not showing magnet symbol, thus enabling to download from RARBG | reddit.com/r/torrents | 2023-01-14
So you're saying this doesn't exist?
- Search Engine Parser
Lightweight package to query popular search engines and scrape for result titles, links and descriptions
- HyperTag
Intuitive Knowledge Management WebApp & CLI for Humans using Deep Learning & Tags
- khoj
Natural Language Search Engine for your Org-Mode and Markdown notes, Beancount transactions and Photos
Project mention: AI model for retrieving files from Org-Roam directory? | reddit.com/r/emacs | 2023-01-25
You might want to have a look at Khoj (https://github.com/debanjum/khoj) and the post about it in this subreddit.
- PatZilla
PatZilla is a modular patent information research platform and data integration toolkit with a modern user interface and access to multiple data sources.
- openverse-api
The Openverse API allows programmatic access to search for CC-licensed and public domain digital media.
- Financial Question Answering System
- domhttpx
domhttpx is a Google search engine dorker with an HTTP toolkit, built with Python, that makes it easier to find many URLs/IPs at once, quickly.
- swirl-search
SWIRL queries any number of data sources (search engines, databases, NoSQL engines, cloud/SaaS services with APIs, etc.) and uses Large Language Models to re-rank the unified results without extracting and indexing anything. Includes connectors to Apache Solr, Elastic, PostgreSQL and generic web/JSON.
Project mention: I wrote a federated search engine called SWIRL SEARCH http://swirl.today/ | reddit.com/r/Python | 2022-11-05
BTW here is an overview of Federated Search, how it differs from traditional indexing approaches - and why it can solve multi-silo search problems in a fraction of the time *without* moving data... https://github.com/sidprobstein/swirl-search/wiki
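To make the federated idea in the swirl-search entry concrete, here is a minimal sketch in plain Python. It is not SWIRL's API or implementation: `Result`, `make_source`, `relevance` and `federated_search` are hypothetical names invented for illustration, the "sources" are in-memory lists rather than real connectors, and a simple term-overlap score stands in for the Large Language Model re-ranking the description mentions.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Result:
    source: str
    title: str
    body: str


Source = Callable[[str], List[Result]]


def make_source(name: str, docs: List[Dict[str, str]]) -> Source:
    """Build a hypothetical in-memory source; a real connector would call a live service."""
    def search(query: str) -> List[Result]:
        terms = query.lower().split()
        return [
            Result(name, d["title"], d["body"])
            for d in docs
            if any(t in (d["title"] + " " + d["body"]).lower() for t in terms)
        ]
    return search


def relevance(query: str, result: Result) -> float:
    """Fraction of query terms present in the result (a stand-in for a model-based score)."""
    terms = set(query.lower().split())
    words = set((result.title + " " + result.body).lower().split())
    return len(terms & words) / len(terms) if terms else 0.0


def federated_search(query: str, sources: List[Source]) -> List[Result]:
    # Ask every source at query time; nothing is copied into a local index.
    with ThreadPoolExecutor() as pool:
        batches = list(pool.map(lambda search: search(query), sources))
    merged = [r for batch in batches for r in batch]
    # Re-rank the unified list with one scoring function applied to all sources.
    return sorted(merged, key=lambda r: relevance(query, r), reverse=True)


if __name__ == "__main__":
    wiki = make_source("wiki", [
        {"title": "Federated search", "body": "query many silos at once"},
        {"title": "Indexing", "body": "build a central index"},
    ])
    tips = make_source("tips", [{"title": "Search tips", "body": "use quotes"}])
    for r in federated_search("federated search", [tips, wiki]):
        print(r.source, "|", r.title)
```

The design point is that ranking happens after the fact on whatever each source returns, so adding a source means adding one more callable rather than building or refreshing an index.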
- searchmysite.net
Project mention: Almost all searches on my independent search engine are now from SEO spam bots | news.ycombinator.com | 2022-05-16
Thanks V. I'm seeing a similar number of problem search requests (although nowhere near as many real search requests:-), so it is probably the same "SEO practitioners" running the same "scraping footprints" against different search engines around the same time.
I was kind-of hoping that somewhere in this discussion there would be an "And the answer to your problem is...", but I suppose it is a very specific problem which only a search engine would encounter. I think the Cloudflare solution you have is probably the best to block the requests as early as possible. The reverse proxy config[0] I've got seems to be mostly holding out for now though.
[0] https://github.com/searchmysite/searchmysite.net/issues/55
- openverse-catalog
Identifies and collects data on CC-licensed content across web crawl data and public APIs.
Like with any other issue, I looked at it as a whole and thought either "This seems doable" or "Pass". This one was in the first category: openverse-catalog. I saw that I just had to add a string to some header and thought maybe this was something I could actually do. Maybe it was, but I won't be able to find out because I could not get the project to run.
Python search-engine related posts
- Little Tricky: Fetching Youtube URLs of List of Song Titles WITHOUT Youtube API?
- RARBG website not showing magnet symbol, thus enabling to download from RARBG
- Manually add a custom theme to Searxng?
- Seeking advice for a little-more-than-beginner python guy
- 21 December 2022 - Daily Chat Thread
- Introduction!
- Is it just me or is it next to impossible to search for anything on Google or duckduckgo that doesn't have a left leaning bias?
Index
What are some of the best open-source search-engine projects in Python? This list will help you:
Rank | Project | Stars |
---|---|---|
1 | Searx | 12,465 |
2 | Mailpile | 8,683 |
3 | PaddleNLP | 7,177 |
4 | whoogle-search | 6,955 |
5 | haystack | 6,515 |
6 | search-plugins | 2,605 |
7 | bertsearch | 843 |
8 | Maryam | 744 |
9 | mwmbl | 652 |
10 | Search Engine Parser | 369 |
11 | Yuno | 355 |
12 | HyperTag | 167 |
13 | khoj | 159 |
14 | houndsploit | 99 |
15 | achoz | 70 |
16 | PatZilla | 70 |
17 | openverse-api | 64 |
18 | jina-financial-qa-search | 61 |
19 | horapy | 59 |
20 | domhttpx | 58 |
21 | swirl-search | 57 |
22 | searchmysite.net | 55 |
23 | openverse-catalog | 44 |