Python Search

Open-source Python projects categorized as Search

Top 23 Python Search Projects

  1. algorithms

    Minimal examples of data structures and algorithms in Python

  3. searxng

    SearXNG is a free internet metasearch engine which aggregates results from various search services and databases. Users are neither tracked nor profiled.

    Project mention: Leta – privacy focused search engine from Mullvad | news.ycombinator.com | 2025-05-28

    What would be the difference from duckduckgo lite?

    https://docs.searxng.org/

  4. txtai

    💡 All-in-one open-source AI framework for semantic search, LLM orchestration and language model workflows

    Project mention: Chunking your data for RAG | dev.to | 2025-02-11
  5. paper-qa

    High accuracy RAG for answering questions from scientific documents with citations

    Project mention: Show HN: Trieve CLI – Terminal-Based LLM Agent Loop with Search Tool for PDFs | news.ycombinator.com | 2025-06-18

    https://github.com/Future-House/paper-qa?tab=readme-ov-file#... :

    > PaperQA2 is engineered to be the best agentic RAG model for working with scientific papers.

    > [ Semantic Scholar, CrossRef, ]

    paperqa-zotero: https://github.com/lejacobroy/paperqa-zotero

    The Oracle of Zotero is a fork of paperqa-zotero fork FAISS and langchain:

  6. R2R

    SoTA production-ready AI retrieval system. Agentic Retrieval-Augmented Generation (RAG) with a RESTful API.

    Project mention: Show HN: Toller – A Python library for robust async calls | news.ycombinator.com | 2025-05-13

    I built this after a painful incident with one of my R2R (https://github.com/SciPhi-AI/R2R) clients where Azure OpenAI went down unexpectedly. While we were technically propagating errors correctly, we lacked clean, accessible error patterns that would allow the client to implement proper mitigation strategies. They were fully dependent on our infrastructure to handle the outage, with no way to gracefully degrade or implement custom fallbacks.

  7. buku

    🔖 Personal mini-web in text

  9. search-plugins

    Search plugins for the search feature

    Project mention: Netflix will show generative AI ads midway through streams in 2026 | news.ycombinator.com | 2025-05-15

    https://github.com/Jackett/Jackett | https://github.com/qbittorrent/search-plugins/wiki/How-to-co...

  10. tribler

    Privacy enhanced BitTorrent client with P2P content discovery

  11. elasticsearch-py

    Official Python client for Elasticsearch

  12. elasticsearch-dsl-py

    High level Python client for Elasticsearch

  13. django-haystack

    Modular search for Django

  14. Windrecorder

    Windrecorder is a memory search app that records everything on your screen at a small storage size, letting you rewind what you have seen, query it through OCR text or image descriptions, and get activity statistics. (by yuka-friends)

  15. image-match

    🎇 Quickly search over billions of images

  16. datasketch

    MinHash, LSH, LSH Forest, Weighted MinHash, HyperLogLog, HyperLogLog++, LSH Ensemble and HNSW

  17. twikit

    Twitter API Scraper | Without an API key | Twitter Internal API | Free | Twitter scraper | Twitter Bot

    Project mention: Show HN: I made a free tool that analyzes SEC filings and posts detailed reports | news.ycombinator.com | 2025-04-14

    Unpopular opinion here... If you tread carefully you'll most likely not succeed. I am not American and I know you guys like to sue each other for putting cats in microwaves and stuff, so maybe this is not a great opinion to have in America at the current moment.

    I would go for it and put a disclaimer, or I would just incorporate in a country where there's no issues with these things.

    all this is hard of course to provide good value, but worthwhile.

    Twitter's cost is insane right now. I had quite a few ideas for Twitter integrations, but they would easily cost thousands per month just to access their API.

    I looked into https://github.com/d60/twikit - might not be suitable but you can definitely play around with it. Just don't use your official account as I got shadow banned using it unfortunately.

  18. openrecall

    OpenRecall is a fully open-source, privacy-first alternative to proprietary solutions like Microsoft's Windows Recall. With OpenRecall, you can easily access your digital history, enhancing your memory and productivity without compromising your privacy.

    Project mention: Memos – An open source Rewinds / Recall | news.ycombinator.com | 2024-11-17

    Another similar project: https://github.com/openrecall/openrecall

  19. JobFunnel

    Scrape job websites into a single spreadsheet with no duplicates.

    Project mention: Show HN: Scraper for job listings directly from company websites | news.ycombinator.com | 2024-12-07

    jobfunnel is FOSS and accepting contributions: https://github.com/PaulMcInnis/JobFunnel

    Currently supports indeed, in the past supported glassdoor and others.

  20. twitter-api-client

    Implementation of X/Twitter v1, v2, and GraphQL APIs (by trevorhobenshield)

  21. RecoverPy

    Interactively find and recover deleted or 👉 overwritten 👈 files from your terminal

  22. paperai

    📄 🤖 Semantic search and workflows for medical/scientific papers

    Project mention: Show HN: Trieve CLI – Terminal-Based LLM Agent Loop with Search Tool for PDFs | news.ycombinator.com | 2025-06-18

    https://github.com/neuml/paperai :

    > paperai is a combination of a txtai embeddings index and a SQLite database with the articles. Each article is parsed into sentences and stored in SQLite along with the article metadata. Embeddings are built over the full corpus.

    paperai has a

  23. Memacs

    What did I do on February 14th 2007? Visualize your (digital) life in Org-mode

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).
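
The paperai entry in the list above quotes its README: each article is parsed into sentences and stored in SQLite along with the article metadata. A minimal sketch of that storage layout follows, using only Python's standard library. The table names, column names, the naive sentence splitter, and the sample text are all assumptions made for illustration, not paperai's actual schema or code, and the embeddings index step is left out because it depends on txtai.

```python
import sqlite3


def split_sentences(text):
    # Naive splitter for illustration only: breaks on ". " boundaries.
    parts = [p.strip() for p in text.replace("\n", " ").split(". ")]
    return [p if p.endswith(".") else p + "." for p in parts if p]


def build_store(articles):
    # Hypothetical schema: one row per article, one row per sentence.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT)")
    conn.execute(
        "CREATE TABLE sentences ("
        "article_id INTEGER REFERENCES articles(id), "
        "position INTEGER, text TEXT)"
    )
    for article_id, (title, body) in enumerate(articles, start=1):
        conn.execute("INSERT INTO articles VALUES (?, ?)", (article_id, title))
        conn.executemany(
            "INSERT INTO sentences VALUES (?, ?, ?)",
            [(article_id, i, s) for i, s in enumerate(split_sentences(body))],
        )
    conn.commit()
    return conn


# Placeholder article text, not real data.
conn = build_store([("Example paper", "First sentence. Second sentence.")])
rows = conn.execute(
    "SELECT a.title, s.position, s.text "
    "FROM sentences s JOIN articles a ON a.id = s.article_id "
    "ORDER BY s.position"
).fetchall()
print(rows)
```

Keeping sentences in their own table with an `article_id` reference means a sentence-level search hit can be joined back to its article metadata, which is the relationship the quoted description implies.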

Python Search related posts

  • Show HN: Trieve CLI – Terminal-Based LLM Agent Loop with Search Tool for PDFs

    5 projects | news.ycombinator.com | 18 Jun 2025
  • Leta – privacy focused search engine from Mullvad

    1 project | news.ycombinator.com | 28 May 2025
  • Show HN: Toller – A Python library for robust async calls

    2 projects | news.ycombinator.com | 13 May 2025
  • Ingest (almost) any non-PDF document in a vector database, effortlessly

    4 projects | dev.to | 25 Apr 2025
  • Kagi Is Bringing Orion Web Browser to Linux

    4 projects | news.ycombinator.com | 8 Mar 2025
  • Chunking your data for RAG

    13 projects | dev.to | 11 Feb 2025
  • Analyzing LinkedIn Company Posts with Graphs and Agents

    1 project | dev.to | 12 Jan 2025

Index

What are some of the best open-source Search projects in Python? This list will help you:

# Project Stars
1 algorithms 24,571
2 searxng 19,666
3 txtai 11,078
4 whoogle-search 10,755
5 paper-qa 7,477
6 R2R 6,973
7 buku 6,728
8 search-plugins 5,217
9 tribler 5,005
10 elasticsearch-py 4,303
11 elasticsearch-dsl-py 3,868
12 django-haystack 3,653
13 Windrecorder 3,259
14 image-match 2,965
15 swirl-search 2,790
16 datasketch 2,704
17 twikit 2,718
18 openrecall 2,208
19 JobFunnel 2,026
20 twitter-api-client 1,795
21 RecoverPy 1,498
22 paperai 1,414
23 Memacs 1,059
