Python Search

Open-source Python projects categorized as Search

Top 23 Python Search Projects

  • algorithms

    Minimal examples of data structures and algorithms in Python

  • searxng

    SearXNG is a free internet metasearch engine which aggregates results from various search services and databases. Users are neither tracked nor profiled.

    Project mention: Creating a search engine for fun and because Google sucks | news.ycombinator.com | 2024-09-08

    As someone who cares about their online searches actually being good, fast and private, I cannot recommend SearXNG more. https://github.com/searxng/searxng/

    It's a metasearch engine that can query multiple search providers at once, including google, so you're not missing out on the good results you expect. Pick an instance at https://searx.space/ and tell your friends!

  • txtai

    💡 All-in-one open-source embeddings database for semantic search, LLM orchestration and language model workflows

    Project mention: Embeddings index format for open data access | dev.to | 2024-09-06

    txtai is an all-in-one embeddings database for semantic search, LLM orchestration and language model workflows.

  • buku

    🔖 Personal mini-web in text

    Project mention: Enlightenmentware | news.ycombinator.com | 2024-05-20

    I really like the buku terminal bookmark manager. https://github.com/jarun/buku I like that I can just `man buku` when I don't understand something and I can actually find the answer I'm looking for.

  • tribler

    Privacy enhanced BitTorrent client with P2P content discovery

    Project mention: Tribler: An attack-resilient micro-economy for media | news.ycombinator.com | 2024-04-25

    I noticed that too:

    https://github.com/Tribler/tribler/wiki/%22TrustChain%22-arc...

    But not much else about it. Would be interested to read more. Using torrent seeding as a form of Proof-of-Work that rewards tokens is actually an interesting use case for cryptocurrency, and not as energy-hungry.

  • MindSearch

    🔍 An LLM-based Multi-agent Framework of Web Search Engine (like Perplexity.ai Pro and SearchGPT)

    Project mention: MindSearch: LLM-Based Web Search Engine Similar to Perplexity.ai and SearchGPT | news.ycombinator.com | 2024-08-01
  • elasticsearch-py

    Official Python client for Elasticsearch

  • search-plugins

    Search plugins for the search feature

    Project mention: Whats the best browser for torrenting? | /r/torrents | 2023-12-10

  • elasticsearch-dsl-py

    High level Python client for Elasticsearch

  • django-haystack

    Modular search for Django

  • R2R

    The Elasticsearch for RAG - R2R lets you build, scale, and manage user-facing Retrieval-Augmented Generation applications in production.

    Project mention: Show HN: R2R V2 – A open source RAG engine with prod features | news.ycombinator.com | 2024-06-26

    Hi HN!

    We're building R2R [https://github.com/SciPhi-AI/R2R], an opinionated open source RAG answer engine that is built on top of Postgres+Neo4j. The best way to get started is with the docs - https://r2r-docs.sciphi.ai/introduction.

    Our V2 is a major update from our V1, and we have spent the last 3 months intensely building it after getting a ton of great feedback from our first ShowHN. New features include multimodal data ingestion, hybrid search with reranking, advanced RAG techniques (e.g. HyDE), and automatic knowledge graph construction, alongside the original goal that we first shared: an observable RAG system built on top of a RESTful API.

    The problem: Developers struggle to build truthful, accurate RAG solutions. Popular tools like Langchain are complex and lack crucial production features such as user/document management, observability, and a REST API. We experienced these challenges firsthand while building a large-scale semantic search engine, having users report numerous hallucinations and inaccuracies. This highlighted that search+RAG is a difficult problem. We're convinced that these missing features, and more, are essential to effectively monitor and improve such systems over time.

    We decided to build R2R so you can quickly build an AI system for question answering that you can rely on to improve with use. We wanted to make it as simple as possible to build, monitor, and improve a state-of-the-art RAG engine using any source of data.

    Teams have been using R2R to develop custom AI agents with their own data, with applications ranging from B2B lead generation to research assistants. Best of all, the developer experience is much improved. For example, we have recently seen multiple teams use R2R to deploy a user-facing RAG engine for their application within a day. By day 2 some of these same teams were using their generated logs to tune the system with advanced features like hybrid search and HyDE.

    Here are a few examples of how R2R can outperform classic RAG with semantic search only:

    1. "What were the UK's top exports in 2023?" R2R with hybrid search can identify documents mentioning "UK exports" and "2023", whereas semantic search finds related concepts like trade balance and economic reports.

    2. "List all YC founders that worked at Google and now have an AI startup." Our knowledge graph feature allows R2R to understand relationships between employees and projects, answering a query that would be challenging for simple vector search.

    3. "Compare `The Great Gatsby` to `1984`." Advanced RAG techniques supported by R2R can use agentic behavior to answer separate queries like "key themes of The Great Gatsby" and "key themes of 1984" and then perform aggregation. This gives a better answer than the semantic search results for the original query, which for the example shown above are likely to be quite poor.
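
    The hybrid search idea in example 1 can be illustrated with a small sketch. The post does not say how R2R merges keyword and semantic results, so the code below swaps in reciprocal rank fusion (RRF), a commonly used fusion method, purely as an assumption. The function name, the constant `k=60`, and the document ids are all hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Illustrative only: RRF is an assumed fusion method, not R2R's
    documented behavior. Each document scores 1 / (k + rank) per list.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties are broken by document id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists for the "UK's top exports in 2023" query.
keyword_hits = ["uk_exports_2023", "uk_trade_stats", "eu_exports_2023"]
semantic_hits = ["trade_balance_report", "uk_exports_2023", "economic_outlook"]

fused = reciprocal_rank_fusion([keyword_hits, semantic_hits])
print(fused)
```

    A document that appears in both lists accumulates score from each, so it rises above documents that only one retriever found.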

    The built-in observability and customizability of R2R help you to tune and improve your system long after launching. Our plan is to keep the API ~fixed while we iterate on the internal system logic, making it easier for developers to trust R2R for production from day 1.

    Our roadmap is still tentative, but we are working on the following: (1) improving semantic chunking through third party providers or our own custom LLMs; (2) training a custom model for knowledge graph triples extraction that will allow KG construction to be 10x more efficient (this is in private beta, please reach out if interested!); (3) handling permissions at a more granular level than just a single user; (4) LLM-powered online evaluation of system performance + enhanced analytics and metrics.

    Getting started is easy. R2R is a lightweight repository that you can install locally with `pip install r2r`, or run with Docker. Check out our quickstart guide: https://r2r-docs.sciphi.ai/quickstart. Lastly, if it interests you, we are also working on a cloud solution at https://sciphi.ai.

  • image-match

    🎇 Quickly search over billions of images

  • Windrecorder

    Windrecorder is a memory search app that records everything on your screen at a small file size, letting you rewind what you have seen, query it through OCR text or image descriptions, and get activity statistics. (by yuka-friends)

    Project mention: Windrecorder – Personal Memory Search Engine | news.ycombinator.com | 2024-05-31
  • datasketch

    MinHash, LSH, LSH Forest, Weighted MinHash, HyperLogLog, HyperLogLog++, LSH Ensemble and HNSW
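
    As a rough illustration of the first technique in that list, MinHash estimates the Jaccard similarity of two sets by comparing short signatures instead of the sets themselves. The sketch below uses only the standard library and does not use or mirror datasketch's API. The function names and the choice of salted BLAKE2b as the hash family are assumptions made for the example.

```python
import hashlib


def minhash_signature(tokens, num_perm=128):
    """Return a MinHash signature for a non-empty collection of strings.

    For each of num_perm salted hash functions, keep the smallest
    64-bit hash value seen over all tokens.
    """
    signature = []
    for seed in range(num_perm):
        salt = seed.to_bytes(8, "little")  # a distinct salt per hash function
        signature.append(
            min(
                int.from_bytes(
                    hashlib.blake2b(
                        tok.encode("utf-8"), digest_size=8, salt=salt
                    ).digest(),
                    "big",
                )
                for tok in tokens
            )
        )
    return signature


def estimate_jaccard(sig_a, sig_b):
    """Estimate Jaccard similarity as the fraction of matching positions."""
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)


doc_a = {"python", "search", "engine", "index"}
doc_b = {"python", "search", "vector", "index"}  # true Jaccard is 3/5
estimate = estimate_jaccard(minhash_signature(doc_a), minhash_signature(doc_b))
print(f"estimated Jaccard similarity: {estimate:.2f}")
```

    More hash functions give a tighter estimate at the cost of a longer signature.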

  • JobFunnel

    Scrape job websites into a single spreadsheet with no duplicates.

  • openrecall

    OpenRecall is a fully open-source, privacy-first alternative to proprietary solutions like Microsoft's Windows Recall. With OpenRecall, you can easily access your digital history, enhancing your memory and productivity without compromising your privacy.

    Project mention: OpenRecall | news.ycombinator.com | 2024-06-11
  • twitter-api-client

    Implementation of X/Twitter v1, v2, and GraphQL APIs (by trevorhobenshield)

  • RecoverPy

    Interactively find and recover deleted or 👉 overwritten 👈 files from your terminal

    Project mention: RecoverPy 2.1.3: A Linux tool to recover deleted or overwritten files | /r/opensource | 2023-10-23
  • paperai

    📄 🤖 Semantic search and workflows for medical/scientific papers

    Project mention: WARC-GPT: An Open-Source Tool for Exploring Web Archives Using AI | news.ycombinator.com | 2024-06-09

    ... thought I had seen this but that was paper-ai; https://news.ycombinator.com/item?id=39363115 : gh topic: pdfgpt

    neuml/paperai has a YAML report definition schema: https://github.com/neuml/paperai : > Semantic search and workflows for medical/scientific paper :

      python -m paperai.report report.yml 50 md 

  • twikit

    Twitter API Scraper | Without an API key | Twitter Internal API | Free | Twitter scraper | Twitter Bot

    Project mention: Show HN: Twitter API Wrapper for Python – No API Keys Needed | news.ycombinator.com | 2024-02-03
  • Memacs

    What did I do on February 14th 2007? Visualize your (digital) life in Org-mode

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Index

What are some of the best open-source Search projects in Python? This list will help you:

 #  Project               Stars
 1  algorithms            23,929
 2  searxng               12,181
 3  whoogle-search         9,313
 4  txtai                  8,652
 5  buku                   6,440
 6  tribler                4,772
 7  MindSearch             4,439
 8  elasticsearch-py       4,207
 9  search-plugins         3,982
10  elasticsearch-dsl-py   3,813
11  django-haystack        3,587
12  R2R                    3,179
13  image-match            2,932
14  Windrecorder           2,858
15  datasketch             2,508
16  JobFunnel              1,770
17  openrecall             1,716
18  swirl-search           1,667
19  twitter-api-client     1,515
20  RecoverPy              1,284
21  paperai                1,265
22  twikit                 1,167
23  Memacs                 1,002
