Top 23 Python Search Projects
-
searxng
SearXNG is a free internet metasearch engine which aggregates results from various search services and databases. Users are neither tracked nor profiled.
Project mention: Creating a search engine for fun and because Google sucks | news.ycombinator.com | 2024-09-08
As someone who cares about their online searches actually being good, fast and private, I cannot recommend SearXNG more. https://github.com/searxng/searxng/
It's a metasearch engine that can query multiple search providers at once, including google, so you're not missing out on the good results you expect. Pick an instance at https://searx.space/ and tell your friends!
-
whoogle-search
Project mention: Whoogle: Self-hosted ad-free privacy-respecting metasearch with Google results | news.ycombinator.com | 2024-06-29
-
txtai
txtai is an all-in-one embeddings database for semantic search, LLM orchestration and language model workflows.
-
buku
I really like the buku terminal bookmark manager. https://github.com/jarun/buku I like that I can just `man buku` when I don't understand something and I can actually find the answer I'm looking for.
-
tribler
Project mention: Tribler: An attack-resilient micro-economy for media | news.ycombinator.com | 2024-04-25
I noticed that too:
https://github.com/Tribler/tribler/wiki/%22TrustChain%22-arc...
But not much else about it. Would be interested to read more. Using torrent seeding as a form of Proof-of-Work that rewards tokens is actually an interesting use case for cryptocurrency, and not as energy-hungry.
-
MindSearch
🔍 An LLM-based Multi-agent Framework of Web Search Engine (like Perplexity.ai Pro and SearchGPT)
Project mention: MindSearch: LLM-Based Web Search Engine Similar to Perplexity.ai and SearchGPT | news.ycombinator.com | 2024-08-01
-
R2R
The Elasticsearch for RAG - R2R lets you build, scale, and manage user-facing Retrieval-Augmented Generation applications in production.
Project mention: Show HN: R2R V2 – A open source RAG engine with prod features | news.ycombinator.com | 2024-06-26
Hi HN!
We're building R2R [https://github.com/SciPhi-AI/R2R], an opinionated open source RAG answer engine that is built on top of Postgres+Neo4j. The best way to get started is with the docs - https://r2r-docs.sciphi.ai/introduction.
Our V2 is a major update to V1 that we have spent the last 3 months intensely building after getting a ton of great feedback from our first Show HN. New features include multimodal data ingestion, hybrid search with reranking, advanced RAG techniques (e.g. HyDE), and automatic knowledge graph construction, alongside the original goal we first shared: an observable RAG system built on top of a RESTful API.
The problem: Developers struggle to build truthful, accurate RAG solutions. Popular tools like Langchain are complex and lack crucial production features such as user/document management, observability, and a REST API. We experienced these challenges firsthand while building a large-scale semantic search engine, having users report numerous hallucinations and inaccuracies. This highlighted that search+RAG is a difficult problem. We're convinced that these missing features, and more, are essential to effectively monitor and improve such systems over time.
We decided to build R2R so you can quickly build an AI system for question answering that you can rely on to improve with use. We wanted to make it as simple as possible to build, monitor, and improve a state-of-the-art RAG engine using any source of data.
Teams have been using R2R to develop custom AI agents with their own data, with applications ranging from B2B lead generation to research assistants. Best of all, the developer experience is much improved. For example, we have recently seen multiple teams use R2R to deploy a user-facing RAG engine for their application within a day. By day 2 some of these same teams were using their generated logs to tune the system with advanced features like hybrid search and HyDE.
Here are a few examples of how R2R can outperform classic RAG with semantic search only:
1. "What were the UK's top exports in 2023?" R2R with hybrid search can identify documents mentioning "UK exports" and "2023", whereas semantic search alone finds related concepts like trade balance and economic reports.
2. "List all YC founders that worked at Google and now have an AI startup." Our knowledge graph feature allows R2R to understand relationships between employees and projects, answering a query that would be challenging for simple vector search.
3. "Compare `The Great Gatsby` to `1984`." Advanced RAG techniques supported by R2R can use agentic behavior to answer separate queries like "key themes of The Great Gatsby" and "key themes of 1984" and then perform aggregation. This gives a better answer than the semantic search results for the original query, which for the example shown above are likely to be quite poor.
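To make the first example concrete, here is a minimal sketch of hybrid search: a keyword ranking and a semantic ranking are computed separately and then merged with reciprocal rank fusion (RRF). This is an illustration only, not R2R's implementation. The documents, the two-dimensional "embeddings", the function names, and the choice of RRF as the fusion method are all assumptions made for the example.

```python
import math
from collections import defaultdict

# Hypothetical toy corpus.
DOCS = {
    "d1": "UK exports in 2023 were led by cars and machinery",
    "d2": "The trade balance report covers economic trends",
    "d3": "UK exports grew in 2019",
}

# Hypothetical 2-d "embeddings"; a real system would get these from a model.
EMBEDDINGS = {"d1": (0.9, 0.1), "d2": (0.9, 0.2), "d3": (0.5, 0.6)}
QUERY_EMBEDDING = (1.0, 0.2)


def keyword_rank(query, docs):
    """Rank documents by the number of query terms they contain exactly."""
    terms = set(query.lower().split())
    scores = {
        doc_id: len(terms & set(text.lower().split()))
        for doc_id, text in docs.items()
    }
    return sorted(scores, key=lambda d: (-scores[d], d))


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def semantic_rank(query_vec, embeddings):
    """Rank documents by cosine similarity to the query vector."""
    return sorted(embeddings, key=lambda d: (-cosine(query_vec, embeddings[d]), d))


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several rankings: each document scores sum(1 / (k + rank))."""
    fused = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return sorted(fused, key=lambda d: (-fused[d], d))


keyword = keyword_rank("UK exports 2023", DOCS)
semantic = semantic_rank(QUERY_EMBEDDING, EMBEDDINGS)
hybrid = reciprocal_rank_fusion([keyword, semantic])
print("keyword: ", keyword)
print("semantic:", semantic)
print("hybrid:  ", hybrid)
```

In this toy setup the semantic ranking puts the related "trade balance" document first, while the fused ranking promotes the document that literally mentions "UK exports" and "2023".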
The built-in observability and customizability of R2R help you tune and improve your system long after launching. Our plan is to keep the API ~fixed while we iterate on the internal system logic, making it easier for developers to trust R2R for production from day 1.
Our roadmap is still tentative, but we are working on the following: (1) improving semantic chunking through third-party providers or our own custom LLMs; (2) training a custom model for knowledge graph triple extraction that will allow KG construction to be 10x more efficient (this is in private beta, please reach out if interested!); (3) handling permissions at a more granular level than just a single user; (4) LLM-powered online evaluation of system performance, plus enhanced analytics and metrics.
Getting started is easy. R2R is a lightweight repository that you can install locally with `pip install r2r`, or run with Docker. Check out our quickstart guide: https://r2r-docs.sciphi.ai/quickstart. Lastly, if it interests you, we are also working on a cloud solution at https://sciphi.ai.
-
-
Windrecorder
Windrecorder is a memory search app that records everything on your screen at a small file size, letting you rewind what you have seen, query it through OCR text or image descriptions, and get activity statistics. (by yuka-friends)
-
datasketch
MinHash, LSH, LSH Forest, Weighted MinHash, HyperLogLog, HyperLogLog++, LSH Ensemble and HNSW
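As a rough illustration of the first technique in that list, the sketch below estimates Jaccard similarity between two token sets with MinHash, using only the standard library. It is not datasketch's API or implementation; the helper names, the use of seeded BLAKE2b hashes in place of random permutations, and the sample sentences are assumptions made for the example.

```python
import hashlib


def _seeded_hash(token, seed):
    """Deterministic 64-bit hash of a token under a given seed."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(tokens, num_perm=256):
    """Return the MinHash signature: the minimum hash value per seed."""
    unique = set(tokens)
    if not unique:
        raise ValueError("cannot build a signature for an empty set")
    return [min(_seeded_hash(t, seed) for t in unique) for seed in range(num_perm)]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of signature positions that agree, which estimates Jaccard."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


def exact_jaccard(a, b):
    """Exact Jaccard similarity, for comparison with the estimate."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


doc_a = "minhash estimates the similarity of two sets".split()
doc_b = "minhash estimates the resemblance of two documents".split()

sig_a = minhash_signature(doc_a)
sig_b = minhash_signature(doc_b)
print("exact:   ", exact_jaccard(doc_a, doc_b))
print("estimate:", estimate_jaccard(sig_a, sig_b))
```

The signatures have a fixed length regardless of document size, which is what makes them practical to store and to index with LSH for near-duplicate search.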
-
-
openrecall
OpenRecall is a fully open-source, privacy-first alternative to proprietary solutions like Microsoft's Windows Recall. With OpenRecall, you can easily access your digital history, enhancing your memory and productivity without compromising your privacy.
-
swirl-search
Swirl is an open-source search platform that uses AI to search multiple content and data sources simultaneously and return AI-ranked results. It also uses LLMs to summarize the answers from your searches. It's a one-click, easy-to-use Retrieval Augmented Generation (RAG) solution.
Project mention: GitHub - swirlai/swirl-search: Swirl is an open-source search platform that uses AI to search multiple content and data sources simultaneously, finds the best results using a reader LLM, then prompts Generative AI, enabling you to get answers based on your data. | /r/programming | 2023-12-05
-
RecoverPy
Interactively find and recover deleted or 👉 overwritten 👈 files from your terminal
Project mention: RecoverPy 2.1.3: A Linux tool to recover deleted or overwritten files | /r/opensource | 2023-10-23
-
paperai
Project mention: WARC-GPT: An Open-Source Tool for Exploring Web Archives Using AI | news.ycombinator.com | 2024-06-09
... thought I had seen this, but that was paper-ai: https://news.ycombinator.com/item?id=39363115 (gh topic: pdfgpt)
neuml/paperai has a YAML report definition schema: https://github.com/neuml/paperai
> Semantic search and workflows for medical/scientific papers
`python -m paperai.report report.yml 50 md`
-
twikit
Twitter API Scraper | Without an API key | Twitter Internal API | Free | Twitter scraper | Twitter Bot
Project mention: Show HN: Twitter API Wrapper for Python – No API Keys Needed | news.ycombinator.com | 2024-02-03
-
Python Search discussion
Python Search related posts
-
Just use Postgres
-
Ask HN: Discernment Lattice [Prompting]
-
Ask HN: Company wants us to upskill in AI. What is the best approach?
-
Does Sundar Pichai/Search team know how bad Google search is?
-
txtai: Open-source vector search and RAG for minimalists
-
txtai 7.3 released: Adds new RAG Web Apps and streaming LLM/RAG support
-
txtai: Vector search, Knowledge Graphs, RAG and LLM workflows locally run
-
A note from our sponsor - SaaSHub
www.saashub.com | 8 Sep 2024
Index
What are some of the best open-source Search projects in Python? This list will help you:
# | Project | Stars |
---|---|---|
1 | algorithms | 23,929 |
2 | searxng | 12,181 |
3 | whoogle-search | 9,313 |
4 | txtai | 8,652 |
5 | buku | 6,440 |
6 | tribler | 4,772 |
7 | MindSearch | 4,439 |
8 | elasticsearch-py | 4,207 |
9 | search-plugins | 3,982 |
10 | elasticsearch-dsl-py | 3,813 |
11 | django-haystack | 3,587 |
12 | R2R | 3,179 |
13 | image-match | 2,932 |
14 | Windrecorder | 2,858 |
15 | datasketch | 2,508 |
16 | JobFunnel | 1,770 |
17 | openrecall | 1,716 |
18 | swirl-search | 1,667 |
19 | twitter-api-client | 1,515 |
20 | RecoverPy | 1,284 |
21 | paperai | 1,265 |
22 | twikit | 1,167 |
23 | Memacs | 1,002 |