Metric | Yacy | mwmbl
---|---|---
Mentions | 115 | 27
Stars | 3,260 | 1,370
Growth | 0.9% | 1.0%
Activity | 8.7 | 9.4
Latest commit | about 1 month ago | about 8 hours ago
Language | Java | Python
License | GNU General Public License v3.0 or later | GNU Affero General Public License v3.0
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
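As a rough illustration of the growth figure, the sketch below converts a star count and a month-over-month growth percentage into an estimated number of new stars. It assumes growth is defined as (current - previous) / previous; the page does not spell out the exact formula, and the helper name `stars_gained` is hypothetical.

```python
def stars_gained(current_stars: int, growth_percent: float) -> float:
    """Estimate stars gained over the last month.

    Assumption (not confirmed by the page):
    growth = (current - previous) / previous.
    """
    previous = current_stars / (1 + growth_percent / 100)
    return current_stars - previous


# Star counts and growth percentages taken from the comparison table.
for name, stars, growth in [("Yacy", 3260, 0.9), ("mwmbl", 1370, 1.0)]:
    gained = stars_gained(stars, growth)
    print(f"{name}: about {gained:.0f} new stars in the last month")
```

Under that assumption, the table's numbers correspond to roughly 29 new stars for Yacy and roughly 14 for mwmbl over the last month.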
Yacy
- New ways we're tackling spammy, low-quality content on Search
- YaCy, a distributed Web Search Engine, based on a peer-to-peer network
-
New 60% of OpenAI model's responses contain plagiarism
It turns out you can make it all the way to become president of Harvard [1] while ignoring this rule so it is questionable whether it is as set in stone as you make it out to be, at least in certain disciplines.
In a way these models are a perfect mirror of the current academic climate. They plagiarise without remorse, they follow the latest identity-politics diktat to a point and make up 'facts' when needed to reach a desired narrative. Google Gemini is the latest example [2] of where this leads.
Given that it is plausible that models like these will soon be used in educational settings this is a recipe for disaster. The same goes for the trend to replace search engine results with 'interpreted' results in which LLMs take up the same role as Winston in 1984: Winston works in the Ministry of Truth where he alters historical records to fit the needs of the Party.
It is time for a decentralised distributed search engine which limits itself to pure search, something like YaCy [3]. Something to replace Winstonian search engines like Google and Bing (et al.).
[1] https://www.campusreform.org/article/claudine-gay-is-a-dei-h...
[2] https://news.ycombinator.com/item?id=39465255
[3] https://yacy.net/
-
Is Google Getting Worse? A Longitudinal Investigation of SEO Spam in Search [pdf]
> Now I just need some kind of open source search engine to run on it ...
Here you go: https://yacy.net
-
Welcome to mwmbl, the free, open-source and non-profit search engine
I remember https://yacy.net/, but the big problem with this project was that it was written in Java and had no implementations in other languages. Imagine if torrent clients existed only in Perl.
-
admarus alternatives - ipfs-search and Yacy
3 projects | 9 Aug 2023
Admarus is similar to Yacy but aims to be distributed, whereas Yacy is federated. Both are made for the web.
- Brave Search launches own image and video search
-
Show HN: DiskerNet – Browse the Internet from Your Disk, Now Open Source
You should check out https://yacy.net: a global, P2P web search engine, where each peer can build and share its own index, etc.
-
How do you organize your data?
I also have an instance of Yacy installed, which I use to index the entire system, giving me my own private, internal search engine.
- Ask HN: Best search engine alternatives to Google?
mwmbl
- FLaNK Stack Weekly 19 Feb 2024
-
Text Processing Practice Expt: 27 SERP Types to SQLite (Yy084)
echo "https://mwmbl.org/?q=$x"|client 185.34.32.175
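The command above builds a mwmbl search URL by placing a query in the `q` parameter. A minimal Python sketch of the same URL construction follows; the function name `mwmbl_search_url` is hypothetical, and only the base URL and the `q` parameter come from the quoted command. It percent-encodes the query so that spaces and special characters are safe to send.

```python
from urllib.parse import urlencode

# Base URL as it appears in the quoted shell command.
MWMBL_BASE = "https://mwmbl.org/"


def mwmbl_search_url(query: str) -> str:
    """Return a mwmbl search URL with the query encoded into the q parameter."""
    return MWMBL_BASE + "?" + urlencode({"q": query})


print(mwmbl_search_url("open source search engine"))
```

This only constructs the URL; fetching and parsing the results page is left out because the source does not describe the response format.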
-
How bad are search results? Compare Google, Bing, Marginalia, Kagi, and ChatGPT
Ironically I had to use a search engine to discover what "Mwmbl" was. It's apparently a search engine. But, visiting the front page, I see something akin to a git commit log?! I'm not sure I'd have guessed that this was a SE if Brave Search did not tell me it was (even then I'm not convinced yet).
https://mwmbl.org/
-
Indexing a Billion Pages
I believe this is closer to the thing you were asking about, and the simple answer appears to be "a home grown one in python" https://github.com/mwmbl/mwmbl/blob/e544d45c374c13cdc1a5048d...
- Welcome to mwmbl, the free, open-source and non-profit search engine
- Marginalia.nu API
- Show HN: Ichido, search engine that tags sites using Google and Cloudflare
- Introduction!
- Mwmbl, the free, open-source and non-profit search engine
What are some alternatives?
Searx - Privacy-respecting metasearch engine
Lobsters - Computing-focused community centered around link aggregation and discussion
MeiliSearch - A lightning-fast search API that fits effortlessly into your apps, websites, and workflow
whoogle-search - A self-hosted, ad-free, privacy-respecting metasearch engine
searxng - SearXNG is a free internet metasearch engine which aggregates results from various search services and databases. Users are neither tracked nor profiled.
PiTheremin
Gigablast - Nov 20 2017 -- A distributed open source search engine and spider/crawler written in C/C++ for Linux on Intel/AMD. From gigablast dot com, which has binaries for download. See the README.md file at the very bottom of this page for instructions.
code-search-blocklist - A list of domains hosting scraped code snippets and polluting search results to block.
Seeks - Seeks is a decentralized p2p websearch and collaborative tool.
ublock-origin-shitty-copies-filter - Filter for DuckDuckGo and Google to remove those spam-websites that just blatantly copy and paste content from well known websites.
Typesense - Open Source alternative to Algolia + Pinecone and an Easier-to-Use alternative to ElasticSearch ⚡ 🔍 ✨ Fast, typo tolerant, in-memory fuzzy Search Engine for building delightful search experiences
ublacklist - Blocks specific sites from appearing in Google search results