Welcome to mwmbl, the free, open-source and non-profit search engine

Our great sponsors

InfluxDB - Power Real-Time Data Analytics at Scale

WorkOS - The modern identity platform for B2B SaaS

SaaSHub - Software Alternatives and Reviews


mwmbl

Mentions: 27 | Stars: 1,355 | Activity: 9.4 | Language: Python

An open source, non-profit search engine implemented in python

I wondered if this approach would be feasible for a distributed crawler: https://github.com/mwmbl/mwmbl#crawling
Also, your own posting appears to be missing from the index: https://mwmbl.org/?q=mwmbl+ycombinator
(and, yes, another vote for changing the domain name; you can have a quirky project name, but if I can't remember the cat-walking-on-keyboard domain, I'm not going to use it)

PeARS-orchard

Mentions: 1 | Stars: 35 | Activity: 0.0 | Language: HTML

This is the development version of PeARS, the people's search engine. More compact but less robust than PeARS-lite. If you just want to use PeARS as a local indexer, use PeARS-lite instead.

> We now have a distributed crawler that runs on our volunteers' machines! If you have Firefox you can help out by installing our extension.
This is a very interesting idea that other search engines have tried before. In fact, the Brave search engine is built on Cliqz[6], which implemented this same idea but *without* the user's consent.
Copy pasting from an old comment I made about this "human web" crawler idea:
Both PeARS[1] and Cliqz[2] tried to do that. Both got direct support from Mozilla[3][4], but it looks like neither really took off.
PeARS was meant to be installed voluntarily by users, who would then choose to share their indexes only with people they personally trusted. The idea is very privacy-conscious, but also very hard to scale.
Cliqz, on the other hand, apparently tried to work around that issue by having its add-on bundled by default in some Firefox installations[5], which was very controversial because of its privacy and user-consent implications.
I still think the idea has potential, though, even if it's in a more limited scope.
[1] https://github.com/PeARSearch/PeARS-orchard
[2] https://cliqz.com/en/whycliqz/human-web
[3] https://blog.mozilla.org/press-uk/2016/06/22/mozilla-gives-3...
[4] https://blog.mozilla.org/press-uk/2016/08/23/mozilla-makes-s...
[5] https://www.zdnet.com/article/firefox-tests-cliqz-engine-whi...
[6] https://www.theregister.com/2021/03/03/brave_buys_a_search_e...

searxng

Mentions: 121 | Stars: 8,263 | Activity: 9.8 | Language: Python

SearXNG is a free internet metasearch engine which aggregates results from various search services and databases. Users are neither tracked nor profiled.

For my first search (a current work problem), "rust json diff", it found only 6 links, only one of which was a Rust crate. Unfortunate.
Second search: "black sabbath sleeping village lyrics" gave only 2 results, only one of which was correct.
Also, the repo is missing the SearXNG[1] search engine.
[1] https://github.com/searxng/searxng

Yacy

Mentions: 115 | Stars: 3,253 | Activity: 8.7 | Language: Java

Distributed Peer-to-Peer Web Search Engine and Intranet Search Appliance

I remember https://yacy.net/, but the big problem with this project was that it was written in Java and had no implementations in other languages. Imagine if BitTorrent had only ever been implemented in Perl.

parquet-floor

Mentions: 1 | Stars: 36 | Activity: 3.4 | Language: Java

A lightweight Java library that facilitates reading and writing Apache Parquet files without Hadoop dependencies

ChatGPT has other failure modes. When a question doesn't have an answer written down somewhere, it really struggles. A case is something like "how do I write a parquet file in Java without using Hadoop".
This is not at all trivial, but it is quite possible[1]. Yet ChatGPT will, 100% of the time, either hallucinate APIs, disregard the instruction not to use Hadoop, or give plausible-looking but incorrect answers.
The catch is that you can't do it simply by finding the correct dependencies and API calls; you need to extract and override filesystem classes from the Hadoop project to cut those ties.
[1] https://github.com/strategicblue/parquet-floor

Mumble

Mentions: 121 | Stars: 5,986 | Activity: 9.5 | Language: C++

Mumble is an open-source, low-latency, high-quality voice chat software.

crawler-extension

Mentions: 5 | Stars: 19 | Activity: 6.1 | Language: JavaScript

A browser extension that can be installed by volunteers to participate in mwmbl distributed crawling.

Thank you for sharing this; it's very interesting. I will give it a try. I don't think it can replace my current engine (DuckDuckGo/Searx), but it might complement it by offering a smaller, more curated set of data.
I'm particularly enjoying reading the crawler extension's source code: https://github.com/mwmbl/crawler-extension


NOTE: The number of mentions on this list counts mentions in common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.
