Top 3 HTML Distributed Projects
-
scaling-to-distributed-crawling
Repository with the final code for the "Mastering Web Scraping in Python: Scaling to Distributed Crawling" blog post.
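The repository's code isn't reproduced on this page. The following is only a rough, hypothetical sketch of the core idea behind distributed crawling: several workers pull URLs from one shared frontier and deduplicate discovered links, so each page is processed once.

It makes two simplifying assumptions. Threads stand in for separate worker machines, and a hard-coded link graph (`FAKE_WEB`) replaces real HTTP fetching. A multi-machine setup would swap these for a networked queue and a shared seen-set.

```python
"""Minimal sketch of a crawl frontier shared by several workers.

Assumption: threads stand in for distributed workers and FAKE_WEB stands in
for real HTTP fetching; a real deployment would use a networked queue.
"""
import queue
import threading

# Hypothetical link graph: page -> outgoing links (stands in for fetch + parse).
FAKE_WEB = {
    "/": ["/a", "/b"],
    "/a": ["/b", "/c"],
    "/b": ["/"],
    "/c": [],
}


def crawl(start: str, num_workers: int = 3) -> set:
    frontier: "queue.Queue[str]" = queue.Queue()
    seen = {start}
    seen_lock = threading.Lock()
    frontier.put(start)

    def worker() -> None:
        while True:
            url = frontier.get()
            try:
                for link in FAKE_WEB.get(url, []):
                    # Deduplicate under a lock so each URL is enqueued only once.
                    with seen_lock:
                        if link in seen:
                            continue
                        seen.add(link)
                    frontier.put(link)
            finally:
                # New links are queued before this call, so join() cannot return early.
                frontier.task_done()

    for _ in range(num_workers):
        threading.Thread(target=worker, daemon=True).start()
    frontier.join()  # blocks until every enqueued URL has been processed
    return seen
```

`crawl("/")` processes `/`, `/a`, `/b` and `/c` once each. The lock around the seen-set check matters: without it, two workers could both treat a link as new and enqueue it twice.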
-
PeARS-orchard
This is the development version of PeARS, the people's search engine. More compact but less robust than PeARS-lite. If you just want to use PeARS as a local indexer, use PeARS-lite instead.
Project mention: Welcome to mwmbl, the free, open-source and non-profit search engine | news.ycombinator.com | 2023-09-18
> We now have a distributed crawler that runs on our volunteers' machines! If you have Firefox you can help out by installing our extension.
This is a very interesting idea that other search engines have tried before. In fact, the Brave search engine is built on Cliqz[6], which implemented this same idea but *without* the user's consent.
Copy-pasting from an old comment I made about this "human web" crawler idea:
Both PeARS[1] and Cliqz[2] tried to do that. Both got direct support from Mozilla[3][4], but it looks like neither really took off.
PeARS was meant to be installed voluntarily by users who would then choose to share their indexes only to those they personally trusted, so the idea is very privacy conscious but also very hard to scale.
Cliqz, on the other hand, apparently tried to work around that issue by having their add-on bundled by default in some Firefox installations[5] which was obviously very controversial because of its privacy and user consent implications.
I still think the idea has potential, though, even if it's in a more limited scope.
[1] https://github.com/PeARSearch/PeARS-orchard
[2] https://cliqz.com/en/whycliqz/human-web
[3] https://blog.mozilla.org/press-uk/2016/06/22/mozilla-gives-3...
[4] https://blog.mozilla.org/press-uk/2016/08/23/mozilla-makes-s...
[5] https://www.zdnet.com/article/firefox-tests-cliqz-engine-whi...
[6] https://www.theregister.com/2021/03/03/brave_buys_a_search_e...
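The comment describes PeARS's privacy model only at a high level: users keep an index locally and share it only with people they personally trust.

Below is a minimal, hypothetical sketch of that trust-scoped sharing. It assumes a toy keyword index. The names `Peer`, `trust`, `answer` and `search` are illustrative only and are not PeARS's real API.

```python
"""Hypothetical sketch of trust-scoped index sharing between peers.

Assumption: each peer keeps a tiny keyword -> URLs index and answers
queries only from peers it has explicitly chosen to trust. Class and
method names are illustrative, not PeARS's actual API.
"""
from __future__ import annotations

from collections import defaultdict


class Peer:
    def __init__(self, name: str) -> None:
        self.name = name
        self.index: dict[str, set[str]] = defaultdict(set)
        self.trusted: set[str] = set()

    def add_page(self, url: str, text: str) -> None:
        # Index each lowercased word of the page text locally.
        for word in text.lower().split():
            self.index[word].add(url)

    def trust(self, other: Peer) -> None:
        # After this call, `self` will answer queries coming from `other`.
        self.trusted.add(other.name)

    def answer(self, requester: Peer, word: str) -> set[str]:
        # Share results only with peers this user has chosen to trust.
        if requester.name not in self.trusted:
            return set()
        return set(self.index.get(word.lower(), set()))

    def search(self, word: str, peers: list[Peer]) -> set[str]:
        # Combine the local index with whatever trusted peers are willing to share.
        results = set(self.index.get(word.lower(), set()))
        for peer in peers:
            results |= peer.answer(self, word)
        return results
```

In this sketch, if `bob` indexes a page and calls `bob.trust(alice)`, then `alice.search("bees", [bob])` finds the page but `carol.search("bees", [bob])` does not.

This makes the trade-off in the comment concrete. Each peer's search coverage is bounded by who has chosen to trust it, which is good for privacy but hard to scale.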
What are some of the best open-source distributed projects in HTML? This list will help you:
Rank | Project | Stars
---|---|---
1 | storm-crawler | 855
2 | scaling-to-distributed-crawling | 36
3 | PeARS-orchard | 35