HTML Distributed

Open-source HTML projects categorized as Distributed

Top 3 HTML Distributed Projects

  • storm-crawler

    A scalable, mature and versatile web crawler based on Apache Storm

  • scaling-to-distributed-crawling

    Repository for the Mastering Web Scraping in Python: Scaling to Distributed Crawling blogpost with the final code.

  • PeARS-orchard

    This is the development version of PeARS, the people's search engine. More compact but less robust than PeARS-lite. If you just want to use PeARS as a local indexer, use PeARS-lite instead.

  • Project mention: Welcome to mwmbl, the free, open-source and non-profit search engine | news.ycombinator.com | 2023-09-18

    > We now have a distributed crawler that runs on our volunteers' machines! If you have Firefox you can help out by installing our extension.

    This is a very interesting idea that other search engines have tried before. Actually, the Brave search engine is built over Cliqz[6] that implemented this same idea but *without* the user's consent.

    Copy pasting from an old comment I made about this "human web" crawler idea:

    Both PeARS[1] and Cliqz[2] tried to do that. Both got direct support from Mozilla[3][4] but it looks like neither really kicked off.

    PeARS was meant to be installed voluntarily by users who would then choose to share their indexes only to those they personally trusted, so the idea is very privacy conscious but also very hard to scale.

    Cliqz, on the other hand, apparently tried to work around that issue by having their add-on bundled by default in some Firefox installations[5] which was obviously very controversial because of its privacy and user consent implications.

    I still think the idea has potential, though, even if it's in a more limited scope.

    [1] https://github.com/PeARSearch/PeARS-orchard

    [2] https://cliqz.com/en/whycliqz/human-web

    [3] https://blog.mozilla.org/press-uk/2016/06/22/mozilla-gives-3...

    [4] https://blog.mozilla.org/press-uk/2016/08/23/mozilla-makes-s...

    [5] https://www.zdnet.com/article/firefox-tests-cliqz-engine-whi...

    [6] https://www.theregister.com/2021/03/03/brave_buys_a_search_e...

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Index

What are some of the best open-source Distributed projects in HTML? This list will help you:

#  Project                          Stars
1  storm-crawler                    855
2  scaling-to-distributed-crawling  36
3  PeARS-orchard                    35
