There are a billion things you need to consider when building a decent web crawler, especially when it comes to interacting with pages on the modern web. For example, a lot of content is loaded dynamically by the browser nowadays and won't show up if you make a simple HTTP request. Open your browser devtools and watch the network tab while a page loads, and you'll see it makes loads of auxiliary requests. Some content is also only loaded after you interact with the page (e.g. hover, click). For that reason I'd recommend using something like chromedp and doing browser-based crawling, even if it's much slower.