Web Crawler in Go: Extracting Keyword-Relevant Text with Text Density


TD-Spider

Mentions: 6 | Stars: 13 | Activity: 1.4 | Language: Go

A simple web crawler in Go based on text density.

chromedp

Mentions: 27 | Stars: 10,341 | Activity: 5.5 | Language: Go

A faster, simpler way to drive browsers supporting the Chrome DevTools Protocol.

A decent web crawler has to account for a lot, especially how pages behave on the modern web. Much content is now loaded dynamically by the browser and won't show up in the response to a simple HTTP request. Open your browser's devtools and watch the network tab while a page loads, and you'll see it make many auxiliary requests. Some content is loaded only after you interact with the page (e.g. hover, click). For that reason I'd recommend using something like chromedp and doing browser-based crawling, even though it's much slower.
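
To make the point about dynamically loaded content concrete, here is a minimal sketch that uses only the Go standard library. It does not use chromedp; it only shows what a plain HTTP request misses. The page served by the local test server and the names `dynamicPage`, `rawHTMLContains`, and `newDemoServer` are made up for illustration. The page builds its visible text in JavaScript, so the phrase a browser would display never appears in the raw response body.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// dynamicPage is a hypothetical page whose visible text is assembled by
// JavaScript. A browser would render "Hello from JavaScript" inside #app,
// but that phrase never appears contiguously in the HTML source.
const dynamicPage = `<!DOCTYPE html>
<html><body>
<div id="app"></div>
<script>
  var parts = ["Hello", "from", "JavaScript"];
  document.getElementById("app").textContent = parts.join(" ");
</script>
</body></html>`

// rawHTMLContains fetches url with a plain HTTP GET and reports whether the
// response body contains phrase. No JavaScript is executed.
func rawHTMLContains(url, phrase string) (bool, error) {
	resp, err := http.Get(url)
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return false, fmt.Errorf("unexpected status: %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return false, err
	}
	return strings.Contains(string(body), phrase), nil
}

// newDemoServer starts a local test server that serves dynamicPage.
func newDemoServer() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "text/html; charset=utf-8")
		io.WriteString(w, dynamicPage)
	}))
}

func main() {
	srv := newDemoServer()
	defer srv.Close()

	found, err := rawHTMLContains(srv.URL, "Hello from JavaScript")
	if err != nil {
		fmt.Println("fetch failed:", err)
		return
	}
	// The rendered phrase is missing from the raw response.
	fmt.Println("phrase present in raw HTML:", found) // prints: phrase present in raw HTML: false
}
```

A crawler that only reads the response body would extract nothing useful from a page like this, which is the gap a browser-driven tool such as chromedp is meant to close.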

NOTE: The number of mentions on this list counts mentions in common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.
