Web Crawler in Go: Extracting Keyword-Relevant Text with Text Density

This page summarizes the projects mentioned and recommended in the original post on reddit.com/r/golang

  • TD-Spider

    Simple web crawler with Go, extracting text via text density.

    git link

  • chromedp

    A faster, simpler way to drive browsers supporting the Chrome DevTools Protocol.

    There are a billion things you need to consider when building a decent web crawler, especially when interacting with pages on the modern web. For example, a lot of content is loaded dynamically by the browser these days and won't show up if you make a simple HTTP request. Open your browser's devtools and watch the network tab after loading a page, and you'll see it fires off loads of auxiliary requests. Some content also only appears after you interact with it (e.g. hover, click). For that reason I'd recommend using something like chromedp and doing browser-based crawling, even though it's much slower (see the sketch below).

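    For reference, here is a minimal sketch of that approach with chromedp; the URL and the "body" selector are placeholders, not anything from the original post. It starts a headless Chrome instance, navigates to a page, waits until the body is visible (i.e. after JavaScript has rendered), and pulls out the rendered text rather than the raw HTML.

        package main

        import (
            "context"
            "fmt"
            "log"
            "time"

            "github.com/chromedp/chromedp"
        )

        func main() {
            // Allocate a headless browser context; cancel releases the browser when done.
            ctx, cancel := chromedp.NewContext(context.Background())
            defer cancel()

            // Guard against pages that never finish loading.
            ctx, cancel = context.WithTimeout(ctx, 30*time.Second)
            defer cancel()

            var text string
            err := chromedp.Run(ctx,
                chromedp.Navigate("https://example.com"),        // placeholder URL
                chromedp.WaitVisible("body", chromedp.ByQuery),  // wait for JS-rendered content
                chromedp.Text("body", &text, chromedp.ByQuery),  // extract the rendered text
            )
            if err != nil {
                log.Fatal(err)
            }
            fmt.Println(text)
        }

    From here, the extracted text can be fed into whatever keyword/text-density scoring the crawler uses.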

