A next-gen crawling and spidering framework

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

Our great sponsors
  • InfluxDB - Access the most powerful time series database as a service
  • Sonar - Write Clean Python Code. Always.
  • SaaSHub - Software Alternatives and Reviews
  • katana

    A next-generation crawling and spidering framework.

  • cloudscraper

    A Python module to bypass Cloudflare's anti-bot page.

    If you're scraping with Python, try cloudscraper—among other things(!), it supports JS rendering (basically the bare-minimum check cloudflare does), without needing to run a full browser in the background. It's built on requests, so integration (for me, anyway) was pretty easy.

    https://github.com/venomous/cloudscraper

  • InfluxDB

    Access the most powerful time series database as a service. Ingest, store, & analyze all types of time series data in a fully-managed, purpose-built database. Keep data forever with low-cost storage and superior data compression.

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project

Related posts