How do I scrape LinkedIn w/o getting blocked?

This page summarizes the projects mentioned and recommended in the original post on /r/learnpython

  • undetected-chromedriver

    Custom Selenium Chromedriver | Zero-Config | Passes ALL bot mitigation systems (like Distil / Imperva / DataDome / Cloudflare IUAM)

  • Some custom Chrome drivers try to be undetectable; have a look at https://github.com/ultrafunkamsterdam/undetected-chromedriver, for example. Each website uses different techniques to detect bots. In general, it's a good idea to change your User-Agent on each request and to always wait a couple of seconds after a request. Additionally, if that is an option for you, don't use headless mode. Some detection systems specifically aim at identifying headless browsers by rendering a UI element and checking its size, or something similar.

  • Scrapy

    Scrapy, a fast high-level web crawling & scraping framework for Python.

  • Also, if scraping is all you need to do, there are better tools than Selenium. I suggest looking into Scrapy.
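
The advice in the first reply (rotate the User-Agent, wait a couple of seconds between requests, avoid headless mode) could be sketched roughly as follows. This is only a minimal illustration, not the project's recommended usage: the `USER_AGENTS` pool, the helper names, and the 2 to 5 second delay range are assumptions made for the example, and it presumes undetected-chromedriver and Chrome are installed.

```python
import random
import time

# Hypothetical pool of User-Agent strings (an assumption for this sketch);
# in practice, fill it with current strings from real browsers.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/118.0.0.0 Safari/537.36",
]


def pick_user_agent():
    """Pick a random User-Agent from the pool."""
    return random.choice(USER_AGENTS)


def polite_delay(min_s=2.0, max_s=5.0):
    """Return a randomized wait time in seconds ("a couple of seconds")."""
    return random.uniform(min_s, max_s)


def make_driver(user_agent):
    """Start a visible (non-headless) Chrome with the given User-Agent."""
    # Imported lazily so the helpers above work without the package installed.
    import undetected_chromedriver as uc

    options = uc.ChromeOptions()
    options.add_argument(f"--user-agent={user_agent}")
    # No headless flag is added, so the browser window is shown.
    return uc.Chrome(options=options)


def fetch_pages(urls):
    """Fetch each URL with a fresh User-Agent, pausing after every request."""
    pages = {}
    for url in urls:
        driver = make_driver(pick_user_agent())
        try:
            driver.get(url)
            pages[url] = driver.page_source
        finally:
            driver.quit()
        time.sleep(polite_delay())
    return pages
```

Calling `fetch_pages(["https://example.com"])` would open a browser window, load the page, and return its HTML keyed by URL. Starting a new browser per URL is slow; it is done here only so that the User-Agent can change on each request, and reusing one driver per batch of pages is a reasonable trade-off.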

