undetected-chromedriver
Custom Selenium Chromedriver | Zero-Config | Passes ALL bot mitigation systems (like Distil / Imperva / DataDome / CloudFlare IUAM)
There are some custom Chrome drivers out there that try to be undetectable; have a look at https://github.com/ultrafunkamsterdam/undetected-chromedriver for example. Each website uses different techniques to detect bots. In general, it's a good idea to change your User-Agent on each request and to always wait a couple of seconds after a request. Additionally, if that is an option for you, don't run headless: some detection systems specifically aim at identifying headless browsers, for example by rendering a UI element and checking its size.
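To make the User-Agent and waiting advice concrete, here is a rough sketch using only the Python standard library. The User-Agent strings, the 2 to 5 second range, and the helper names (`build_request`, `polite_delay`, `fetch_all`) are my own placeholders, not something any particular site or library requires, so adjust them to your case.

```python
import random
import time
import urllib.request

# Illustrative User-Agent strings (assumed values); swap in ones that match
# the browsers you actually want to imitate.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.0 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
]


def build_request(url):
    """Return a Request that carries a randomly chosen User-Agent header."""
    headers = {"User-Agent": random.choice(USER_AGENTS)}
    return urllib.request.Request(url, headers=headers)


def polite_delay(min_seconds=2.0, max_seconds=5.0):
    """Pick a random pause length so requests are not evenly spaced."""
    return random.uniform(min_seconds, max_seconds)


def fetch_all(urls, opener=urllib.request.urlopen, sleep=time.sleep):
    """Fetch each URL in turn, pausing between consecutive requests.

    `opener` and `sleep` are injectable so the loop can be tried out
    without touching the network or actually waiting.
    """
    pages = []
    for index, url in enumerate(urls):
        if index > 0:
            sleep(polite_delay())  # wait a couple of seconds between requests
        with opener(build_request(url)) as response:
            pages.append(response.read())
    return pages
```

Calling `fetch_all(["https://example.com/page1", "https://example.com/page2"])` would send each request with a freshly picked User-Agent and pause a few seconds in between. This only covers the two header and timing tips; it does nothing about headless detection.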
Also, if scraping is all you need to do, there are better tools than Selenium. I suggest looking into Scrapy.
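If you go the Scrapy route, the same "slow down and set a User-Agent" advice maps onto a few project settings. The values below are just a starting point I would assume for a small crawl, and the User-Agent string is a made-up placeholder; the setting names themselves are standard Scrapy settings.

```python
# settings.py fragment for a Scrapy project (values are assumptions to tune)

# Placeholder User-Agent sent with every request.
USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0"

# Base delay in seconds between requests to the same site.
DOWNLOAD_DELAY = 2

# Vary the actual wait between 0.5x and 1.5x of DOWNLOAD_DELAY.
RANDOMIZE_DOWNLOAD_DELAY = True

# Let Scrapy adapt the delay to how quickly the server responds.
AUTOTHROTTLE_ENABLED = True

# Keep parallel requests to a single domain low.
CONCURRENT_REQUESTS_PER_DOMAIN = 1
```

Note that `USER_AGENT` here is one fixed value; rotating it per request would need a downloader middleware, which is beyond this fragment.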