Scrape Google Ad Results with Python

Our great sponsors

InfluxDB - Power Real-Time Data Analytics at Scale

WorkOS - The modern identity platform for B2B SaaS

SaaSHub - Software Alternatives and Reviews

Our great sponsors

requests

87 51,359 8.4 Python

A simple, yet elegant, HTTP library.

import requests, lxml, urllib.parse from bs4 import BeautifulSoup # Adding User-agent (default user-agent from requests library is 'python-requests') # https://github.com/psf/requests/blob/589c4547338b592b1fb77c65663d8aa6fbb7e38b/requests/utils.py#L808-L814 headers = { "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3538.102 Safari/537.36 Edge/18.19582" } # Search query params = {'q': 'сoffee buy'} # Getting HTML response html = requests.get(f'https://www.google.com/search?q=', headers=headers, params=params).text # Getting HTML code from BeautifulSoup soup = BeautifulSoup(html, 'lxml') # Looking for container that has all necessary data findAll() or find_all() for container in soup.findAll('div', class_='RnJeZd top pla-unit-title'): # Scraping title title = container.text # Creating beginning of the link to join afterwards startOfLink = 'https://www.googleadservices.com/pagead' # Scraping end of the link to join afterwards endOfLink = container.find('a')['href'] # Combining (joining) relative and absolute URL's (adding begining and end link) ad_link = urllib.parse.urljoin(startOfLink, endOfLink) # Printing each title and link on a new line print(f'{title}\n{ad_link}\n') # Output ''' Jot Ultra Coffee Triple | Ultra Concentrated https://www.googleadservices.com/aclk?sa=l&ai=DChcSEwiP0dmfvcbwAhX48OMHHYyRBuoYABABGgJ5bQ&sig=AOD64_0x-PlrWek-JFlDTSo7E9Z7YhUOjg&ctype=5&q=&ved=2ahUKEwjhr9GfvcbwAhXHQs0KHQCbCAUQww96BAgCED4&adurl= MUD\WTR | A Healthier Coffee Alternative, 30 servings https://www.googleadservices.com/aclk?sa=l&ai=DChcSEwiP0dmfvcbwAhX48OMHHYyRBuoYABAJGgJ5bQ&sig=AOD64_3gltZJ6kPrxic5o8yUO5cuJrHXnw&ctype=5&q=&ved=2ahUKEwjhr9GfvcbwAhXHQs0KHQCbCAUQww96BAgCEEg&adurl= Jot Ultra Coffee Double | 2 bottles = 28 cups https://www.googleadservices.com/aclk?sa=l&ai=DChcSEwiP0dmfvcbwAhX48OMHHYyRBuoYABAHGgJ5bQ&sig=AOD64_3hD0JWZSLr8NUgoTW5K0HMzdFvng&ctype=5&q=&ved=2ahUKEwjhr9GfvcbwAhXHQs0KHQCbCAUQww96BAgCEE4&adurl= '''

pyppeteer

17 3,418 5.6 Python

Headless chrome/chromium automation library (unofficial port of puppeteer)

using headless browser or browser automation frameworks, such as * selenium or pyppeteer.

InfluxDB

www.influxdata.com sponsored

Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project