Pyppeteer Tutorial: The Ultimate Guide to Using Puppeteer with Python

This page summarizes the projects mentioned and recommended in the original post on dev.to

Our great sponsors
  • InfluxDB - Power Real-Time Data Analytics at Scale
  • WorkOS - The modern identity platform for B2B SaaS
  • SaaSHub - Software Alternatives and Reviews
  • pyppeteer

    Headless chrome/chromium automation library (unofficial port of puppeteer)

  • The latest version of Pyppeteer, i.e., 1.0.2, can also be installed by executing pip3 install -U git+https://github.com/pyppeteer/pyppeteer@dev on the terminal.

  • puppeteer-sample

    Running test automation using Puppeteer and LambdaTest. Run Puppeteer tests in massive parallel in cloud at LambdaTest.

  • import asyncio import pytest from pyppeteer.errors import PageError from urllib.parse import quote import json import os import sys from os import environ from pyppeteer import connect, launch exec_platform = os.getenv('EXEC_PLATFORM') # Can take values - headless and non-headless chromium_version = os.getenv('CHROMIUM_VERSION') # Pytest fixture for browser setup @pytest.fixture(scope='function') async def browser(): if exec_platform == 'local': if chromium_version == '121': custom_chrome_path = "mac-chrome/Chromium_121.app/Contents/MacOS/Chromium" elif chromium_version == '113': custom_chrome_path = "mac-chrome/Chromium_113.app/Contents/MacOS/Chromium" else: custom_chrome_path = "mac-chrome/Chromium.app/Contents/MacOS/Chromium" browser = await launch(headless = False, executablePath = custom_chrome_path, args=['--start-maximized']) yield browser await asyncio.sleep(1) await browser.close() # Pytest fixture for page setup @pytest.fixture(scope='function') async def page(browser): page = await browser.newPage() yield page await page.close() # Ported code from https://github.com/LambdaTest/puppeteer-sample/blob/main/puppeteer-parallel.js @pytest.mark.asyncio async def test_exe_path(page): await page.goto('https://www.duckduckgo.com') await page.setViewport({'width': 1920, 'height': 1080}) element = await page.querySelector('[name="q"]') await element.click() await element.type('LambdaTest') await asyncio.gather( page.keyboard.press('Enter'), page.waitForNavigation() ) page_title = await page.title() try: assert page_title == 'LambdaTest at DuckDuckGo', 'Expected page title is incorrect!' await page.evaluate('_ => {}', f'lambdatest_action: {json.dumps({ "action": "setTestStatus", "arguments": { "status": "passed", "remark": "Title matched" } })}') except PageError as e: await page.evaluate('_ => {}', f'lambdatest_action: {json.dumps({ "action": "setTestStatus", "arguments": { "status": "failed", "remark": str(e) } })}')

  • InfluxDB

    Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.

    InfluxDB logo
  • puppeteer

    Node.js API for Chrome

  • # Define variables PYTHON := python3 POETRY := poetry PYTEST := pytest PIP := pip3 PROJECT_NAME := web automation with Pyppeteer .PHONY: install install: $(POETRY) install @echo "Dependency installation complete" $(PIP) install -r requirements.txt @echo "Set env vars LT_USERNAME & LT_ACCESS_KEY" # Procure Username and AccessKey from https://accounts.lambdatest.com/security export LT_USERNAME=himansh export LT_ACCESS_KEY=Ia1MiqNfci .PHONY: install poetry-install: poetry install .PHONY: test test: export NODE_ENV = test .PHONY: test pyunit-pyppeteer: - echo $(EXEC_PLATFORM) - $(PYTHON) tests/pyunit-pyppeteer/test_pyunit_pyppeteer.py .PHONY: test pytest-pyppeteer: - echo $(EXEC_PLATFORM) - $(PYTEST) --verbose --capture=no -s -n 2 tests/pytest-pyppeteer/test_pytest_pyppeteer_1.py \ tests/pytest-pyppeteer/test_pytest_pyppeteer_2.py .PHONY: test pyunit-pyppeteer-browser-session: - echo $(EXEC_PLATFORM) - $(PYTHON) tests/starting-browser-session/pyunit/test_pyppeteer_browser_session.py .PHONY: test pytest-pyppeteer-browser-session: - echo $(EXEC_PLATFORM) - $(PYTEST) --verbose --capture=no -s \ tests/starting-browser-session/pytest/test_pyppeteer_browser_session.py .PHONY: test asyncio-run-pyppeteer-browser-session: - echo $(EXEC_PLATFORM) - $(PYTHON) tests/starting-browser-session/asyncio_run/test_pyppeteer_browser_session.py .PHONY: test asyncio-run-complete-pyppeteer-browser-session: - echo $(EXEC_PLATFORM) - $(PYTHON) tests/starting-browser-session/\ asyncio_run_until_complete/test_pyppeteer_browser_session.py .PHONY: test pyppeteer-button-click: - echo $(EXEC_PLATFORM) - $(PYTEST) --verbose --capture=no -s tests/button-click/test_page_class_click.py .PHONY: test pyppeteer-activate-tab: - echo $(EXEC_PLATFORM) - $(PYTEST) --verbose --capture=no -s tests/active-tab/test_page_class_bringtofront.py ###### Testing Custom Environment - https://miyakogi.github.io/pyppeteer/reference.html#environment-variables # Available versions: 113, 121, and default .PHONY: test pyppeteer-custom-chromium-version: - echo $(EXEC_PLATFORM) - echo 'Browser Version:' $(CHROMIUM_VERSION) - $(PYTEST) --verbose --capture=no -s tests/custom-configuration/test_launcher_exe_path.py ###### Testing Headless - https://miyakogi.github.io/pyppeteer/reference.html#launcher # Available values: headless and non-headless .PHONY: test pyppeteer-custom-browser-mode: - echo $(EXEC_PLATFORM) - echo $(BROWSER_MODE) - $(PYTEST) --verbose --capture=no -s tests/custom-configuration/test_launcher_headless.py .PHONY: test pyppeteer-generate-pdf: - echo $(EXEC_PLATFORM) - $(PYTEST) --verbose --capture=no -s tests/generate-pdf/test_page_class_pdf.py .PHONY: test pyppeteer-generate-screenshot: - echo $(EXEC_PLATFORM) - $(PYTEST) --verbose --capture=no -s tests/generate-screenshots/test_page_class_screenshot.py .PHONY: test pyppeteer-cookies: - echo $(EXEC_PLATFORM) - $(PYTEST) --verbose --capture=no -s tests/handling-cookies/test_page_class_cookies.py .PHONY: test pyppeteer-dialog-box: - echo $(EXEC_PLATFORM) - $(PYTEST) --verbose --capture=no -s tests/handling-dialog-box/test_handling_dialog_box.py .PHONY: test pyppeteer-iframe: - echo $(EXEC_PLATFORM) - $(PYTEST) --verbose --capture=no -s tests/handling-iframe/test_page_class_iframe.py # Like Puppeteer, Navigation operations mentioned below only work in Headless mode # goBack: https://miyakogi.github.io/pyppeteer/reference.html#pyppeteer.page.Page.goBack # goForward: https://miyakogi.github.io/pyppeteer/reference.html#pyppeteer.page.Page.goForward # Bug Link # https://github.com/puppeteer/puppeteer/issues/7739 # https://stackoverflow.com/questions/65540674/how-to-error-check-pyppeteer-page-goback .PHONY: test pyppeteer-navigate-ops: - echo $(EXEC_PLATFORM) - $(PYTEST) --verbose --capture=no -s tests/navigate-operations/test_page_class_navigation_ops.py .PHONY: test pyppeteer-request-response: - echo $(EXEC_PLATFORM) - $(PYTEST) --verbose --capture=no -s tests/request-response/test_page_class_req_resp.py .PHONY: test pyppeteer-viewport: - echo $(EXEC_PLATFORM) - echo $(BROWSER_MODE) - $(PYTEST) --verbose --capture=no -s tests/setting-useragent-viewports/\ test_page_class_useragent_viewport.py::test_mod_viewport .PHONY: test pyppeteer-non-headless-useragent: - echo $(EXEC_PLATFORM) - echo $(BROWSER_MODE) - $(PYTEST) --verbose --capture=no -s tests/setting-useragent-viewports/\ test_page_class_useragent_viewport.py::test_get_nonheadless_user_agent .PHONY: test pyppeteer-headless-useragent: - echo $(EXEC_PLATFORM) - echo $(BROWSER_MODE) - $(PYTEST) --verbose --capture=no -s tests/setting-useragent-viewports/\ test_page_class_useragent_viewport.py::test_get_headless_user_agent .PHONY: test pyppeteer-dynamic-content: - echo $(EXEC_PLATFORM) - echo $(BROWSER_MODE) - $(PYTEST) --verbose --capture=no -s -n 4 tests/handling-dynamic-content/\ test_page_class_lazy_loaded_content.py .PHONY: test pyppeteer-web-scraping: - echo $(EXEC_PLATFORM) - $(PYTEST) --verbose --capture=no -s tests/web-scraping-content/\ test_scraping_with_pyppeteer.py .PHONY: clean clean: # This helped: https://gist.github.com/hbsdev/a17deea814bc10197285 find . | grep -E "(__pycache__|\.pyc$$)" | xargs rm -rf rm -rf .pytest_cache/ @echo "Clean Succeeded" .PHONY: distclean distclean: clean rm -rf venv .PHONY: help help: @echo "" @echo "install : Install project dependencies" @echo "clean : Clean up temp files" @echo "pyunit-pyppeteer : Running Pyppeteer tests with Pyunit framework" @echo "pytest-pyppeteer : Running Pyppeteer tests with Pytest framework" @echo "pyunit-pyppeteer-browser-session : Browser session using Pyppeteer and Pyunit" @echo "pytest-pyppeteer-browser-session : Browser session using Pyppeteer and Pytest" @echo "asyncio-run-pyppeteer-browser-session : Browser session using Pyppeteer (Approach 1)" @echo "asyncio-run-complete-pyppeteer-browser-session : Browser session using Pyppeteer (Approach 2)" @echo "pyppeteer-button-click : Button click demo using Pyppeteer" @echo "pyppeteer-activate-tab : Switching browser tabs using Pyppeteer" @echo "pyppeteer-custom-chromium-version : Custom Chromium version with Pyppeteer" @echo "pyppeteer-custom-browser-mode : Headless and non-headless test execution with Pyppeteer" @echo "pyppeteer-generate-pdf : Generating pdf using Pyppeteer" @echo "pyppeteer-generate-screenshot : Generating page & element screenshots with Pyppeteer" @echo "pyppeteer-cookies : Customizing cookies with Pyppeteer" @echo "pyppeteer-dialog-box : Handling Dialog boxes with Pyppeteer" @echo "pyppeteer-iframe : Handling iFrames with Pyppeteer" @echo "pyppeteer-navigate-ops : Back & Forward browser operations with Pyppeteer" @echo "pyppeteer-request-response : Request and Response demonstration using Pyppeteer" @echo "pyppeteer-viewport : Customizing viewports using Pyppeteer" @echo "pyppeteer-non-headless-useragent : Customizing user-agent (with browser in headed mode) using Pyppeteer" @echo "pyppeteer-headless-useragent : Customizing user-agent (with browser in headless mode) using Pyppeteer" @echo "pyppeteer-dynamic-content : Handling dynamic web content using Pyppeteer" @echo "pyppeteer-web-scraping : Dynamic web scraping using Pyppeteer"

  • web-scraping-with-python

    Demonstration of Web Scraping using Selenium Python (Pytest & Pyunit) and Beautiful Soup

  • import asyncio import pytest from pyppeteer.errors import PageError from urllib.parse import quote import json import os import sys from os import environ from pyppeteer import connect, launch exec_platform = os.getenv('EXEC_PLATFORM') # Get username and access key of the LambdaTest Platform username = environ.get('LT_USERNAME', None) access_key = environ.get('LT_ACCESS_KEY', None) test1_url = 'https://ecommerce-playground.lambdatest.io/' test2_url = 'https://scrapingclub.com/exercise/list_infinite_scroll/' # Usecase - 1 # loc_ecomm_1 = ".order-1.col-lg-6 div:nth-of-type(1) > div:nth-of-type(1) > div:nth-of-type(1) > div:nth-of-type(1) > div:nth-of-type(1) div:nth-of-type(1) > img:nth-of-type(1)" loc_ecomm_1 = "[aria-label='1 / 2'] div:nth-of-type(1) > [alt='Nikon D300']" target_url_1 = "https://ecommerce-playground.lambdatest.io/index.php?route=product/product&product_id=63" # Usecase - 2 (Click on e-commerce sliding banner) loc_ecomm_2 = "[alt='Canon DSLR camera']" target_url_2 = "https://ecommerce-playground.lambdatest.io/index.php?route=product/product&product_id=30" # Usecase - 3 Automating interactions on https://scrapingclub.com/exercise/list_infinite_scroll/ loc_infinite_src_prod1 = ".grid .p-4 [href='/exercise/list_basic_detail/93926-C/']" target_url_3 = "https://scrapingclub.com/exercise/list_basic_detail/93926-C/" # Usecase - 4 Automating interactions on https://scrapingclub.com/exercise/list_infinite_scroll/ # when the images are lazy loaded loc_infinite_src_prod2 = "div:nth-of-type(31) > .p-4 [href='/exercise/list_basic_detail/94967-A/']" target_url_4 = "https://scrapingclub.com/exercise/list_basic_detail/94967-A/" # Set timeout in ms timeOut = 60000 async def scroll_to_element(page, selector): # Scroll until the element is detected await page.evaluateHandle( '''async (selector) => { const element = document.querySelector(selector); if (element) { element.scrollIntoView(); } }''', selector ) return selector async def scroll_carousel(page, scr_count): for scr in range(1, scr_count): elem_next_button = "#mz-carousel-213240 > ul li:nth-child(" + str(scr) + ")" await asyncio.sleep(1) elem_next_button = await page.querySelector(elem_next_button) await elem_next_button.click() # Replica of https://github.com/hjsblogger/web-scraping-with-python/blob/ # main/tests/beautiful-soup/test_infinite_scraping.py#L67C5-L80C18 async def scroll_end_of_page(page): start_height = await page.evaluate('document.documentElement.scrollHeight') while True: # Scroll to the bottom of the page await page.evaluate(f'window.scrollTo(0, {start_height})') # Wait for the content to load await asyncio.sleep(1) # Get the new scroll height scroll_height = await page.evaluate('document.documentElement.scrollHeight') if scroll_height == start_height: # If heights are the same, we reached the end of the page break # Add an additional wait await asyncio.sleep(2) start_height = scroll_height # Additional wait after scrolling await asyncio.sleep(2) @pytest.mark.asyncio @pytest.mark.order(1) async def test_lazy_load_ecomm_1(page): # The time out can be set using the setDefaultNavigationTimeout # It is primarily used for overriding the default page timeout of 30 seconds page.setDefaultNavigationTimeout(timeOut) await page.goto(test1_url, {'waitUntil': 'load', 'timeout': timeOut}) # Set the viewport - Apple MacBook Air 13-inch # Reference - https://codekbyte.com/devices-viewport-sizes/ # await page.setViewport({'width': 1440, 'height': 770}) await asyncio.sleep(2) if exec_platform == 'local': # Scroll until the element is detected elem_button = await scroll_to_element(page, loc_ecomm_1) # await page.click(elem_button) # Wait until the page is loaded # https://miyakogi.github.io/pyppeteer/reference.html#pyppeteer.page.Page.waitForNavigation navigationPromise = asyncio.ensure_future(page.waitForNavigation()) await page.click(elem_button) await navigationPromise elif exec_platform == 'cloud': elem_button = await page.waitForSelector(loc_ecomm_1, {'visible': True}) await asyncio.gather( elem_button.click(), page.waitForNavigation({'waitUntil': 'networkidle2', 'timeout': 30000}), ) # Assert if required, since the test is a simple one; we leave as is :D current_url = page.url print('Current URL is: ' + current_url) try: assert current_url == target_url_1 print("Test Success: Product checkout successful") except PageError as e: print("Test Failure: Could not checkout Product") print("Error Code" + str(e)) @pytest.mark.asyncio @pytest.mark.order(2) async def test_lazy_load_ecomm_2(page): carousel_len = 4 # The time out can be set using the setDefaultNavigationTimeout # It is primarily used for overriding the default page timeout of 30 seconds page.setDefaultNavigationTimeout(timeOut) await page.goto(test1_url, {'waitUntil': 'load', 'timeout': timeOut}) # Set the viewport - Apple MacBook Air 13-inch # Reference - https://codekbyte.com/devices-viewport-sizes/ # await page.setViewport({'width': 1440, 'height': 770}) await asyncio.sleep(2) # Approach 1: Directly click on the third button on the carousel # elem_carousel_banner = await page.querySelector("#mz-carousel-213240 > ul li:nth-child(3)") # await asyncio.sleep(1) # await elem_carousel_banner.click() # Approach 2 (Only for demo): Serially click on every button on carousel await scroll_carousel(page, carousel_len) await asyncio.sleep(1) # elem_prod_1 = await page.querySelector(loc_ecomm_2) elem_prod_1 = await page.waitForSelector(loc_ecomm_2, {'visible': True}) await asyncio.gather( elem_prod_1.click(), page.waitForNavigation({'waitUntil': 'networkidle2', 'timeout': 60000}), ) # Assert if required, since the test is a simple one; we leave as is :D current_url = page.url print('Current URL is: ' + current_url) try: assert current_url == target_url_2 print("Test Success: Product checkout successful") except PageError as e: print("Test Failure: Could not checkout Product") print("Error Code" + str(e)) @pytest.mark.asyncio @pytest.mark.order(3) async def test_lazy_load_infinite_scroll_1(page): # The time out can be set using the setDefaultNavigationTimeout # It is primarily used for overriding the default page timeout of 30 seconds page.setDefaultNavigationTimeout(timeOut) await page.goto(test2_url, {'waitUntil': 'load', 'timeout': timeOut}) # Set the viewport - Apple MacBook Air 13-inch # Reference - https://codekbyte.com/devices-viewport-sizes/ # await page.setViewport({'width': 1440, 'height': 770}) await asyncio.sleep(1) elem_prod1 = await page.querySelector(loc_infinite_src_prod1) await asyncio.gather( elem_prod1.click(), page.waitForNavigation({'waitUntil': 'networkidle2', 'timeout': 60000}), ) # await asyncio.sleep(1) # await elem_carousel_banner.click() # elem_button = scroll_to_element(page, loc_infinite_src_prod1) # print(elem_button) # await asyncio.sleep(2) # await elem_button.click() # Assert if required, since the test is a simple one; we leave as is :D current_url = page.url print('Current URL is: ' + current_url) try: assert current_url == target_url_3 print("Test Success: Product checkout successful") except PageError as e: print("Test Failure: Could not checkout Product") print("Error Code" + str(e)) @pytest.mark.asyncio @pytest.mark.order(4) async def test_lazy_load_infinite_scroll_2(page): # The time out can be set using the setDefaultNavigationTimeout # It is primarily used for overriding the default page timeout of 30 seconds page.setDefaultNavigationTimeout(timeOut) # Tested navigation using LambdaTest YouTube channel # await page.goto("https://www.youtube.com/@LambdaTest/videos", await page.goto(test2_url, {'waitUntil': 'load', 'timeout': timeOut}) # Set the viewport - Apple MacBook Air 13-inch # Reference - https://codekbyte.com/devices-viewport-sizes/ # await page.setViewport({'width': 1440, 'height': 770}) await asyncio.sleep(1) await scroll_end_of_page(page) await page.evaluate('window.scrollTo(0, 0)') await asyncio.sleep(1) # elem_prod = await page.querySelector(loc_infinite_src_prod2) # asyncio.sleep(1) # await asyncio.gather( # elem_prod.click(), # page.waitForNavigation({'waitUntil': 'load', 'timeout': 60000}), # ) elem_button = await scroll_to_element(page, loc_infinite_src_prod2) await asyncio.sleep(1) # await page.click(elem_button) await asyncio.gather( page.click(elem_button), page.waitForNavigation({'waitUntil': 'networkidle2', 'timeout': 60000}), ) # Assert if required, since the test is a simple one; we leave as is :D current_url = page.url print('Current URL is: ' + current_url) try: assert current_url == target_url_4 print("Test Success: Product checkout successful") except PageError as e: print("Test Failure: Could not checkout Product") print("Error Code" + str(e))

  • NodeJs-Cucumber-Selenium

    Run test automation on cloud with Cucumber.js and LambdaTest. This is a sample repo to help you execute Cucumber.js framework based test scripts in parallel with LambdaTest automation testing cloud

  • import asyncio import pytest from pyppeteer.errors import PageError from urllib.parse import quote import json import os import sys from os import environ from pyppeteer import connect, launch # Documentation Link # https://miyakogi.github.io/pyppeteer/reference.html#dialog-class # Interesting reference question # https://stackoverflow.com/questions/75622322/ # why-dialog-popup-alert-doesnt-dismiss-as-they-claim-by-using-pyppeteer-class # Scenario Handling Alerts dialog_test_url = 'https://www.lambdatest.com/selenium-playground/javascript-alert-box-demo' # Locators for different elements # 0 - JS Alert # 1 - Confirm Box # 2 - Prompt Box loc_alert_arr = ['.my-30', '.py-20.ml-10 .btn', 'section:nth-of-type(3) div:nth-of-type(3) .btn'] test_message = 'LambdaTest is a great platform!' # Event event listener for handling JS alert dialog box def handle_js_dialog_box(dialog): asyncio.ensure_future(dialog.accept()) print(f"Dialog message: {dialog.message}") # Event event listener for handling confirm dialog box def handle_confirm_accept_dialog_box(dialog): asyncio.ensure_future(dialog.accept()) print(f"Dialog message: {dialog.message}") # Event event listener for handling confirm dialog box def handle_confirm_dismiss_dialog_box(dialog): asyncio.ensure_future(dialog.dismiss()) print(f"Dialog message: {dialog.message}") # Event event listener for handling prompt dialog box def handle_confirm_prompt_dialog_box(dialog): asyncio.ensure_future(dialog.accept(test_message)) print(f"Dialog message: {dialog.message}") @pytest.mark.asyncio @pytest.mark.order(1) async def test_handling_js_alerts(page): await page.goto(dialog_test_url) # Can be changed with non-blocking sleep await asyncio.sleep(1) page.on('dialog', handle_js_dialog_box) elem_alert = await page.querySelector(loc_alert_arr[0]) # Click on the located element await elem_alert.click() # Wait for the event loop to process events await asyncio.sleep(2) @pytest.mark.asyncio @pytest.mark.order(2) async def test_handling_confirm_accept_alerts(page): await page.goto(dialog_test_url) # Can be changed with non-blocking sleep await asyncio.sleep(1) page.on('dialog', handle_confirm_accept_dialog_box) # Confirm Alert elem_alert = await page.querySelector(loc_alert_arr[1]) # Click on the located element await elem_alert.click() # Wait for the event loop to process events await asyncio.sleep(2) @pytest.mark.asyncio @pytest.mark.order(3) async def test_handling_confirm_dismiss_alerts(page): await page.goto(dialog_test_url) # Can be changed with non-blocking sleep await asyncio.sleep(1) page.on('dialog', handle_confirm_dismiss_dialog_box) await asyncio.sleep(2) # Dismiss Alert elem_alert = await page.querySelector(loc_alert_arr[1]) # Click on the located element await elem_alert.click() # Wait for the event loop to process events await asyncio.sleep(2) @pytest.mark.asyncio @pytest.mark.order(4) async def test_handling_prompt_alerts(page): await page.goto(dialog_test_url) # Can be changed with non-blocking sleep await asyncio.sleep(1) page.on('dialog', handle_confirm_prompt_dialog_box) await asyncio.sleep(1) # Prompt Alert elem_alert = await page.querySelector(loc_alert_arr[2]) # Click on the located element await elem_alert.click() page.on('dialog', handle_confirm_dismiss_dialog_box) await asyncio.sleep(2)

  • WorkOS

    The modern identity platform for B2B SaaS. The APIs are flexible and easy-to-use, supporting authentication, user identity, and complex enterprise features like SSO and SCIM provisioning.

    WorkOS logo
NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project

Related posts