lambdasoup vs pyppeteer
 | lambdasoup | pyppeteer |
---|---|---|
Mentions | 2 | 10 |
Stars | 322 | 2,444 |
Growth | - | 3.8% |
Activity | 4.1 | 6.1 |
Last commit | 5 months ago | 15 days ago |
Language | OCaml | Python |
License | MIT License | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
lambdasoup
-
The State of Web Scraping in 2021
OCaml’s Lambda Soup (https://aantron.github.io/lambdasoup/) is an amazing library, especially for those who prefer functional programming
-
Soupault (soup-oh) is a tool that helps you create and manage static websites
It's used for sorting "widgets" (page processing steps) according to dependency lists that users can specify in the config (like `after = ["foo", "bar"]`).
Other than that, one thing I really like about OCaml is that the compiler team and most library maintainers are considerate towards downstream users with respect to compatibility.
The Lua interpreter [3] that soupault uses for its plugin API is a revived 20-year-old research project. It only needed minor modifications to build with recent compiler versions.
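Ordering processing steps by an `after` list, as described above, is a topological sort. Here is a minimal Python sketch of the idea using the standard library's `graphlib`. It is not soupault's actual implementation (soupault is written in OCaml), and the widget names and the `processing_order` helper are hypothetical, chosen only for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical widgets: each name maps to its `after` list,
# i.e. the widgets that must run before it.
widgets = {
    "foo": [],
    "bar": [],
    "baz": ["foo", "bar"],  # like `after = ["foo", "bar"]` in the config
    "qux": ["baz"],
}


def processing_order(after):
    """Return widget names so that each widget follows everything in its `after` list."""
    sorter = TopologicalSorter(after)
    # static_order() yields dependencies first and raises
    # graphlib.CycleError if the dependency lists contain a cycle.
    return list(sorter.static_order())


order = processing_order(widgets)
print(order)
```

The relative order of independent widgets such as `foo` and `bar` is not guaranteed. Only the dependency constraints are.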
pyppeteer
-
Trying to find a way to automate button clicking on work program without image use
The normal Puppeteer package is JavaScript, but I do see that there's a Python port called pyppeteer. I can't vouch for it specifically, but I imagine it's similarly easy to use as the JS version.
- Scrape JSON from Network Traffic using Selenium
- How to start Web scraping with python?
-
PyAutoGUI with CSS Selector
If you're talking about CSS I reckon you want to click/input things on a website inside your browser. In this case you would use a web driver which can automate a web browser like Chrome or Firefox. Something like Helium, Selenium or pyppeteer.
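Clicking an element by CSS selector with pyppeteer might look roughly like the sketch below. It assumes pyppeteer is installed and can download or find a Chromium build. The function name `click_and_read` is hypothetical, and the URL and selector passed to it are placeholders.

```python
import asyncio


async def click_and_read(url, selector):
    """Open `url` in headless Chromium, click the element matching `selector`,
    and return the page HTML after the click."""
    # Imported here so the module can be loaded even when pyppeteer
    # is not installed; the import only happens when the function runs.
    from pyppeteer import launch

    browser = await launch(headless=True)
    try:
        page = await browser.newPage()
        await page.goto(url)
        # Wait until the element exists before clicking it.
        await page.waitForSelector(selector)
        await page.click(selector)
        return await page.content()
    finally:
        # Always close the browser, even if a step above fails.
        await browser.close()
```

A call would look like `asyncio.run(click_and_read("https://example.com", "button#submit"))`, where both arguments are placeholders for a real page and selector.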
-
The State of Web Scraping in 2021
In my own experience, Puppeteer is much better and more capable than Selenium, but the problem is that Puppeteer requires Node.js. Its Python wrapper, https://github.com/pyppeteer/pyppeteer, was not as good as Selenium if you prefer to use Python.
Pyppeteer is feature complete and worth noting: https://github.com/pyppeteer/pyppeteer
-
Scraping data from interactive web charts python
For complex pages I usually use Puppeteer (from Google). A Python port is here: https://github.com/pyppeteer/pyppeteer but it's not as widely used as the official JavaScript version.
-
Scrape Google Ad Results with Python
Using a headless browser or a browser automation framework, such as Selenium or pyppeteer.
- Web Scraping 101 with Python
-
Beautiful soup
If JavaScript is involved then things get ugly quickly. The best approach usually is a headless browser. There are several options, pyppeteer is the one I currently use.
What are some alternatives?
puppeteer - Headless Chrome Node.js API
Scrapy - Scrapy, a fast high-level web crawling & scraping framework for Python.
playwright-python - Python version of the Playwright testing and automation library.
selenium-python-helium - Selenium-python but lighter: Helium is the best Python library for web automation.
requests - A simple, yet elegant, HTTP library.
Playwright - Playwright is a framework for Web Testing and Automation. It allows testing Chromium, Firefox and WebKit with a single API.
scraper - Nodejs web scraper. Contains a command line, docker container, terraform module and ansible roles for distributed cloud scraping. Supported databases: SQLite, MySQL, PostgreSQL. Supported headless clients: Puppeteer, Playwright, Cheerio, JSdom.
utls - Fork of the Go standard TLS library, providing low-level access to the ClientHello for mimicry purposes.
scraper - A scraper for EmulationStation written in Go using hashing
selectolax - Python binding to Modest and Lexbor engines (fast HTML5 parser with CSS selectors).
colly - Elegant Scraper and Crawler Framework for Golang
otoml - TOML parsing, manipulation, and pretty-printing library for OCaml (fully 1.0.0-compliant)