burplist
scrapy-playwright
Metric | burplist | scrapy-playwright
---|---|---
Mentions | 5 | 11
Stars | 11 | 837
Growth | - | 3.1%
Activity | 6.8 | 7.8
Latest commit | 26 days ago | 3 months ago
Language | Python | Python
License | MIT License | BSD 3-clause "New" or "Revised" License
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
burplist
-
Say Goodbye to Heroku Free Tier: Here Are 4 Alternatives
Heroku Dynos apps: Burplist was migrated to Koyeb. Having tried other notable Heroku alternatives like Fly.io, Northflank, and Railway, I can safely say that migrating from Heroku to Koyeb took the least effort and had the fewest kinks. It just works in my case.
-
Saying Goodbye to Heroku Postgres for Now
A little less than a year ago, I built Burplist, a free search engine for craft beer in Singapore. With the goal of keeping my infrastructure costs as low as possible, I started off with the Heroku Postgres free tier.
-
I Built a Craft Beer Search Engine for Free
The link is to an article about how the search engine was built. The search engine itself can be found here:
https://burplist.me/
You should mention that this is only for Singapore. Nothing wrong with that, it's just that HN is predominantly US, and if someone goes to https://burplist.me/ searching for US craft beers they are going to be a bit unhappy.
-
How I Built A Craft Beer Search Engine For Free
Rather than filling this post with code examples, I give a high-level overview of how and why things are done the way they are. Throughout the sections of this post you will find links to articles explaining how specific steps were achieved, including the source code.
scrapy-playwright
-
Web Scraping Dynamic Websites With Scrapy Playwright
scrapy-playwright is an integration between Scrapy and Playwright. It enables scraping dynamic web pages with Scrapy by processing the web scraping requests using a Playwright instance.
- Turning webpages into pdf
- Scrapy & splash guide
-
Web scraping with Python
To integrate Playwright with Scrapy, we will use the scrapy-playwright library. Then, we will scrape https://www.mintmobile.com/product/google-pixel-7-pro-bundle/ to demonstrate how to extract data from a website using Playwright and Scrapy.
-
Which libraries/frameworks could be used for page interaction?
Scrapy-playwright
-
Implementing a Selenium backend on a web app?
If your website is dynamic, there are many Scrapy integrations that can help you. This is the best one: https://github.com/scrapy-plugins/scrapy-playwright
-
Is Selenium still a good choice?
This concern should be lifted if you are a Scrapy lover. There is a Scrapy integration for Playwright that gives you a lot of freedom and lets you drive the browser from a Scrapy spider.
-
Scraping Dynamic Javascript Websites with Scrapy and Scrapy-playwright
Now we need to modify Scrapy's settings so that it works with Playwright. Instructions can be found on the scrapy-playwright GitHub page. We need to add settings for DOWNLOAD_HANDLERS and TWISTED_REACTOR. The newly added settings are marked with ###. This is what the settings file should look like:
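The settings file itself is not reproduced in this excerpt. Based on the scrapy-playwright README, the two additions look roughly like this: both URL schemes are routed to the Playwright download handler, and Twisted is switched to its asyncio-based reactor, which Playwright's async API needs. Treat it as a sketch of the relevant lines only, not the post's full file.

```python
# settings.py (excerpt): only the lines scrapy-playwright needs are shown.

# Route both HTTP and HTTPS downloads through scrapy-playwright's handler.
# Requests without meta={"playwright": True} are still downloaded by
# Scrapy's regular handler.
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}

# Playwright's async API runs on asyncio, so Twisted must use the asyncio reactor.
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
```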
-
Web Scraping with Python: Everything you need to know
You can use something like scrapy-playwright[0] to run a headless browser framework as your download handler. I think there are versions for some of the other headless systems, if you prefer those.
[0] https://github.com/scrapy-plugins/scrapy-playwright
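As a hedged example of choosing which headless browser backs that download handler: scrapy-playwright reads optional settings such as `PLAYWRIGHT_BROWSER_TYPE` (`"chromium"` by default, or `"firefox"` / `"webkit"`) and `PLAYWRIGHT_LAUNCH_OPTIONS`, a dict forwarded to Playwright's browser launch call. The specific values below are illustrative choices, not recommendations from the comment.

```python
# settings.py (excerpt): optional scrapy-playwright browser settings.
# Assumes DOWNLOAD_HANDLERS and TWISTED_REACTOR are configured as well.

# Which Playwright browser engine to launch: "chromium", "firefox" or "webkit".
PLAYWRIGHT_BROWSER_TYPE = "firefox"

# Keyword arguments forwarded to Playwright's BrowserType.launch().
PLAYWRIGHT_LAUNCH_OPTIONS = {
    "headless": True,   # run without a visible browser window
    "timeout": 20_000,  # milliseconds to wait for the browser to start
}

# Default page navigation timeout used by Playwright, in milliseconds.
PLAYWRIGHT_DEFAULT_NAVIGATION_TIMEOUT = 30_000
```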
-
Make an addition to scrapy_playwright source code
[1]: https://github.com/scrapy-plugins/scrapy-playwright/issues/61
What are some alternatives?
Flask-Migrate - SQLAlchemy database migrations for Flask applications using Alembic
scrapy-splash - Scrapy+Splash for JavaScript integration
open-gov-crawlers - Parse government documents into well formed JSON
scrapy-cloudflare-middleware - A Scrapy middleware to bypass Cloudflare's anti-bot protection
Scrapy - Scrapy, a fast high-level web crawling & scraping framework for Python.
scrapy-rotating-proxies - use multiple proxies with Scrapy
scrapy-fake-useragent - Random User-Agent middleware based on fake-useragent
ArchiveBox - 🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...
aiopath - 📁 Asynchronous pathlib for Python
scrapy-inline-requests - A decorator to write coroutine-like spider callbacks.
yt-videos-list - Create and **automatically** update a list of all videos on a YouTube channel (in txt/csv/md form) via YouTube bot with end-to-end web scraping - no API tokens required. Multi-threaded support for YouTube videos list updates.