scrapy-crawl-once
Scrapy middleware that allows crawling only new content (by TeamHG-Memex)
scrapy-splash
Scrapy+Splash for JavaScript integration (by scrapy-plugins)
Metric | scrapy-crawl-once | scrapy-splash
---|---|---
Mentions | 1 | 3
Stars | 80 | 3,193
Growth | - | 1.3%
Activity | 0.0 | 7.0
Last commit | over 2 years ago | about 1 month ago
Language | Python | Python
License | MIT License | BSD 3-clause "New" or "Revised" License
Mentions - the total number of mentions that we've tracked plus the number of user-suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
scrapy-crawl-once
Posts with mentions or reviews of scrapy-crawl-once.
We have used some of these posts to build our list of alternatives
and similar projects.
- Skip Seen URLS
  You should use https://github.com/TeamHG-Memex/scrapy-crawl-once or even adapt it to your DB.
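
To make the "adapt it to your DB" suggestion concrete, here is a minimal sketch of the underlying idea: keep a persistent set of URL fingerprints and skip anything already recorded. This is not the scrapy-crawl-once implementation; `SeenUrlStore` and `filter_new` are hypothetical names, and SQLite from the standard library stands in for whatever database you actually use.

```python
import hashlib
import sqlite3


class SeenUrlStore:
    """Hypothetical helper: remembers which URLs were already crawled."""

    def __init__(self, path=":memory:"):
        # ":memory:" keeps the sketch self-contained; pass a file path
        # to persist the seen set between crawls.
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS seen (fingerprint TEXT PRIMARY KEY)"
        )

    @staticmethod
    def fingerprint(url):
        # Store a fixed-length hash of the URL instead of the raw string.
        return hashlib.sha1(url.encode("utf-8")).hexdigest()

    def is_seen(self, url):
        row = self.conn.execute(
            "SELECT 1 FROM seen WHERE fingerprint = ?",
            (self.fingerprint(url),),
        ).fetchone()
        return row is not None

    def mark_seen(self, url):
        # INSERT OR IGNORE makes repeated marking harmless.
        self.conn.execute(
            "INSERT OR IGNORE INTO seen (fingerprint) VALUES (?)",
            (self.fingerprint(url),),
        )
        self.conn.commit()


def filter_new(urls, store):
    """Return only URLs not recorded before, recording them as we go.

    Simplification: a URL is marked as seen when it is selected, not
    after a successful download, so failed requests are not retried.
    """
    fresh = []
    for url in urls:
        if not store.is_seen(url):
            store.mark_seen(url)
            fresh.append(url)
    return fresh
```

In a spider, a store like this could be consulted before yielding each request, so that only unseen URLs are scheduled.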
scrapy-splash
Posts with mentions or reviews of scrapy-splash.
We have used some of these posts to build our list of alternatives
and similar projects. The last one was on 2021-05-10.
- Scrape with Splash Requests returns empty
  I have also modified settings.py according to steps 1-5 from https://github.com/scrapy-plugins/scrapy-splash
- Anybody actually hoard something they weren't able to find later on the internet?
  To add to u/nemec, here are the docs for scrapy splash which I’ve used several times (and just requires you to spin up their docker container to get started): https://github.com/scrapy-plugins/scrapy-splash
- How Do I Scrape Data From A Scrollable List That
  Your best bet is scrapy splash as you're dealing with dynamically generated html: https://github.com/scrapy-plugins/scrapy-splash
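
For readers wondering what the settings.py changes mentioned in these posts look like, the fragment below sketches the configuration as described in the scrapy-splash README at the time of writing. Treat the middleware names and priority numbers as recalled from that README and check it for the current values; the `SPLASH_URL` value is an assumption that a Splash container is listening on port 8050 of the local machine.

```python
# settings.py (fragment) for a Scrapy project using scrapy-splash.

# Assumption: Splash runs locally, e.g. in its Docker container on port 8050.
SPLASH_URL = "http://localhost:8050"

# Downloader middlewares that route requests through Splash.
DOWNLOADER_MIDDLEWARES = {
    "scrapy_splash.SplashCookiesMiddleware": 723,
    "scrapy_splash.SplashMiddleware": 725,
    "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware": 810,
}

# Spider middleware that deduplicates repeated Splash arguments.
SPIDER_MIDDLEWARES = {
    "scrapy_splash.SplashDeduplicateArgsMiddleware": 100,
}

# Splash-aware duplicate filter and HTTP cache storage.
DUPEFILTER_CLASS = "scrapy_splash.SplashAwareDupeFilter"
HTTPCACHE_STORAGE = "scrapy_splash.SplashAwareFSCacheStorage"
```

With settings like these in place, a spider would issue requests through the library's `SplashRequest` class instead of plain `scrapy.Request`, so that pages are rendered by Splash before parsing.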
What are some alternatives?
When comparing scrapy-crawl-once and scrapy-splash you can also consider the following projects:
Gerapy - Distributed Crawler Management Framework Based on Scrapy, Scrapyd, Django and Vue.js
scrapy-playwright - 🎭 Playwright integration for Scrapy
scrapy-rotating-proxies - use multiple proxies with Scrapy
scrapy-cloudflare-middleware - A Scrapy middleware to bypass Cloudflare's anti-bot protection
scrapydweb - Web app for Scrapyd cluster management, Scrapy log analysis & visualization, Auto packaging, Timer tasks, Monitor & Alert, and Mobile UI.
scrapy-fake-useragent - Random User-Agent middleware based on fake-useragent