In general, Celery tasks should be idempotent where possible. For scraping, consider whether Scrapy might be more appropriate: it already implements much of the rate limiting and retrying that you would otherwise have to replicate in Celery yourself. As for locking, you are right to consider a database or Redis, since Celery workers may run on entirely different machines. For a paginated scrape with Celery, you could schedule the next task from within the current task, and to adhere to rate limits you could spawn it as a delayed task.
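A rough sketch of that pagination pattern, kept framework-agnostic so it runs without a broker. Everything named here is hypothetical: `scrape_page`, the `fetch`/`store`/`claim`/`schedule` callables, the key format, and the 2-second delay are illustrative assumptions, not part of any library. In a real Celery setup the `schedule` step would typically be `apply_async` with its `countdown` argument, and `claim` could be a Redis `SET` with `nx=True` and an expiry.

```python
from typing import Callable, Optional

RATE_LIMIT_DELAY_S = 2.0  # assumed pause between page requests


def scrape_page(
    page: int,
    fetch: Callable[[int], Optional[list]],
    store: Callable[[int, list], None],
    claim: Callable[[str], bool],
    schedule: Callable[[int, float], None],
) -> str:
    """Process one page, then hand the next page to the scheduler.

    All four callables are injected so the logic can be exercised
    without Celery or Redis being available.
    """
    # Claim the page first: a duplicated or re-delivered task becomes a no-op,
    # which is what makes the task safe to run more than once.
    # (With a lock that expires, a page whose fetch failed can be retried later.)
    if not claim(f"scrape:page:{page}"):
        return "skipped"

    items = fetch(page)
    if not items:
        # An empty or missing page is treated as the end of the pagination.
        return "done"

    store(page, items)

    # Schedule the next page as a delayed task instead of looping here,
    # so the pause between requests is enforced by the task queue.
    schedule(page + 1, RATE_LIMIT_DELAY_S)
    return "scheduled"


# Possible Celery/Redis wiring (assumes an existing Celery `app`, a redis-py
# client `r`, and your own `fetch` and `store` functions):
#
# @app.task
# def scrape_page_task(page):
#     return scrape_page(
#         page, fetch, store,
#         claim=lambda key: bool(r.set(key, "1", nx=True, ex=3600)),
#         schedule=lambda p, delay: scrape_page_task.apply_async(
#             args=[p], countdown=delay),
#     )
```

Because each task handles exactly one page and reschedules itself, a crash loses at most one page of work, and the claim check keeps a retried task from storing the same page twice.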
Related posts
- Scrapy: A Fast and Powerful Scraping and Web Crawling Framework
- Implementing case sensitive headers in Scrapy (not through `_caseMappings`)
- Tips for projects using web scraping
- Best tools to use for web scraping ??
- I'm using python to scrape web page content and extract keywords, how can I make it faster to process?