scrapy-sanoma-kuntavaalit2021
Fetch Sanoma kuntavaalit 2021 data [Moved to: https://github.com/raspi/scrapy-kuntavaalit2021-sanoma] (by raspi)
Spidey
A multi-threaded web crawler library that is generic enough to allow different engines to be swapped in. (by JaCraig)
| | scrapy-sanoma-kuntavaalit2021 | Spidey |
|---|---|---|
| Mentions | 1 | 2 |
| Stars | 0 | 11 |
| Growth | - | - |
| Activity | 4.1 | 9.5 |
| Latest commit | almost 3 years ago | 6 days ago |
| Language | Python | C# |
| License | Apache License 2.0 | Apache License 2.0 |
The number of mentions indicates the total number of mentions that we've tracked plus the number of user-suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
scrapy-sanoma-kuntavaalit2021
Posts with mentions or reviews of scrapy-sanoma-kuntavaalit2021.
We have used some of these posts to build our list of alternatives
and similar projects. The last one was on 2021-05-28.
Spidey
Posts with mentions or reviews of Spidey.
We have used some of these posts to build our list of alternatives
and similar projects. The last one was on 2022-12-20.
- I need data from a website. Is it viable to create an API that scrapes the website and returns the data on an endpoint?
Didn't get a chance to reply earlier, but depending on what you're trying to do, you might want a web crawler. I have a crawler on GitHub that I built for scraping in cases where a site doesn't have an API. If you go this route, I suggest running it as a background task and serving cached data.
- Recursion needed in small crawler
This may be overkill, but I have a library out there for building web crawlers: Spidey. I'm not suggesting you use it, but you could look at it for ideas. It uses a multithreaded producer/consumer approach that avoids recursion and stack-overflow issues: use a queue, pull a URL from the queue for each page, and push new URLs onto it as you find them. I still need to optimize my code a bit, but hopefully it helps. That said, your issue is most likely that you're finding a link to the page you're currently on; a HashSet or List of found URLs would solve that.
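The queue-plus-visited-set approach described above can be sketched in a few lines of Python. This is a minimal illustration, not Spidey's actual code; `get_links` is a hypothetical callback standing in for page fetching and link extraction, which keeps the sketch free of network calls.

```python
from collections import deque

def crawl(start_url, get_links, max_pages=100):
    """Breadth-first crawl using an explicit queue instead of recursion.

    get_links(url) should return the URLs found on that page. A `seen`
    set (the HashSet the post mentions) prevents re-visiting pages that
    link to themselves or to each other.
    """
    seen = {start_url}          # every URL ever discovered
    queue = deque([start_url])  # producer/consumer work queue
    visited = []                # pages actually processed, in order

    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for link in get_links(url):
            if link not in seen:   # skip links back to known pages
                seen.add(link)
                queue.append(link)
    return visited

# Usage with a hypothetical in-memory link graph (no network needed).
# Page "a" links to itself, and "a" and "b" link to each other, yet the
# crawl terminates because seen URLs are never re-queued.
graph = {
    "a": ["b", "a"],
    "b": ["a", "c"],
    "c": [],
}
print(crawl("a", lambda u: graph.get(u, [])))  # → ['a', 'b', 'c']
```

A recursive crawler would overflow the stack on deep or cyclic link structures; the explicit queue bounds memory to the frontier size and makes it easy to hand the queue to multiple worker threads later.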
What are some alternatives?
When comparing scrapy-sanoma-kuntavaalit2021 and Spidey you can also consider the following projects:
Photon - Incredibly fast crawler designed for OSINT.
scrapyrt - HTTP API for Scrapy spiders
scrapy-yle-kuntavaalit2021 - Fetch YLE kuntavaalit 2021 data
OpenWebCrawler - An open-source Python web crawler meant to crawl the entire internet starting from a single URL. The goal of the project is an efficient, powerful, internet-scale crawler that can be used in any application and forked in any way, as long as the forked project is also open source. Enjoy!