InfluxDB is the Time Series Platform where developers build real-time applications for analytics, IoT and cloud-native services. Easy to start, it is available in the cloud or on-premises.
Top 23 Python Crawler Projects
I wanted to invest my time and energy in learning the fastest, most efficient one, one that can scale with me as my projects get more and more complex: Scrapy. After all, I want my projects to shine so brightly on my CV that they blind the recruiter's eyes.
Project mention: Ask HN: What are the best tools for web scraping in 2022? | news.ycombinator.com | 2022-08-10
With some work, you can use Scrapy for distributed projects that scrape thousands (even millions) of domains. We are using https://github.com/rmax/scrapy-redis.
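For a rough idea of what "some work" involves: the scrapy-redis README describes enabling it mainly through a project's `settings.py`, replacing Scrapy's in-memory scheduler and duplicate filter with Redis-backed ones so that several crawler processes share one request queue. The fragment below is a minimal sketch based on that README; the Redis URL is an assumption for a local server, so check the project's documentation for the current set of options.

```python
# settings.py (fragment): route scheduling and de-duplication through Redis
# so that several Scrapy processes can share one crawl frontier.

# Redis-backed scheduler: pending requests live in Redis, not in process memory.
SCHEDULER = "scrapy_redis.scheduler.Scheduler"

# Request fingerprints are kept in a shared Redis set, so a request already
# seen by one worker is not scheduled again by another.
DUPEFILTER_CLASS = "scrapy_redis.dupefilter.RFPDupeFilter"

# Keep the queue and fingerprint set in Redis when a spider closes,
# which allows a crawl to be paused and resumed.
SCHEDULER_PERSIST = True

# Assumed local Redis instance; every worker must point at the same server.
REDIS_URL = "redis://localhost:6379"
```

Spiders can then subclass `scrapy_redis.spiders.RedisSpider` and pull their start URLs from the Redis list named by their `redis_key` attribute, so adding capacity mostly means starting another worker against the same Redis server.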
Project mention: A Smart, Automatic, Fast and Lightweight Web Scraper for Python | reddit.com/r/webdev | 2022-12-02
news-please
news-please - an integrated web crawler and information extractor for news that just works
Look at news-please; you can find it on GitHub. I did something similar and it was very helpful. You can hit me up if you have any questions.
grab-site
The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns
After a lot of searching on a similar topic, I found this tool, which works pretty well: https://github.com/ArchiveTeam/grab-site
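grab-site's description mentions dynamic ignore patterns. The general idea can be sketched with the standard library, assuming for illustration that patterns are regular expressions matched against each URL: a URL is skipped if any pattern matches, and because the pattern list is re-read before every URL, patterns added mid-crawl take effect immediately. This is a hypothetical illustration, not grab-site's implementation; the function names and the `read_patterns` hook are invented.

```python
import re
from typing import Callable, Iterable, Iterator, List


def compile_ignores(patterns: Iterable[str]) -> List[re.Pattern]:
    # Ignore blank lines and "#" comments, as a hand-edited ignore file might contain them.
    return [re.compile(p.strip()) for p in patterns
            if p.strip() and not p.strip().startswith("#")]


def should_fetch(url: str, ignores: List[re.Pattern]) -> bool:
    # Skip the URL as soon as any pattern matches anywhere inside it.
    return not any(rx.search(url) for rx in ignores)


def filter_urls(urls: Iterable[str],
                read_patterns: Callable[[], Iterable[str]]) -> Iterator[str]:
    """Yield the URLs that survive the ignore check.

    `read_patterns` is a hypothetical hook standing in for re-reading an
    ignore file that the operator can edit while the crawl is running;
    calling it before every URL is what makes the patterns "dynamic".
    """
    for url in urls:
        ignores = compile_ignores(read_patterns())
        if should_fetch(url, ignores):
            yield url
```

For example, starting with only `r"\.pdf$"` and appending `r"/calendar"` after the first URL has been yielded means later calendar pages are dropped without restarting the generator. Recompiling on every URL is wasteful for a real crawler; caching the compiled list and reloading it only when the file changes would be the obvious refinement.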
Project mention: New to python and scrapy stuff but need this project to work so that I can do my data research and stuff easily in the future. | reddit.com/r/scrapy | 2022-04-19
trafilatura
Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments
Project mention: Testing fast installation in tear-down environment | reddit.com/r/learnpython | 2022-07-06
I want to test how easy it is to install a package plus special extra dependencies to run a certain script in that package: https://github.com/adbar/trafilatura
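To make "extraction of text, metadata" concrete, here is a deliberately naive sketch using only Python's built-in `html.parser`: it keeps the page title, `<meta name=... content=...>` pairs, and visible text outside `<script>`, `<style>` and `<noscript>`. This is not trafilatura's API or algorithm (trafilatura additionally tries to discard boilerplate such as navigation and footers); the class and function names here are hypothetical.

```python
from html.parser import HTMLParser


class TextAndMetaExtractor(HTMLParser):
    """Collect the <title>, named <meta> tags, and visible text of a page."""

    SKIP = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self.chunks = []
        self._stack = []  # open tags we care about: skipped tags and <title>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and "name" in attrs and "content" in attrs:
            # Normalise meta names so "Author" and "author" map to one key.
            self.meta[attrs["name"].lower()] = attrs["content"]
        elif tag in self.SKIP or tag == "title":
            self._stack.append(tag)

    def handle_endtag(self, tag):
        if self._stack and self._stack[-1] == tag:
            self._stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._stack and self._stack[-1] == "title":
            self.title += text
        elif not any(t in self.SKIP for t in self._stack):
            self.chunks.append(text)


def extract(html: str) -> dict:
    # Feed the whole document, flush buffered data, and return the pieces.
    parser = TextAndMetaExtractor()
    parser.feed(html)
    parser.close()
    return {"title": parser.title, "meta": parser.meta,
            "text": " ".join(parser.chunks)}
```

On a small page with a `<title>`, an author `<meta>` tag, a script, and two paragraphs, `extract` returns the title, an `{"author": ...}` dictionary, and the paragraph text joined by spaces.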
Project mention: Can chat GPT overtake Google if they play their cards right? | reddit.com/r/Futurology | 2022-12-23
Project mention: Dotcommon – common aliases and plugins on most updated GitHub repositories | news.ycombinator.com | 2022-08-13
google-play-scraper
Google Play scraper for Python inspired by <facundoolano/google-play-scraper> (by JoMingyu)
Project mention: Report: Analysis of 2.9 millions apps on Google Play | reddit.com/r/androiddev | 2022-11-08
It's easy. Python library: google-play-scraper.
freshonions-torscraper
Fresh Onions is an open-source Tor spider / hidden-service onion crawler hosted at zlal32teyptf4tvi.onion
Moodle-Downloader-2
A Moodle downloader that quickly downloads course content from Moodle (e.g. lecture PDFs)
Python Crawler related posts
- I need data from a website. It is viable to create an API that scrapes the website and returns the data on an endpoint?
- Can chat GPT overtake Google if they play their cards right?
- A Smart, Automatic, Fast and Lightweight Web Scraper for Python
- A next-gen crawling and spidering framework
- Report: Analysis of 2.9 millions apps on Google Play
- What else could be built using the LBRY protocol?
- Dotcommon – common aliases and plugins on most updated GitHub repositories
A note from our sponsor - InfluxDB
www.influxdata.com | 28 Jan 2023
Index
What are some of the best open-source Crawler projects in Python? This list will help you:
Rank | Project | GitHub stars |
---|---|---|
1 | Scrapy | 45,702 |
2 | pyspider | 15,713 |
3 | newspaper | 12,386 |
4 | Photon | 9,334 |
5 | scrapy-redis | 5,227 |
6 | autoscraper | 4,873 |
7 | toapi | 3,356 |
8 | Grab | 2,257 |
9 | weibo-crawler | 2,159 |
10 | TorBot | 1,805 |
11 | PSpider | 1,732 |
12 | news-please | 1,489 |
13 | OpenWPM | 1,248 |
14 | grab-site | 959 |
15 | mlscraper | 815 |
16 | XSRFProbe | 791 |
17 | scrapyrt | 758 |
18 | trafilatura | 730 |
19 | bookcorpus | 594 |
20 | dotcommon | 591 |
21 | google-play-scraper | 490 |
22 | freshonions-torscraper | 439 |
23 | Moodle-Downloader-2 | 309 |