Python Web Crawling

Open-source Python projects categorized as Web Crawling

Top 22 Python Web Crawling Projects

  • Scrapy

    Scrapy, a fast high-level web crawling & scraping framework for Python.

    Project mention: Is there a program available for bulk image reverse searching? | reddit.com/r/AskProgramming | 2023-02-02

    In the past I used stuff like beautifulsoup for webscraping but I’ve heard good things about https://scrapy.org/

  • pyspider

    A Powerful Spider(Web Crawler) System in Python.

  • requests-html

    Pythonic HTML Parsing for Humans™

    Project mention: 8 Most Popular Python HTML Web Scraping Packages with Benchmarks | dev.to | 2023-02-01

  • portia

    Visual scraping for Scrapy

  • MechanicalSoup

    A Python library for automating interaction with websites.

    Project mention: Alternatives to Selenium? | reddit.com/r/pythontips | 2022-07-21

    Try with Mechanicalsoup https://mechanicalsoup.readthedocs.io/en/stable/

  • RoboBrowser

  • Grab

    Web Scraping Framework

  • gain

    Web crawling framework based on asyncio.

  • PSpider

    A simple, easy-to-use Python crawler framework. QQ discussion group: 597510560

  • feedparser

    Parse feeds in Python

    Project mention: Newb learning GitHub & Python. Projects? | reddit.com/r/github | 2023-01-22

  • cola

    A high-level distributed crawling framework.

  • Sukhoi

    Minimalist and powerful Web Crawler.

  • MSpider

    Spider

  • spidy Web Crawler

    The simple, easy to use command line web crawler.

  • google-search-results-python

    Google Search Results via SERP API pip Python Package

    Project mention: Using Google Jobs Listing Results API from SerpApi | dev.to | 2022-11-15

    google-search-results is a SerpApi API package.

  • Crawley

    Pythonic Crawling / Scraping Framework based on Non Blocking I/O operations.

  • brownant

    Brownant is a web data extracting framework.

  • Demiurge

    PyQuery-based scraping micro-framework.

  • Pomp

    Screen scraping and web crawling framework

  • FastImage

    Python library that finds the size / type of an image given its URI by fetching as little as needed (by bmuller)

  • microwler

    A micro-framework for asynchronous deep crawls and web scraping with Python

  • Mariner

    This is a mirror of the GitLab repository. Open your issues and pull requests there. (by radek-sprta)

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2023-02-02.

Index

What are some of the best open-source Web Crawling projects in Python? This list will help you:

#    Project                        Stars
1    Scrapy                         46,031
2    pyspider                       15,726
3    requests-html                  12,923
4    portia                         8,738
5    MechanicalSoup                 4,306
6    RoboBrowser                    3,647
7    Grab                           2,259
8    gain                           2,015
9    PSpider                        1,735
10   feedparser                     1,542
11   cola                           1,459
12   Sukhoi                         873
13   MSpider                        344
14   spidy Web Crawler              304
15   google-search-results-python   263
16   Crawley                        174
17   brownant                       157
18   Demiurge                       109
19   Pomp                           61
20   FastImage                      28
21   microwler                      11
22   Mariner                        2