Python Web Crawling

Open-source Python projects categorized as Web Crawling

Top 22 Python Web Crawling Projects

  • Scrapy

    Scrapy, a fast high-level web crawling & scraping framework for Python.

  • Project mention: Scrapy Vs. Crawlee | dev.to | 2024-05-15

    Scrapy is an open-source Python-based web scraping framework that extracts data from websites. With Scrapy, you create spiders, which are autonomous scripts that download and process web content. The limitation of Scrapy is that it does not work very well with JavaScript-rendered websites, as it was designed for static HTML pages. We will do a comparison later in the article about this.
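
    The "download and process" loop that a spider performs can be sketched with the standard library alone. This is not Scrapy's API: the names `LinkParser`, `crawl`, and `PAGES`, and the `example.test` URLs, are hypothetical, and the pages are served from an in-memory dict so the sketch runs without network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collect the page title and every href found in <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def crawl(start_url, fetch, max_pages=10):
    """Breadth-first crawl. `fetch` maps a URL to HTML text, or None on failure."""
    seen = {start_url}
    queue = deque([start_url])
    titles = {}
    while queue and len(titles) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # skip pages that could not be downloaded
        parser = LinkParser()
        parser.feed(html)
        parser.close()
        titles[url] = parser.title.strip()
        for href in parser.links:
            absolute = urljoin(url, href)  # resolve relative links
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return titles


# Hypothetical static "site"; a real spider would download these pages.
PAGES = {
    "https://example.test/": '<title>Home</title><a href="/a">A</a><a href="/b">B</a>',
    "https://example.test/a": '<title>Page A</title><a href="/">home</a>',
    "https://example.test/b": '<title>Page B</title><a href="/missing">x</a>',
}

result = crawl("https://example.test/", PAGES.get)
```

    Because the parser only sees the HTML text it is given, links that a page adds with JavaScript after loading would never appear in `parser.links`, which is the static-HTML limitation described above.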

  • requests-html

    Pythonic HTML Parsing for Humans™

  • portia

    Visual scraping for Scrapy

  • MechanicalSoup

    A Python library for automating interaction with websites.

  • Project mention: How to scrape a website with Python (Beginner tutorial) | dev.to | 2024-02-22

    MechanicalSoup is a Python library for web scraping that combines the simplicity of Requests with the convenience of BeautifulSoup. It's particularly useful for interacting with web forms, like login pages.
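
    A minimal sketch of that form workflow with MechanicalSoup's `StatefulBrowser` might look like the following. The login URL and the `username` and `password` field names are assumptions for illustration; a real page will use its own field names. The import sits inside `main()` so the helper can be read and exercised without the third-party package installed.

```python
def log_in(browser, login_url, username, password):
    """Open a page, fill in its first form, and submit it.

    `browser` is expected to behave like mechanicalsoup.StatefulBrowser.
    """
    browser.open(login_url)
    browser.select_form("form")      # CSS selector: the first <form> on the page
    browser["username"] = username   # assumed field name
    browser["password"] = password   # assumed field name
    return browser.submit_selected()


def main():
    import mechanicalsoup  # third-party: pip install MechanicalSoup

    browser = mechanicalsoup.StatefulBrowser()
    # Hypothetical URL and credentials.
    response = log_in(browser, "https://example.test/login", "alice", "secret")
    print(response.status_code)
```

    Calling `main()` performs the actual request. The browser object keeps cookies between calls, so pages behind the login can be opened with the same `browser` afterwards.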

  • RoboBrowser

  • Grab

    Web Scraping Framework

  • gain

    Web crawling framework based on asyncio.

  • feedparser

    Parse feeds in Python

  • Project mention: RSS can be used to distribute all sorts of information | news.ycombinator.com | 2023-11-20

    There is JSON Feed¹ already. One of the spec writers is behind micro.blog, which is the first place I saw it (and also one of the few places I've seen it). I don't think it is a bad idea, and it doesn't take all that long to implement it.

    I have long hoped it would pick up with the JSON-ify everything crowd, just so I'd never see a non-Atom feed again. We perhaps wouldn't need sooo much of the magic that is wrapped up in packages like feedparser² to deal with all the brokenness of RSS in the wild then.

    ¹ https://www.jsonfeed.org/

    ² https://github.com/kurtmckee/feedparser
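
    To illustrate the point about implementation effort, here is a rough sketch of reading a JSON Feed document with only the standard `json` module. The sample feed, the `example.test` URLs, and the function name `read_json_feed` are made up for this example; the field names (`version`, `title`, `items`, `id`, `url`) follow the JSON Feed format. It does none of the malformed-input recovery that feedparser provides for RSS and Atom.

```python
import json

# Hypothetical sample document in JSON Feed format.
SAMPLE_FEED = """
{
  "version": "https://jsonfeed.org/version/1.1",
  "title": "Example Blog",
  "home_page_url": "https://example.test/",
  "items": [
    {"id": "2", "url": "https://example.test/2",
     "title": "Second post", "content_text": "Hello again."},
    {"id": "1", "url": "https://example.test/1",
     "content_text": "An untitled note."}
  ]
}
"""


def read_json_feed(text):
    """Return (feed_title, entries) from JSON Feed text."""
    feed = json.loads(text)
    version = str(feed.get("version", ""))
    if not version.startswith("https://jsonfeed.org/version/"):
        raise ValueError("not a JSON Feed document")
    entries = []
    for item in feed.get("items", []):
        entries.append({
            "id": item["id"],                # required for every item
            "title": item.get("title", ""),  # optional: items may be untitled
            "url": item.get("url", ""),
        })
    return feed.get("title", ""), entries


feed_title, entries = read_json_feed(SAMPLE_FEED)
```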

  • PSpider

    A simple, easy-to-use Python crawler framework. QQ discussion group: 597510560

  • cola

    A high-level distributed crawling framework.

  • botasaurus

    The All in One Framework to build Awesome Scrapers.

  • Project mention: This Week In Python | dev.to | 2024-04-05

    botasaurus – The All in One Framework to build Awesome Scrapers

  • Sukhoi

    Minimalist and powerful Web Crawler.

  • google-search-results-python

    Google Search Results via SERP API pip Python Package

  • MSpider

    Spider

  • spidy Web Crawler

    The simple, easy to use command line web crawler.

  • Crawley

    Pythonic Crawling / Scraping Framework based on Non Blocking I/O operations.

  • brownant

    Brownant is a web data extracting framework.

  • Demiurge

    PyQuery-based scraping micro-framework.

  • Pomp

    Screen scraping and web crawling framework

  • FastImage

    Python library that finds the size / type of an image given its URI by fetching as little as needed (by bmuller)
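
    The "fetching as little as needed" idea can be sketched for one format. A PNG file stores its width and height in the IHDR chunk, which sits within the first 24 bytes, so only a short prefix of the file has to be downloaded. This is a standard-library illustration, not FastImage's own implementation; `png_size` and `fetch_prefix` are hypothetical names.

```python
import struct
import urllib.request

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(header):
    """Return (width, height) from the first 24 bytes of a PNG file."""
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError("not a PNG header")
    # Bytes 16..23 hold width and height as big-endian unsigned 32-bit integers.
    return struct.unpack(">II", header[16:24])


def fetch_prefix(url, count=24):
    """Ask the server for only the first `count` bytes of `url`."""
    request = urllib.request.Request(url, headers={"Range": f"bytes=0-{count - 1}"})
    with urllib.request.urlopen(request) as response:
        # A server may ignore Range, so cap the read as well.
        return response.read(count)


# Hand-built header for a 640x480 image, so the sketch runs offline.
sample_header = (
    PNG_SIGNATURE
    + struct.pack(">I", 13)          # IHDR data length
    + b"IHDR"
    + struct.pack(">II", 640, 480)   # width, height
)
size = png_size(sample_header)
```

    Against a real image, `png_size(fetch_prefix(url))` would combine the two steps. Other formats such as JPEG keep their dimensions at variable offsets and need more parsing than this.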

  • microwler

    A micro-framework for asynchronous deep crawls and web scraping with Python

  • Mariner

    This is a mirror of the GitLab repository. Open your issues and pull requests there. (by radek-sprta)

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Python Web Crawling related posts

  • Claude is now available in Europe

    2 projects | news.ycombinator.com | 14 May 2024
  • How to scrape a website with Python (Beginner tutorial)

    1 project | dev.to | 22 Feb 2024
  • Scrapy: A Fast and Powerful Scraping and Web Crawling Framework

    1 project | news.ycombinator.com | 16 Feb 2024
  • Seven Python Projects to Elevate Your Coding Skills

    3 projects | dev.to | 15 Feb 2024
  • What is SERP? Meaning, Use Cases and Approaches

    3 projects | dev.to | 11 Dec 2023
  • Help! trying to use scraping for my dissertation but I am clueless

    1 project | /r/webscraping | 6 Jul 2023
  • Turning webpages into pdf

    2 projects | /r/learnpython | 6 Jul 2023

Index

What are some of the best open-source Web Crawling projects in Python? This list will help you:

Rank  Project                       Stars
1     Scrapy                        51,399
2     requests-html                 13,619
3     portia                        9,214
4     MechanicalSoup                4,582
5     RoboBrowser                   3,699
6     Grab                          2,366
7     gain                          2,029
8     feedparser                    1,870
9     PSpider                       1,822
10    cola                          1,488
11    botasaurus                    1,062
12    Sukhoi                        880
13    google-search-results-python  537
14    MSpider                       345
15    spidy Web Crawler             329
16    Crawley                       182
17    brownant                      158
18    Demiurge                      110
19    Pomp                          60
20    FastImage                     28
21    microwler                     13
22    Mariner                       2
