Python Web Crawling

Open-source Python projects categorized as Web Crawling

Top 22 Python Web Crawling Projects

  • Scrapy

    Scrapy, a fast high-level web crawling & scraping framework for Python.

    Project mention: Scrapy: A Fast and Powerful Scraping and Web Crawling Framework | news.ycombinator.com | 2024-02-16
  • pyspider

    A Powerful Spider(Web Crawler) System in Python.

  • requests-html

    Pythonic HTML Parsing for Humans™

  • portia

    Visual scraping for Scrapy

  • MechanicalSoup

    A Python library for automating interaction with websites.

    Project mention: How to scrape a website with Python (Beginner tutorial) | dev.to | 2024-02-22

    MechanicalSoup is a Python library for web scraping that combines the simplicity of Requests with the convenience of BeautifulSoup. It's particularly useful for interacting with web forms, like login pages.
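
    A minimal sketch of the form workflow described above, assuming a login page with a single form whose inputs are named "username" and "password". The helper name submit_login, the field names, and the example URL are illustrative assumptions, not taken from the tutorial. The helper accepts any object that behaves like mechanicalsoup.StatefulBrowser, so it can be exercised without network access.

```python
def submit_login(browser, login_url, username, password,
                 form_selector="form",
                 user_field="username", pass_field="password"):
    """Open a login page, fill in its form, submit it, and return the
    title of the resulting page (or None if there is no title).

    `browser` is expected to behave like mechanicalsoup.StatefulBrowser.
    The selector and field names are assumptions about the target page.
    """
    browser.open(login_url)                # fetch the page containing the form
    browser.select_form(form_selector)     # pick the form via a CSS selector
    browser[user_field] = username         # set form inputs by their name attribute
    browser[pass_field] = password
    browser.submit_selected()              # submit; the browser state moves to the result

    page = browser.page                    # BeautifulSoup document, or None for non-HTML
    if page is None or page.title is None:
        return None
    return page.title.get_text(strip=True)


def main():
    # Third-party dependency: pip install MechanicalSoup
    import mechanicalsoup

    browser = mechanicalsoup.StatefulBrowser()
    # Hypothetical URL and credentials, for illustration only.
    print(submit_login(browser, "https://example.com/login", "alice", "secret"))
```

    Calling main() performs a real request, so point it at a site you are allowed to automate. Passing the browser in as a parameter is a design choice that keeps the form logic separate from session setup.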

  • RoboBrowser

  • Grab

    Web Scraping Framework

  • gain

    Web crawling framework based on asyncio.

  • feedparser

    Parse feeds in Python

    Project mention: RSS can be used to distribute all sorts of information | news.ycombinator.com | 2023-11-20

    There is JSON Feed¹ already. One of the spec writers is behind micro.blog, which is the first place I saw it (and also one of the few places I've seen it). I don't think it is a bad idea, and it doesn't take all that long to implement it.

    I have long hoped it would pick up with the JSON-ify everything crowd, just so I'd never see a non-Atom feed again. We perhaps wouldn't need sooo much of the magic that is wrapped up in packages like feedparser² to deal with all the brokenness of RSS in the wild then.

    ¹ https://www.jsonfeed.org/

    ² https://github.com/kurtmckee/feedparser
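
    To illustrate the commenter's point that JSON Feed is quick to implement, here is a rough sketch of a reader built on the standard library's json module. The sample document, the FeedItem class, and the parse_json_feed helper are made up for illustration; only a handful of the fields defined by the JSON Feed spec are read, and this is not how feedparser works internally.

```python
import json
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Made-up sample document using field names from the JSON Feed spec.
SAMPLE_FEED = """
{
  "version": "https://jsonfeed.org/version/1.1",
  "title": "Example Feed",
  "items": [
    {"id": "1", "title": "First post", "url": "https://example.org/1",
     "content_text": "Hello, world."},
    {"id": "2", "content_text": "An untitled note."}
  ]
}
"""


@dataclass
class FeedItem:
    id: str
    title: Optional[str]
    url: Optional[str]
    content_text: Optional[str]


def parse_json_feed(text: str) -> Tuple[str, List[FeedItem]]:
    """Return (feed title, items) for a JSON Feed document."""
    data = json.loads(text)
    # JSON Feed documents identify themselves through the "version" URL.
    version = data.get("version", "")
    if not version.startswith("https://jsonfeed.org/version/"):
        raise ValueError("not a JSON Feed document")
    items = [
        FeedItem(
            id=str(raw["id"]),                  # "id" is required for every item
            title=raw.get("title"),             # optional
            url=raw.get("url"),                 # optional
            content_text=raw.get("content_text"),
        )
        for raw in data.get("items", [])
    ]
    return data.get("title", ""), items


if __name__ == "__main__":
    feed_title, feed_items = parse_json_feed(SAMPLE_FEED)
    print(feed_title, [item.id for item in feed_items])
```

    A production reader would also need to handle content_html, dates, and malformed input, which is where a tolerant library still earns its keep.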

  • PSpider

    A simple, easy-to-use Python crawler framework. QQ discussion group: 597510560

  • cola

    A high-level distributed crawling framework.

  • Sukhoi

    Minimalist and powerful Web Crawler.

  • google-search-results-python

    Google Search Results via SERP API pip Python Package

    Project mention: Make Direct Async Requests to SerpApi with Python | dev.to | 2023-05-24

    In this blog post we'll cover how to make direct requests to serpapi.com/search.json without using SerpApi's google-search-results Python client.
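
    As a hedged sketch of what a direct request could look like, the snippet below builds a search.json URL and fetches several queries concurrently using only the standard library (Python 3.9+ for asyncio.to_thread). The helper names are hypothetical, the engine, q, and api_key parameters are assumed from SerpApi's public query-string interface, and running blocking urllib calls in worker threads is a stand-in technique here, not necessarily the approach taken in the post.

```python
import asyncio
import json
import urllib.parse
import urllib.request

SEARCH_ENDPOINT = "https://serpapi.com/search.json"


def build_search_url(query: str, api_key: str, engine: str = "google") -> str:
    """Build the request URL; parameter names are assumed, see the note above."""
    params = {"engine": engine, "q": query, "api_key": api_key}
    return f"{SEARCH_ENDPOINT}?{urllib.parse.urlencode(params)}"


def _fetch_json(url: str) -> dict:
    # Blocking HTTP GET; decoded as JSON.
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


async def search_many(queries, api_key: str):
    """Fetch results for several queries concurrently.

    Each blocking request runs in a worker thread so the event loop
    stays free; results come back in the same order as `queries`.
    """
    urls = [build_search_url(q, api_key) for q in queries]
    tasks = [asyncio.to_thread(_fetch_json, url) for url in urls]
    return await asyncio.gather(*tasks)


# Usage (performs real requests, needs a valid key):
#   results = asyncio.run(search_many(["coffee", "tea"], api_key="YOUR_KEY"))
```

    Swapping the thread-based fetch for an async HTTP client would avoid the thread pool, at the cost of a third-party dependency.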

  • MSpider

    Spider

  • spidy Web Crawler

    The simple, easy to use command line web crawler.

  • Crawley

    Pythonic Crawling / Scraping Framework based on Non Blocking I/O operations.

  • brownant

    Brownant is a web data extracting framework.

  • Demiurge

    PyQuery-based scraping micro-framework.

  • Pomp

    Screen scraping and web crawling framework

  • FastImage

    Python library that finds the size / type of an image given its URI by fetching as little as needed (by bmuller)

  • microwler

    A micro-framework for asynchronous deep crawls and web scraping with Python

  • Mariner

    This is a mirror of the GitLab repository. Open your issues and pull requests there. (by radek-sprta)

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2024-02-22.

Index

What are some of the best open-source Web Crawling projects in Python? This list will help you:

Rank  Project                       Stars
1     Scrapy                        50,763
2     pyspider                      16,310
3     requests-html                 13,574
4     portia                        9,159
5     MechanicalSoup                4,545
6     RoboBrowser                   3,689
7     Grab                          2,353
8     gain                          2,031
9     feedparser                    1,823
10    PSpider                       1,811
11    cola                          1,485
12    Sukhoi                        878
13    google-search-results-python  514
14    MSpider                       345
15    spidy Web Crawler             322
16    Crawley                       182
17    brownant                      157
18    Demiurge                      110
19    Pomp                          60
20    FastImage                     28
21    microwler                     13
22    Mariner                       2