Python web-crawler

Open-source Python projects categorized as web-crawler

Top 10 Python web-crawler Projects

  1. Scrapegraph-ai

    Python scraper based on AI

    Project mention: ScrapeGraphAI: You Only Scrape Once | news.ycombinator.com | 2025-05-20
  2. omniparse

    Ingest, parse, and optimize any data format ➡️ from documents to multimedia ➡️ for enhanced compatibility with GenAI frameworks

    Project mention: Show HN: I Made an Open Source Platform for Structuring Any Unstructured Data | news.ycombinator.com | 2024-07-02
  3. crawlee-python

    Crawlee: a web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with BeautifulSoup, Playwright, and raw HTTP. Supports both headful and headless modes, with proxy rotation.

    Project mention: How to scrape TikTok using Python | dev.to | 2025-04-30

    Which hashtags are trending now? What is an influencer's engagement rate? What topics are important for a content creator? You can find answers to these and many other questions by analyzing TikTok data. However, for analysis, you need to extract the data in a convenient format. In this blog, we'll explore how to scrape TikTok using Crawlee for Python.

  4. PSpider

    A simple, easy-to-use Python crawler framework. QQ discussion group: 597510560

  5. kochat

    Open-source Korean chatbot framework

  6. spidy Web Crawler

    The simple, easy-to-use command-line web crawler.

  7. Ignareo-ISML-auto-voter

    Ignareo the Carillon, a web crawler/spider template of ultimate high concurrency built for leprechauns. Carillons as the best web spiders; long live the golden years of leprechauns! (ISML = International Saimoe League; the 2022 ISML is the last ISML)

  8. GoodreadsScraper

    Scrape data from Goodreads using Scrapy and Selenium

  9. Python

    This repository stores Python source code and contains more than 40 Python projects spanning many fields, including web crawlers, algorithms, OpenGL, tkinter, and object-oriented programming. (by qfcy)

  10. CobWeb-lnx

    CobWeb is a Python library for web scraping. The library consists of two classes: Spider and Scraper.

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Python web-crawler related posts

  • How to scrape Crunchbase using Python in 2024 (Easy Guide)

    5 projects | dev.to | 15 Jan 2025
  • Multiparadigmatic Web Scraping Tool!

    1 project | /r/computerscience | 14 May 2023

Index

What are some of the best open-source web-crawler projects in Python? This list will help you:

#   Project                   Stars
1   Scrapegraph-ai            20,030
2   omniparse                  6,589
3   crawlee-python             5,749
4   PSpider                    1,839
5   kochat                       455
6   spidy Web Crawler            347
7   Ignareo-ISML-auto-voter      187
8   GoodreadsScraper             138
9   Python                        57
10  CobWeb-lnx                    38
