Python Scraping

Open-source Python projects categorized as Scraping

Top 23 Python Scraping Projects

  • Scrapy

    Scrapy, a fast high-level web crawling & scraping framework for Python.

  • Project mention: Scrapy: A Fast and Powerful Scraping and Web Crawling Framework | news.ycombinator.com | 2024-02-16
  • requests-html

    Pythonic HTML Parsing for Humans™

  • undetected-chromedriver

    Custom Selenium Chromedriver | Zero-Config | Passes ALL bot mitigation systems (like Distil / Imperva / DataDome / Cloudflare IUAM)

  • Project mention: ad_clicker premium - Google/Bing Ads Clicker | /r/IMadeThis | 2023-12-08

    This command-line tool clicks ads for a given query on Google/Bing search using the undetected_chromedriver package. It supports proxies, running multiple simultaneous browsers, ad targeting/exclusion, and running in a loop.

  • autoscraper

    A Smart, Automatic, Fast and Lightweight Web Scraper for Python

  • fake-useragent

    Up-to-date simple useragent faker with real world database

  • Automatic-Udemy-Course-Enroller-GET-PAID-UDEMY-COURSES-for-FREE

    Do you want to LEARN NEW STUFF for FREE? Don't worry, with the power of web-scraping and automation, this script will find the necessary Udemy coupons & enroll you for PAID UDEMY COURSES, ABSOLUTELY FREE!

  • trafilatura

    Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments

  • Project mention: Trafilatura: Python tool to gather text on the Web | news.ycombinator.com | 2023-08-14

    The feature list answers that question pretty well: https://github.com/adbar/trafilatura#features

    Basically: you could implement all of this on top of BeautifulSoup - polite crawling policies, sitemap and feed parsing, URL de-duplication, parallel processing, download queues, heuristics for extracting just the main article content, metadata extraction, language detection... but it would require writing an enormous amount of extra code.

  • snoop

    Snoop: an intelligence-gathering tool based on open data (OSINT world)

  • Project mention: Osint update of the Snoop Project tool search for user by nickname | news.ycombinator.com | 2024-01-02
  • Scrapegraph-ai

    Python scraper based on AI

  • Project mention: This Week In Python | dev.to | 2024-05-10

    Scrapegraph-ai – Python scraper based on AI

  • Grab

    Web Scraping Framework

  • facebook-scraper

    Scrape Facebook public pages without an API key

  • Project mention: scraping instagram without selenium | /r/webscraping | 2023-06-30

    Afaik on Facebook there are no such APIs, only good old HTML parsing, check out this project for example https://github.com/kevinzg/facebook-scraper (most of the parsing code is here https://github.com/kevinzg/facebook-scraper/blob/master/facebook_scraper/extractors.py )

  • shot-scraper

    A command-line utility for taking automated screenshots of websites

  • Project mention: I want to create IMDB for Open source projects | news.ycombinator.com | 2024-04-15

    I had one of these recently! https://github.com/simonw/shot-scraper/pull/133/files

    They're /incredibly/ rare though.

  • cloudproxy

    Hide your scraper's IP behind the cloud. Provision proxy servers across different cloud providers to improve your scraping success.

  • mlscraper

    🤖 Scrape data from HTML websites automatically by just providing examples

  • parsel

    Parsel lets you extract data from XML/HTML documents using XPath or CSS selectors

  • DataEngineeringProject

    Example end-to-end data engineering project.

  • Scweet

    A simple and unlimited Twitter scraper: scrape tweets, likes, retweets, following, followers, user info, images...

  • Project mention: Twitter api reaching rate limit. 5calls per 15 mins just to get user likes. | /r/learnprogramming | 2023-05-22

    Hmm, do you know any good one? I found this one, but it doesn't scrape a single tweet's likes and followers: https://github.com/Altimis/Scweet

  • botasaurus

    The All in One Framework to build Awesome Scrapers.

  • Project mention: This Week In Python | dev.to | 2024-04-05

    botasaurus – The All in One Framework to build Awesome Scrapers

  • loconotion

    📄 Python tool to turn Notion.so pages into lightweight, customizable static websites

  • Edu-Mail-Generator

    Generate Free Edu Mail(s) within minutes

  • gazpacho

    🥫 The simple, fast, and modern web scraping library

  • google-maps-scraper

    👋 HOLA 👋 HOLA 👋 HOLA ! ENJOY OUR GOOGLE MAPS SCRAPER 🚀 TO EFFORTLESSLY EXTRACT DATA SUCH AS NAMES, ADDRESSES, PHONE NUMBERS, REVIEWS, WEBSITES, AND RATINGS FROM GOOGLE MAPS WITH EASE! 🤖

  • Project mention: I create a google maps scraper, let me know your thoughts | /r/webscraping | 2023-07-06

    My scraper runs at 120 listings per 10 minutes, so yours is quite fast. You can see my scraper at https://github.com/omkarcloud/google-maps-scraper. It is quite popular, with 95 stars.

  • lookyloo

    Lookyloo is a web interface that allows users to capture a website page and then display a tree of domains that call each other.
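
The trafilatura mention above lists URL de-duplication among the features one would otherwise have to hand-write on top of BeautifulSoup. As a rough illustration of what that extra code might look like, here is a minimal standard-library sketch. The helper names `normalize_url` and `dedupe_urls` are hypothetical and are not part of trafilatura or any project on this list, and the normalization rules (lowercase scheme and host, empty path treated as `/`, fragment dropped) are simplifying assumptions rather than a complete canonicalization.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Return a simplified canonical form of a URL (hypothetical helper)."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    netloc = parts.netloc.lower()   # assumption: no case-sensitive userinfo in the URL
    path = parts.path or "/"        # treat an empty path as the site root
    # Drop the fragment: it is not sent to the server, so it does not
    # identify a different document.
    return urlunsplit((scheme, netloc, path, parts.query, ""))


def dedupe_urls(urls):
    """Yield each distinct URL once, in first-seen order, in normalized form."""
    seen = set()
    for url in urls:
        key = normalize_url(url)
        if key not in seen:
            seen.add(key)
            yield key


if __name__ == "__main__":
    queue = [
        "https://Example.com",
        "https://example.com/",
        "https://example.com/#top",
        "https://example.com/a?x=1",
    ]
    for url in dedupe_urls(queue):
        print(url)
```

A real crawler would usually go further (sorting query parameters, stripping tracking parameters, resolving redirects), which is part of the "enormous amount of extra code" the comment refers to.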

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Python Scraping related posts

  • Python Scraper Based on AI

    1 project | news.ycombinator.com | 9 May 2024
  • 2024-03-01 listening in on the neighborhood

    5 projects | news.ycombinator.com | 2 Mar 2024
  • Scrapy: A Fast and Powerful Scraping and Web Crawling Framework

    1 project | news.ycombinator.com | 16 Feb 2024
  • A command-line utility for taking automated screenshots of websites

    1 project | news.ycombinator.com | 15 Dec 2023
  • Direction Of The Stock Market

    1 project | /r/StockMarket | 6 Dec 2023
  • Web Scraping via JavaScript Runtime Heap Snapshots (2022)

    1 project | news.ycombinator.com | 8 Aug 2023
  • I have a panic selling problem

    1 project | /r/stocks | 7 Jul 2023

Index

What are some of the best open-source Scraping projects in Python? This list will help you:

Rank Project Stars
1 Scrapy 51,023
2 requests-html 13,595
3 undetected-chromedriver 8,485
4 autoscraper 5,952
5 fake-useragent 3,481
6 Automatic-Udemy-Course-Enroller-GET-PAID-UDEMY-COURSES-for-FREE 3,060
7 trafilatura 2,898
8 snoop 2,701
9 Scrapegraph-ai 2,388
10 Grab 2,357
11 facebook-scraper 2,192
12 shot-scraper 1,537
13 cloudproxy 1,358
14 mlscraper 1,231
15 parsel 1,085
16 DataEngineeringProject 985
17 Scweet 969
18 botasaurus 934
19 loconotion 816
20 Edu-Mail-Generator 788
21 gazpacho 731
22 google-maps-scraper 743
23 lookyloo 655
