Python Scraping

Open-source Python projects categorized as Scraping

Top 23 Python Scraping Projects

  • Scrapy

    Scrapy, a fast high-level web crawling & scraping framework for Python.

  • Project mention: Scrapy: A Fast and Powerful Scraping and Web Crawling Framework | news.ycombinator.com | 2024-02-16
  • requests-html

    Pythonic HTML Parsing for Humans™

  • undetected-chromedriver

    Custom Selenium Chromedriver | Zero-Config | Passes ALL bot mitigation systems (like Distil / Imperva / DataDome / Cloudflare IUAM)

  • Project mention: ad_clicker premium - Google/Bing Ads Clicker | /r/IMadeThis | 2023-12-08

    This command-line tool clicks ads for a certain query on Google/Bing search using undetected_chromedriver package. Supports proxy, running multiple simultaneous browsers, ad targeting/exclusion, and running in loop.

  • autoscraper

    A Smart, Automatic, Fast and Lightweight Web Scraper for Python

  • fake-useragent

    Up-to-date simple useragent faker with real world database

  • Automatic-Udemy-Course-Enroller-GET-PAID-UDEMY-COURSES-for-FREE

    Do you want to LEARN NEW STUFF for FREE? Don't worry, with the power of web-scraping and automation, this script will find the necessary Udemy coupons & enroll you for PAID UDEMY COURSES, ABSOLUTELY FREE!

  • trafilatura

    Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments

  • Project mention: Trafilatura: Python tool to gather text on the Web | news.ycombinator.com | 2023-08-14

    The feature list answers that question pretty well: https://github.com/adbar/trafilatura#features

    Basically: you could implement all of this on top of BeautifulSoup - polite crawling policies, sitemap and feed parsing, URL de-duplication, parallel processing, download queues, heuristics for extracting just the main article content, metadata extraction, language detection... but it would require writing an enormous amount of extra code.

  • snoop

    Snoop: an intelligence-gathering tool based on open data (OSINT world)

  • Project mention: Osint update of the Snoop Project tool search for user by nickname | news.ycombinator.com | 2024-01-02
  • Grab

    Web Scraping Framework

  • facebook-scraper

    Scrape Facebook public pages without an API key

  • Project mention: scraping instagram without selenium | /r/webscraping | 2023-06-30

    Afaik on Facebook there are no such APIs, only good old HTML parsing, check out this project for example https://github.com/kevinzg/facebook-scraper (most of the parsing code is here https://github.com/kevinzg/facebook-scraper/blob/master/facebook_scraper/extractors.py )

  • shot-scraper

    A command-line utility for taking automated screenshots of websites

  • Project mention: I want to create IMDB for Open source projects | news.ycombinator.com | 2024-04-15

    I had one of these recently! https://github.com/simonw/shot-scraper/pull/133/files

    They're /incredibly/ rare though.

  • cloudproxy

    Hide your scraper's IP behind the cloud. Provision proxy servers across different cloud providers to improve your scraping success.

  • mlscraper

    🤖 Scrape data from HTML websites automatically by just providing examples

  • parsel

    Parsel lets you extract data from XML/HTML documents using XPath or CSS selectors

  • DataEngineeringProject

    Example end to end data engineering project.

  • Scweet

    A simple and unlimited twitter scraper : scrape tweets, likes, retweets, following, followers, user info, images...

  • Project mention: Twitter api reaching rate limit. 5calls per 15 mins just to get user likes. | /r/learnprogramming | 2023-05-22

    hmm,, do you know any good one? I found this one but it doesn't scrape a single tweet's likes and followers https://github.com/Altimis/Scweet

  • botasaurus

    The All in One Framework to build Awesome Scrapers.

  • Project mention: This Week In Python | dev.to | 2024-04-05

    botasaurus – The All in One Framework to build Awesome Scrapers

  • loconotion

    📄 Python tool to turn Notion.so pages into lightweight, customizable static websites

  • Edu-Mail-Generator

    Generate Free Edu Mail(s) within minutes

  • gazpacho

    🥫 The simple, fast, and modern web scraping library

  • google-maps-scraper

    👋 HOLA 👋 HOLA 👋 HOLA ! ENJOY OUR GOOGLE MAPS SCRAPER 🚀 TO EFFORTLESSLY EXTRACT DATA SUCH AS NAMES, ADDRESSES, PHONE NUMBERS, REVIEWS, WEBSITES, AND RATINGS FROM GOOGLE MAPS WITH EASE! 🤖

  • Project mention: I create a google maps scraper, let me know your thoughts | /r/webscraping | 2023-07-06

    My scrapers runs at 120 Listing per 10 Minutes. So yours is quite Fast. You can see my scraper at https://github.com/omkarcloud/google-maps-scraper. It is quite popular with 95 Stars.

  • lookyloo

    Lookyloo is a web interface that allows users to capture a website page and then display a tree of domains that call each other.

  • social-media-profiles-regexs

    📇 Extract social media profiles and more with regular expressions

  • Project mention: How would I regex TikTok profile links? | /r/AutoModerator | 2023-05-09

    Admittedly I did copy most of the regexes from https://github.com/lorey/social-media-profiles-regexs and updated some of them where needed.

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Index

What are some of the best open-source Scraping projects in Python? This list will help you:

Rank Project Stars
1 Scrapy 50,824
2 requests-html 13,574
3 undetected-chromedriver 8,018
4 autoscraper 5,937
5 fake-useragent 3,459
6 Automatic-Udemy-Course-Enroller-GET-PAID-UDEMY-COURSES-for-FREE 3,048
7 trafilatura 2,740
8 snoop 2,670
9 Grab 2,353
10 facebook-scraper 2,177
11 shot-scraper 1,517
12 cloudproxy 1,349
13 mlscraper 1,219
14 parsel 1,074
15 DataEngineeringProject 985
16 Scweet 966
17 botasaurus 870
18 loconotion 813
19 Edu-Mail-Generator 788
20 gazpacho 730
21 google-maps-scraper 703
22 lookyloo 653
23 social-media-profiles-regexs 589
