Python Webscraping

Open-source Python projects categorized as Webscraping

Top 23 Python Webscraping Projects

  • autoscraper

    A Smart, Automatic, Fast and Lightweight Web Scraper for Python

    Project mention: Scrapping - How to deal with page changes Ai | reddit.com/r/webscraping | 2022-03-25

    It depends on the website, but autoscraper was used to calculate similar nodes given the text to search. Not sure how it works now but it's open source.

  • gazpacho

    🥫 The simple, fast, and modern web scraping library

  • instascrape

    Powerful and flexible Instagram scraping library for Python, providing easy-to-use and expressive tools for accessing data programmatically

    Project mention: Question about Instagram scraping problem for Thesis (Too big size of data to scrape) | reddit.com/r/learnpython | 2021-11-09

    Link to the open source package I used: https://github.com/chris-greening/instascrape

  • TikTokBot

    A TikTokBot that downloads trending tiktok videos and compiles them using FFmpeg

  • ebayScraper

    Scrape all eBay sold listings to determine average/median pricing, plot listings over time with trend lines, and extract to excel

    Project mention: I wrote a python program for scraping Ebay to find a cheap used espresso machines under $200. | reddit.com/r/Python | 2021-12-11

    If you ever want to expand on this project, you might enjoy looking at my implementation of an eBay scraper I made last year: https://github.com/driscoll42/ebayMarketAnalyzer. You can see the code I used to specify a search to scrape eBay, instead of needing to supply the specific search URL; it also filters based on price. The main issue you'll run into sooner or later is the CAPTCHAs eBay added earlier this year.

  • CoWin-Vaccine-Notifier

    Automated Python Script to retrieve vaccine slots availability and get notified when a slot is available.

  • zimit

    Make a ZIM file from any Web site and surf offline!

    Project mention: Reading from the web offline and distraction-free | news.ycombinator.com | 2021-10-10

    which worked quite well for most sites, but still very far from a general-purpose solution.

    There is also a more powerful, general-purpose scraper that generates a ZIM file here: https://github.com/openzim/zimit

    It would be really nice to have a "common" scraper code base that takes care of scraping (possibly with a real headless browser) and outputs all assets as files + info as JSON. This common code base could then be used by all kinds of programs to package the content as standalone HTML zip files, ePub, ZIM, or even PDF for crazy people like me who like to print things ;)

  • web_check

    Script for checking changes in webpages

    Project mention: Help me automate a boring task. [Print TO HTML] | reddit.com/r/learnpython | 2021-10-10

    Sure, in this project https://github.com/Jaime-alv/web_check. Look at checker.py inside web_check folder, line 37 onwards.

  • htmldate

    Fast and robust date extraction from web pages, with Python or on the command-line

    Project mention: How does Firefox's Reader View work? | news.ycombinator.com | 2022-03-30

  • newsemble

    API for fetching data from news websites.

    Project mention: Newsemble: An API to fetch current news data | reddit.com/r/Python | 2021-07-18

    I read through the documentation and tinkered around with it -- great work! One recommendation I would make, particularly if you're hoping that this will be useful long-term for NLP, is not to delete the previously scraped data. For instance, http://www.newsemble.ml/news only contains 129 results, which is nowhere near comprehensive enough to ensure any kind of statistically significant NLP.

  • redditsfinder

    Archive a reddit user's post history. Formatted overview of a profile, JSON containing every post, and picture downloads. Uses the pushshift API.

    Project mention: Is it possible to fetch entire history of comments by a user? | reddit.com/r/redditdev | 2022-03-05

  • iSubRip

    A Python package for scraping and downloading subtitles from iTunes movie pages.

    Project mention: iSubRip: A Python package for scraping and downloading subtitles from iTunes movie pages | reddit.com/r/trackers | 2022-03-27

  • Youtube-Scraping-Slenium

    Automatically creates a Youtube channel dashboard

  • SearchifyX

    Stealthy answer searcher

    Project mention: is there anything else i can do to complete my edgenuity faster? | reddit.com/r/edgenuity | 2022-04-12

  • scrapingant-client-python

    ScrapingAnt API client for Python.

  • botcity-framework-web-python

    BotCity Framework Web - Python

    Project mention: BotCity Framework Web 0.4.1 Released | news.ycombinator.com | 2022-03-11

  • pycraigslist

    A fast and expressive Craigslist API wrapper

    Project mention: Web Scraping Used Car data via Carfax.com | reddit.com/r/webscraping | 2021-08-30

    I understand your concerns, as the quality of posts can vary greatly on Craigslist. But the plus side is that there is a vast number of used cars on Craigslist. If you know Python, try pycraigslist.

  • raspberry-pi-stock-checker

    A configurable python webscraper that checks raspberry pi stocks from verified sellers

    Project mention: I made a Raspberry Pi Stock Checker | reddit.com/r/Python | 2022-02-05

    I've been trying to get a Raspberry Pi 4 for the last 6 months and they are always out of stock due to the global chip shortage. So I made a Python webscraper that does the job for me. Simple request + beautiful soup. Check it out on GitHub: https://github.com/louie-cai/raspberry-pi-stock-checker. Contributions are very welcome.

  • HackerNEWS-Simplified

    A more simplified, straightforward, and plain version of Hacker News.

    Project mention: HackerNews Simplified with Python3 and BeautifulSoup | reddit.com/r/learnprogramming | 2021-10-16

    HackerNews Simplified is a more simplified and straightforward version of HackerNews. The project uses BeautifulSoup, a Python web scraping library, to scrape data from the Hacker News website, take only the most relevant news, and display it in your terminal.

  • Code

    Place for all my code. Take a look! :D (by vipermark7)

    Project mention: What's the go to place to learn Clojure? | reddit.com/r/Clojure | 2022-05-12

    The resources here are great! But in the end, it's best to solve small problems that interest you. I recently wrote a [simplified crossword puzzle solver](https://github.com/vipermark7/Code/blob/master/lispstuff/crossword.clj) and it forced me to write a small program of okay quality rather than endlessly agonize about the absolute perfect way to do FP in Clojure :)

  • beautifulday

    Learning project for scraping weather from weather.gc.ca. Print out simple or extended weather reports for any Canadian city to a console.

  • Amazon-Product-Information-Scraper

    This is a python web-scraping project to get all the product names, price, review stars and review count of a particular category of the product

    Project mention: Amazon Product Information Scraper | reddit.com/r/Python | 2022-05-01

    Github Link: https://github.com/praneethravuri/Amazon-Product-Information-Scraper

  • Investopedia-Bot

    Pick the best stocks and automate Investopedia

    Project mention: Automate Investopedia stock simulator with Investopedia-bot | reddit.com/r/programming | 2022-04-07

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2022-05-12.

Index

What are some of the best open-source Webscraping projects in Python? This list will help you:

 #  Project                             Stars
 1  autoscraper                         4,367
 2  gazpacho                              641
 3  instascrape                           444
 4  TikTokBot                             219
 5  ebayScraper                           108
 6  CoWin-Vaccine-Notifier                100
 7  zimit                                  90
 8  web_check                              52
 9  htmldate                               47
10  newsemble                              42
11  redditsfinder                          32
12  iSubRip                                24
13  Youtube-Scraping-Slenium               20
14  SearchifyX                             18
15  scrapingant-client-python              17
16  botcity-framework-web-python           16
17  pycraigslist                           14
18  raspberry-pi-stock-checker             11
19  HackerNEWS-Simplified                   5
20  Code                                    4
21  beautifulday                            3
22  Amazon-Product-Information-Scraper      3
23  Investopedia-Bot                        3