Python web-scraper

Open-source Python projects categorized as web-scraper

Top 20 Python web-scraper Projects

  1. lightnovel-crawler

    Generate and download e-books from online sources.

  2. Monkey-DL (Anime Downloader)

    Bulk download your favourite anime episodes from your favourite anime websites.

  3. web-scraping

    Detailed web scraping tutorials for dummies, with financial data crawlers on Reddit WallStreetBets, CME (both options and futures), US Treasury, CFTC, LME, MacroTrends, SHFE, and alternative data crawlers on Tomtom, BBC, Wall Street Journal, Al Jazeera, Reuters, Financial Times, Bloomberg, CNN, Fortune, The Economist.

  4. summarizer

    A Reddit bot that summarizes news articles written in Spanish or English. It uses a custom-built algorithm to rank words and sentences.

  5. facebook_page_scraper

    Scrapes the front end of Facebook pages with no limitations and can export the data as structured JSON or CSV.

  6. Senpwai

    A desktop app for tracking and batch downloading anime.

  7. tls-requests

    TLS Requests is a powerful Python library for secure HTTP requests, offering a browser-like TLS client, fingerprinting, anti-bot page bypass, and high performance.

    Project mention: Must Try Open-Source Python TLS Requests: Simplify Web Scraping, Bypass Cloudflare 403 Forbidden (WAF) | dev.to | 2024-12-13

    Modern websites increasingly use TLS fingerprinting and anti-bot tools like Cloudflare Bot Fight Mode to block web crawlers. TLS Requests bypasses these obstacles by mimicking browser-like TLS behavior, making it easy to scrape data or interact with websites that use sophisticated anti-bot measures.

  8. CobWeb-lnx

    CobWeb is a Python library for web scraping. The library consists of two classes: Spider and Scraper.

  9. opensubtitles-scraper

    Scrape subtitles from opensubtitles.org.

  10. tagalog-dictionary-scraper

    Builds a Tagalog dictionary by collecting Tagalog words from tagalog.pinoydictionary.com.

  11. reddit-bots

    A collection of Reddit bots that I use to enhance the subreddits I manage.

  12. mexican-jobs-2020

    Data ETL & analysis on thousands of job listings from the official Mexican job board (2020 edition).

  13. git-pull

    Parallelized web scraper for GitHub.

  14. tweet-transcriber

    A Reddit bot that transcribes tweets linked in comments and submissions, mirrors their images, and replies with a formatted Markdown message.

  15. Abosar

    অবসর 📚 A collection of short Bengali stories web scraped from various Bengali eMagazines and eNewspapers.

  16. Python-Web-Scraper

    An adaptive Python web scraper app to catch the best deals by scraping and parsing data from select e-commerce sites.

  17. varieteebot

    A Telegram bot that sends today's tee of some tee shops.

  18. gli99

    Web scraper for gifcities.org.

  19. iw-scraper

    Web scraper for imovelweb listings.

  20. nanoscrape

    Simple scraping program that can download webpages, Discord embeds, and more.

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).
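
The summarizer entry in the list mentions ranking words and sentences to produce a summary. The project's actual algorithm is not described here, so the following is only a minimal sketch of one common approach (word-frequency scoring), under stated assumptions: the stop-word list, the function names, and the naive sentence splitter are all hypothetical and chosen for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it", "that", "on", "for"}


def split_sentences(text):
    """Naively split text after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase the sentence and keep alphabetic words that are not stop words."""
    words = re.findall(r"[a-záéíóúñü]+", sentence.lower())
    return [w for w in words if w not in STOP_WORDS]


def summarize(text, max_sentences=2):
    """Return the highest-scoring sentences, kept in their original order."""
    sentences = split_sentences(text)
    # Rank words by how often they appear across the whole text.
    word_scores = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        # A sentence's score is the average frequency of its words.
        words = tokenize(sentence)
        return sum(word_scores[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])
    return [sentences[i] for i in keep]


if __name__ == "__main__":
    sample = (
        "Python scrapers fetch pages. "
        "Python scrapers parse pages and fetch links. "
        "Cats sleep."
    )
    for line in summarize(sample, max_sentences=1):
        print(line)
```

Averaging by sentence length keeps long sentences from winning only because they contain more words; a real bot would likely need a fuller stop-word list per language and a more robust sentence splitter.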

Python web-scraper related posts

  • Help with Paperback IOS.

    1 project | /r/mangapiracy | 18 Jun 2023
  • Multiparadigmatic Web Scraping Tool!

    1 project | /r/computerscience | 14 May 2023
  • ISSTH left me disappointed

    1 project | /r/noveltranslations | 1 May 2023
  • a discord server and bot to fetch epub chapters from novels?

    1 project | /r/MartialMemes | 16 Apr 2023
  • Python Web Scraper/Crawler for E-Commerce sites. Currently supports only a few websites but im looking to expand that list. Tips/criticism are welcomed. This is the first project for my student CV (0 working experience) so I'd like it to be as polished as possible.

    1 project | /r/programming | 1 Mar 2023
  • What is your experience with e-readers?

    2 projects | /r/thenetherlands | 10 Jul 2022
  • Does the kindle have a search function? A working one? I’ve seen videos but those are like years old.

    1 project | /r/kindle | 21 May 2022

Index

What are some of the best open-source web-scraper projects in Python? This list will help you:

| # | Project | Stars |
|---|---------|-------|
| 1 | lightnovel-crawler | 1,662 |
| 2 | Monkey-DL (Anime Downloader) | 827 |
| 3 | web-scraping | 762 |
| 4 | summarizer | 273 |
| 5 | facebook_page_scraper | 254 |
| 6 | Senpwai | 241 |
| 7 | tls-requests | 44 |
| 8 | CobWeb-lnx | 38 |
| 9 | opensubtitles-scraper | 33 |
| 10 | tagalog-dictionary-scraper | 28 |
| 11 | reddit-bots | 25 |
| 12 | mexican-jobs-2020 | 20 |
| 13 | git-pull | 19 |
| 14 | tweet-transcriber | 18 |
| 15 | Abosar | 13 |
| 16 | Python-Web-Scraper | 13 |
| 17 | varieteebot | 3 |
| 18 | gli99 | 3 |
| 19 | iw-scraper | 1 |
| 20 | nanoscrape | 0 |
