Python Crawling

Open-source Python projects categorized as Crawling

Top 6 Python Crawling Projects

  • GitHub repo Scrapy

    Scrapy, a fast high-level web crawling & scraping framework for Python.

    Project mention: Konohagakure Search | dev.to | 2022-01-09

    Scrapy

  • GitHub repo newspaper

    News, full-text, and article metadata extraction in Python 3. Advanced docs:

    Project mention: Is there a web text extraction library for reader mode written in Java/Kotlin? | reddit.com/r/webdev | 2021-12-20

    I have searched the web but the library I have found was for Python only. I need a library written in Java or Kotlin so that I could use it on Android. Is there any library for that? If you know that there is no such Java library, please let me know that so that I could stop searching.

  • GitHub repo isp-data-pollution

    ISP Data Pollution to Protect Private Browsing History with Obfuscation

    Project mention: YSK: “Data Pollution” is a technique for polluting your search history with random searches to keep ISPs, big data companies, and governments from gathering meaningful data about you | reddit.com/r/YouShouldKnow | 2021-05-31

    Here’s one such project tackling this: https://github.com/essandess/isp-data-pollution/

  • GitHub repo spidermon

    Scrapy Extension for monitoring spiders execution.

    Project mention: spidermon: Scrapy Extension for monitoring spiders execution | news.ycombinator.com | 2021-02-16

  • GitHub repo mlscraper

    🤖 Scrape data from HTML websites automatically with Machine Learning

    Project mention: mlscraper: Scrape data from HTML pages automatically with Machine Learning | news.ycombinator.com | 2021-07-05

  • GitHub repo spidy Web Crawler

    The simple, easy to use command line web crawler.
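Scrapy and spidy both automate one basic loop: fetch a page, pull out its links, and queue the ones not yet seen. The following is a minimal standard-library sketch of that loop, not the API of either project. `LinkExtractor`, `crawl`, and the caller-supplied `fetch` function are hypothetical names, and the sketch omits things a production crawler needs, such as robots.txt handling, request delays, and retries (Scrapy provides settings for all three).

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, List, Optional
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collect absolute URLs from the href attributes of <a> tags."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL and drop any
                # #fragment, so "b" and "b#top" are treated as one page.
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                self.links.append(absolute)


def crawl(start_url: str,
          fetch: Callable[[str], Optional[str]],
          max_pages: int = 50) -> List[str]:
    """Breadth-first crawl that stays on the start URL's host.

    `fetch` is supplied by the caller (hypothetical): it returns a page's
    HTML, or None if the page could not be retrieved.
    """
    host = urlparse(start_url).netloc
    queue = deque([start_url])   # frontier of URLs waiting to be fetched
    seen = {start_url}           # everything ever queued, to avoid repeats
    visited: List[str] = []      # pages fetched successfully, in BFS order
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        extractor = LinkExtractor(url)
        extractor.feed(html)
        for link in extractor.links:
            # Skip other hosts (and mailto:/javascript: links, which have no host).
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```

Because fetching is injected, the crawl can be exercised offline by passing a dict's `get` method as `fetch`, e.g. `crawl("https://example.com/", pages.get)` with `pages` mapping URLs to HTML strings. A network-backed `fetch` could wrap `urllib.request.urlopen`.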
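The newspaper entry (and the Reddit question about reader mode) is about separating an article's title and body text from navigation, scripts, and other page chrome. The sketch below assumes a deliberately naive heuristic: keep the `<title>` and the text of `<p>` elements, and ignore anything inside `script`, `style`, `nav`, or `footer`. It is not newspaper's algorithm or API; `ArticleTextExtractor` and `extract_article` are hypothetical names.

```python
from html.parser import HTMLParser
from typing import Dict, List


class ArticleTextExtractor(HTMLParser):
    """Naive reader-mode heuristic: keep <title> text and <p> text."""

    # Elements whose contents are treated as page chrome and discarded.
    SKIP = {"script", "style", "nav", "footer"}

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.paragraphs: List[str] = []
        self._in_title = False
        self._in_p = False
        self._skip_depth = 0
        self._buffer: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True
        elif tag == "p" and self._skip_depth == 0:
            self._in_p = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False
        elif tag == "p" and self._in_p:
            self._in_p = False
            # Collapse runs of whitespace; text inside <b>, <a>, etc. is kept.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.paragraphs.append(text)

    def handle_data(self, data):
        if self._skip_depth:
            return
        if self._in_title:
            self.title += data
        elif self._in_p:
            self._buffer.append(data)


def extract_article(html: str) -> Dict[str, str]:
    """Return the page title and its paragraphs joined by blank lines."""
    parser = ArticleTextExtractor()
    parser.feed(html)
    parser.close()
    return {"title": parser.title.strip(), "text": "\n\n".join(parser.paragraphs)}
```

Real pages often break a heuristic this simple, for example when sidebars or comment threads are also built from `<p>` tags.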
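The data-pollution idea in the Reddit post is to bury real browsing among random searches. The sketch below covers only the query-generation half, under stated assumptions: the word list `WORDS` is made up, `https://search.example/search` is a placeholder endpoint, and `random_search_urls` is a hypothetical helper. It sends no requests and does not reflect how isp-data-pollution itself chooses its queries.

```python
import random
from typing import List, Optional, Sequence
from urllib.parse import urlencode

# Hypothetical seed vocabulary; a real noise generator would need a far larger one.
WORDS = ["weather", "recipe", "guitar", "python", "garden", "museum", "bicycle", "history"]


def random_search_urls(n: int,
                       base: str = "https://search.example/search",
                       words: Sequence[str] = WORDS,
                       rng: Optional[random.Random] = None) -> List[str]:
    """Build n search URLs, each with a query of 1 to 3 distinct random words.

    `base` is a placeholder endpoint, not a real search engine.
    """
    if rng is None:
        rng = random.Random()
    urls = []
    for _ in range(n):
        query = " ".join(rng.sample(list(words), k=rng.randint(1, 3)))
        # urlencode escapes the query string, turning spaces into "+".
        urls.append(f"{base}?{urlencode({'q': query})}")
    return urls
```

Passing a seeded `random.Random` makes the output reproducible for testing; omitting it gives a fresh, unseeded generator.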
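spidermon monitors a spider's execution. One core idea behind that kind of monitoring can be sketched as threshold checks on the stats dictionary a Scrapy run produces when it finishes. The keys read below (`item_scraped_count`, `log_count/ERROR`, `finish_reason`) are ones Scrapy's stats collection normally records, but `Thresholds` and `check_run` are hypothetical; spidermon defines its own monitor classes and configuration, which this sketch does not reproduce.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class Thresholds:
    """Hypothetical limits a finished run must satisfy."""
    min_items: int = 1
    max_errors: int = 0


def check_run(stats: Dict[str, Any], limits: Thresholds) -> List[str]:
    """Return human-readable failures; an empty list means the run looks healthy."""
    failures = []
    # Missing counters mean nothing was counted, so default them to 0.
    items = stats.get("item_scraped_count", 0)
    if items < limits.min_items:
        failures.append(f"only {items} items scraped (expected >= {limits.min_items})")
    errors = stats.get("log_count/ERROR", 0)
    if errors > limits.max_errors:
        failures.append(f"{errors} errors logged (allowed {limits.max_errors})")
    # A normal, complete crawl closes with the reason "finished".
    reason = stats.get("finish_reason")
    if reason != "finished":
        failures.append(f"spider closed with reason {reason!r}")
    return failures
```

In a real project a check like this would run when the spider closes, for example from an extension connected to Scrapy's `spider_closed` signal, with the stats taken from the crawler's stats collector.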

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2022-01-09.

Index

What are some of the best open-source Crawling projects in Python? This list will help you:

#  Project             GitHub stars
1  Scrapy                    42,525
2  newspaper                 11,599
3  isp-data-pollution           486
4  spidermon                    385
5  mlscraper                    353
6  spidy Web Crawler            276