Webscraping

Top 23 Webscraping Open-Source Projects

  • Huginn

    Create agents that monitor and act on your behalf. Your agents are standing by!

  • Project mention: Create agents that monitor and act on your behalf | news.ycombinator.com | 2024-03-24
  • ani-cli

    A cli tool to browse and play anime

  • Project mention: Rule | /r/196 | 2023-05-18
  • awesome-web-scraping

    List of libraries, tools and APIs for web scraping and data processing.

  • autoscraper

    A Smart, Automatic, Fast and Lightweight Web Scraper for Python

  • browser-fingerprinting

    Analysis of Bot Protection systems with available countermeasures 🚿. How to defeat anti-bot system 👻 and get around browser fingerprinting scripts 🕵️‍♂️ when scraping the web?

  • Project mention: A site that tracks the price of a Big Mac in every US McDonald's | news.ycombinator.com | 2024-01-13

    Yes, there is a lot written about it. Here is one link I have saved:

    https://github.com/niespodd/browser-fingerprinting

  • soup

    Web Scraper in Go, similar to BeautifulSoup

  • webscraping-from-0-to-hero

    The web scraping open project repository aims to share knowledge and experiences about web scraping with Python

  • Project mention: Web Scraping from 0 to hero – Sharing knowledge about web scraping on GH | news.ycombinator.com | 2023-07-06
  • scrapeghost

    👻 Experimental library for scraping websites using OpenAI's GPT API.

  • requests-cache

    Transparent persistent cache for python requests

  • CrossLinked

    LinkedIn enumeration tool to extract valid employee names from an organization through search engine scraping

  • gazpacho

    🥫 The simple, fast, and modern web scraping library

  • xidel

    Command line tool to download and extract data from HTML/XML pages or JSON-APIs, using CSS, XPath 3.0, XQuery 3.0, JSONiq or pattern matching. It can also create new or transformed XML/HTML/JSON documents.

  • Project mention: Move over jq I found something easier: fx | news.ycombinator.com | 2023-06-06

    You could try Xidel[1]. It supports JSON, XML and HTML using XPath/XQuery 3.1

    It has some extensions to the standard that are pretty nice (JSONiq, CSS selectors, html “template” matching), but you can limit it to just standard XPath/XQuery if you like.

    I recommend getting the nightly v .99 build if you give it a try, the stable .98 version is pretty old and I’ve had no issues with .99

    1. https://www.videlibri.de/xidel.html

  • NYTimes-App

    🗽 A Simple Demonstration of the New York Times App 📱 using Jsoup web crawler with MVVM Architecture 🔥

  • tarsier

    Vision utilities for web interaction agents 👀

  • Project mention: Control the browser using GPT-4 vision by AgentGPT team | news.ycombinator.com | 2023-11-12
  • morph

    Take the hassle out of web scraping (by openaustralia)

  • dude

    dude uncomplicated data extraction: A simple framework for writing web scrapers using Python decorators

  • Project mention: Webscraping beginner here ready to start leveling up to intermediate. Looking for some good webscraping repositories (e.g any of your GitHub repos/projects) that I can use as learning tools, and general recommendations for what to do next | /r/webscraping | 2023-05-08

    Please check https://github.com/roniemartinez/dude

  • mov-cli

    Watch everything from your terminal.

  • r-web-scraping-cheat-sheet

    Guide, reference and cheatsheet on web scraping using rvest, httr and RSelenium.

  • Rcrawler

    An R web crawler and scraper

  • TikTokBot

    A TikTokBot that downloads trending TikTok videos and compiles them using FFmpeg

  • polite

    Be nice on the web

  • EasyApplyJobsBot

    A Python bot that automatically applies to all LinkedIn, Glassdoor, etc. Easy Apply jobs based on your preferences. Auto login, auto-fill additional questions, apply automatically!

  • Project mention: Candidates' experience with a Senior job opening | /r/brdev | 2023-05-08
  • zimit

    Make a ZIM file from any Web site and surf offline!

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Index

What are some of the best open-source Webscraping projects? This list will help you:

Rank Project Stars
1 Huginn 41,441
2 ani-cli 6,577
3 awesome-web-scraping 6,299
4 autoscraper 5,937
5 browser-fingerprinting 3,830
6 soup 2,125
7 webscraping-from-0-to-hero 1,453
8 scrapeghost 1,390
9 requests-cache 1,254
10 CrossLinked 1,140
11 gazpacho 730
12 xidel 650
13 NYTimes-App 507
14 tarsier 486
15 morph 463
16 dude 413
17 mov-cli 379
18 r-web-scraping-cheat-sheet 378
19 Rcrawler 344
20 TikTokBot 341
21 polite 322
22 EasyApplyJobsBot 317
23 zimit 228
