Top 23 Webscraping Open-Source Projects
-
browser-fingerprinting
Analysis of Bot Protection systems with available countermeasures 🚿. How to defeat anti-bot system 👻 and get around browser fingerprinting scripts 🕵️‍♂️ when scraping the web?
-
webscraping-from-0-to-hero
The web scraping open project repository aims to share knowledge and experiences about web scraping with Python
-
CrossLinked
LinkedIn enumeration tool to extract valid employee names from an organization through search engine scraping
-
xidel
Command line tool to download and extract data from HTML/XML pages or JSON-APIs, using CSS, XPath 3.0, XQuery 3.0, JSONiq or pattern matching. It can also create new or transformed XML/HTML/JSON documents.
-
NYTimes-App
🗽 A Simple Demonstration of the New York Times App 📱 using Jsoup web crawler with MVVM Architecture 🔥
-
dude
dude uncomplicated data extraction: A simple framework for writing web scrapers using Python decorators
-
r-web-scraping-cheat-sheet
Guide, reference and cheatsheet on web scraping using rvest, httr and Rselenium.
-
EasyApplyJobsBot
A Python bot that automatically applies to all LinkedIn, Glassdoor, etc. Easy Apply jobs based on your preferences. Auto login, auto fill additional questions, apply automatically!
Project mention: Create agents that monitor and act on your behalf | news.ycombinator.com | 2024-03-24
Project mention: A site that tracks the price of a Big Mac in every US McDonald's | news.ycombinator.com | 2024-01-13
Yes, there is a lot written about it. Here is one link I have saved:
https://github.com/niespodd/browser-fingerprinting
Project mention: Web Scraping from 0 to hero – Sharing knowledge about web scraping on GH | news.ycombinator.com | 2023-07-06
You could try Xidel[1]. It supports JSON, XML and HTML using XPath/XQuery 3.1
It has some extensions to the standard that are pretty nice (JSONiq, CSS selectors, html “template” matching), but you can limit it to just standard XPath/XQuery if you like.
I recommend getting the nightly v .99 build if you give it a try, the stable .98 version is pretty old and I’ve had no issues with .99
1. https://www.videlibri.de/xidel.html
Project mention: Control the browser using GPT-4 vision by AgentGPT team | news.ycombinator.com | 2023-11-12
Project mention: Webscraping beginner here ready to start leveling up to intermediate. Looking for some good webscraping repositories (e.g any of your GitHub repos/projects) that I can use as learning tools, and general recommendations for what to do next | /r/webscraping | 2023-05-08
Please check https://github.com/roniemartinez/dude
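The decorator-based style that dude's description refers to can be illustrated with a small standard-library sketch. This is not dude's API: the `select` decorator, the handler registry, and `run` are hypothetical names invented for illustration, and matching is by tag name only rather than by CSS selector.

```python
from html.parser import HTMLParser

# Hypothetical registry (not part of any real library):
# maps a tag name to the handler functions registered for it.
_HANDLERS = {}


def select(tag):
    """Register the decorated function as a handler for every <tag> element."""
    def decorator(func):
        _HANDLERS.setdefault(tag, []).append(func)
        return func
    return decorator


class _Dispatcher(HTMLParser):
    """Feeds each start tag's attributes to the handlers registered for it."""

    def __init__(self):
        super().__init__()
        self.results = []

    def handle_starttag(self, tag, attrs):
        for handler in _HANDLERS.get(tag, []):
            item = handler(dict(attrs))
            if item is not None:  # handlers return None to skip an element
                self.results.append(item)


def run(html):
    """Parse an HTML string and return everything the handlers extracted."""
    parser = _Dispatcher()
    parser.feed(html)
    parser.close()
    return parser.results


@select("a")
def link_url(attrs):
    """Extract the href of each link, skipping anchors without one."""
    href = attrs.get("href")
    return {"url": href} if href else None


if __name__ == "__main__":
    sample = '<p><a href="https://example.com/a">A</a> <a name="x">no link</a></p>'
    print(run(sample))
```

The appeal of this pattern is that each extraction rule is a small, independently testable function, and adding a rule means adding one decorated function rather than editing a central parsing loop.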
Webscraping related posts
- Create agents that monitor and act on your behalf
- How To Scrape TikTok in 2024
- Direction Of The Stock Market
- And I thought amazing fics suddenly being deleted was a myth
- Control the browser using GPT-4 vision by AgentGPT team
- Show HN: Open-Source Desktop AI Webscraper
- ThreatMetrix (anti-bot/fraud-detection) solver, deobfuscator & data harvester
Index
What are some of the best open-source Webscraping projects? This list will help you:
Rank | Project | Stars |
---|---|---|
1 | Huginn | 41,441 |
2 | ani-cli | 6,577 |
3 | awesome-web-scraping | 6,299 |
4 | autoscraper | 5,937 |
5 | browser-fingerprinting | 3,830 |
6 | soup | 2,125 |
7 | webscraping-from-0-to-hero | 1,453 |
8 | scrapeghost | 1,390 |
9 | requests-cache | 1,254 |
10 | CrossLinked | 1,140 |
11 | gazpacho | 730 |
12 | xidel | 650 |
13 | NYTimes-App | 507 |
14 | tarsier | 486 |
15 | morph | 463 |
16 | dude | 413 |
17 | mov-cli | 379 |
18 | r-web-scraping-cheat-sheet | 378 |
19 | Rcrawler | 344 |
20 | TikTokBot | 341 |
21 | polite | 322 |
22 | EasyApplyJobsBot | 317 |
23 | zimit | 228 |