Top 22 Scrape Open-Source Projects
-
metascraper
Get unified metadata from websites using Open Graph, Microdata, RDFa, Twitter Cards, JSON-LD, HTML, and more.
-
Scweet
A simple, unlimited Twitter scraper: scrape tweets, likes, retweets, following, followers, user info, images, and more.
-
goq
A declarative, struct-tag-based HTML unmarshaling and scraping package for Go, built on top of the goquery library.
-
visdom
A library that uses a jQuery-like API for HTML parsing, node selection, and node mutation, suitable for web scraping and HTML confusion.
-
FONTS_DOT_COM_RIPPER
A script that extracts entire font families from Fonts.com. It rips them as woff2, and the final output includes both woff2 and ttf files.
-
dozent
Dozent is a powerful downloader for collecting large amounts of Twitter data from the Internet Archive.
Project mention: Show HN: I made a tool to clean and convert any webpage to Markdown | news.ycombinator.com | 2024-04-14
Project mention: Reverse Engineering Twitter Spaces - Capture 500 Audio Streams/Live Transcripts per IP | /r/programming | 2023-06-11
Project mention: Twitter api reaching rate limit. 5calls per 15 mins just to get user likes. | /r/learnprogramming | 2023-05-22
Hmm, do you know any good one? I found this one, but it doesn't scrape a single tweet's likes and followers: https://github.com/Altimis/Scweet
Project mention: Squirm - This was the night of the crawling terror! | /r/crystal_programming | 2023-05-06
Hi guys, I've created an open-source, low-code Node.js web scraping tool on top of Puppeteer: https://github.com/miroshnikov/scrapyteer. It offers a small set of functions that are combined in pipelines to define a crawling workflow and the shape of the output data. Maybe somebody will find it useful.
Project mention: Git scraping: track changes over time by scraping to a Git repository | news.ycombinator.com | 2023-08-10
I've been promoting this idea for a few years now, and I've seen an increasing number of people put it into action.
A fun way to track how people are using this is with the git-scraping topic on GitHub:
https://github.com/topics/git-scraping?o=desc&s=updated
That page orders repos tagged git-scraping by most-recently-updated, which shows which scrapers have run most recently.
As I write this, repos that updated in just the last minute include:
https://github.com/drzax/queensland-traffic-conditions
https://github.com/jasoncartwright/bbcrss
https://github.com/jackharrhy/metrobus-timetrack-history
https://github.com/outages/bchydro-outages
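The Git scraping idea mentioned above (fetch a resource on a schedule and commit it to a Git repository only when it has changed) can be sketched roughly as follows. This is a minimal illustration, not the method used by any of the repositories listed: the function names (`fetch`, `save_if_changed`, `commit_snapshot`, `scrape_once`) are hypothetical, and it assumes `git` is installed and that `repo` is already an initialized Git working tree.

```python
import subprocess
import urllib.request
from pathlib import Path


def fetch(url: str) -> bytes:
    """Download the raw bytes of the resource being tracked."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()


def save_if_changed(path: Path, data: bytes) -> bool:
    """Write data to path; return False if the file already held the same bytes."""
    if path.exists() and path.read_bytes() == data:
        return False
    path.write_bytes(data)
    return True


def commit_snapshot(repo: Path, filename: str, message: str) -> None:
    """Stage and commit one file (assumes repo is an existing Git working tree)."""
    subprocess.run(["git", "add", filename], cwd=repo, check=True)
    subprocess.run(["git", "commit", "-m", message], cwd=repo, check=True)


def scrape_once(repo: Path, url: str, filename: str,
                fetcher=fetch, committer=commit_snapshot) -> bool:
    """Fetch url, store it under repo/filename, and commit only on change."""
    changed = save_if_changed(repo / filename, fetcher(url))
    if changed:
        committer(repo, filename, f"Update {filename}")
    return changed
```

Running `scrape_once` from a scheduler (cron, or a CI workflow) would then leave a commit history that records each change to the scraped resource. The `fetcher` and `committer` parameters are there only so the logic can be exercised without network access or a real repository.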
Scrape related posts
- Show HN: AboutIdeasNow – search /about, /ideas, /now pages of 7k+ personal sites
- Streamate closed my account
- Reverse Engineering Twitter Spaces - Capture 500 Audio Streams/Live Transcripts per IP
- Show HN: Twitter Spy Tools – Capture large volumes of audio and transcript data
- Twitter Spy Tools - Capture large volumes of audio and transcript data
- I keep reading that if you don't have a job, you immediately get a free municipal apartment where you don't have to pay for electricity and so on. I would like to do that too. Can you tell me how you all get this? I assume it's very easy and everyone gets it?
- Twitter will be purging accounts with no activity for several years soon. We need to archive as many as we can. Any ideas on Methods
Index
What are some of the best open-source Scrape projects? This list will help you:
# | Project | Stars |
---|---|---|
1 | autoscraper | 5,937 |
2 | cloudflare-scrape | 3,291 |
3 | metascraper | 2,230 |
4 | twitter-api-client | 1,334 |
5 | Scweet | 966 |
6 | stweet | 568 |
7 | scrape | 326 |
8 | goq | 251 |
9 | raise | 155 |
10 | html2rss | 111 |
11 | visdom | 102 |
12 | extract-css-core | 36 |
13 | imgur-scraper | 35 |
14 | squirm | 31 |
15 | FONTS_DOT_COM_RIPPER | 23 |
16 | scrapyteer | 16 |
17 | Blind-App-Reviews | 12 |
18 | airbnb-scraper | 9 |
19 | dozent | 7 |
20 | bchydro-outages | 5 |
21 | weheartpy | 4 |
22 | real_estate_hungary | 3 |