Scrape

Top 22 Scrape Open-Source Projects

  • autoscraper

    A Smart, Automatic, Fast and Lightweight Web Scraper for Python

  • cloudflare-scrape

    A Python module to bypass Cloudflare's anti-bot page.

  • metascraper

    Get unified metadata from websites using Open Graph, Microdata, RDFa, Twitter Cards, JSON-LD, HTML, and more.

  • Project mention: Show HN: I made a tool to clean and convert any webpage to Markdown | news.ycombinator.com | 2024-04-14
  • twitter-api-client

    Implementation of X/Twitter v1, v2, and GraphQL APIs (by trevorhobenshield)

  • Project mention: Reverse Engineering Twitter Spaces - Capture 500 Audio Streams/Live Transcripts per IP | /r/programming | 2023-06-11
  • Scweet

    A simple and unlimited Twitter scraper: scrape tweets, likes, retweets, following, followers, user info, images...

  • Project mention: Twitter api reaching rate limit. 5calls per 15 mins just to get user likes. | /r/learnprogramming | 2023-05-22

    hmm,, do you know any good one? I found this one but it doesn't scrape a single tweet's likes and followers https://github.com/Altimis/Scweet

  • stweet

    Advanced Python library to scrape Twitter (tweets, users) from the unofficial API

  • Project mention: Failed using the new twitter API or alternatives | /r/learnpython | 2023-05-11
  • scrape

    Scrape any website, article or RSS/Atom Feed with ease!

  • goq

    A declarative struct-tag-based HTML unmarshaling or scraping package for Go built on top of the goquery library

  • raise

    A simple (and unofficial) GitHub Trending client that lives in your menubar.

  • html2rss

    📰 Build RSS 2.0 feeds from websites (and JSON APIs) with a few CSS selectors.

  • visdom

    A library with a jQuery-like API for HTML parsing, node selection, and node mutation, suitable for web scraping and html confusion.

  • extract-css-core

    Extract all CSS from a given url, both server side and client side rendered.

  • imgur-scraper

    Retrieve years of imgur.com's data without any authentication.

  • squirm

    This was the night of the crawling terror!

  • Project mention: Squirm - This was the night of the crawling terror! | /r/crystal_programming | 2023-05-06
  • FONTS_DOT_COM_RIPPER

    Script to extract entire font families from Fonts.com. It rips them as woff2, and the final output includes woff2 and ttf files.

  • scrapyteer

    Web crawling & scraping framework for Node.js on top of headless Chrome browser

  • Project mention: Low-code Node.js web scraping tool | /r/webscraping | 2023-07-07

    Hi guys, I've created an open-source low-code Node.js web scraping tool on top of the Puppeteer - https://github.com/miroshnikov/scrapyteer. It offers a small set of functions that are combined in pipelines to define a crawling workflow and a shape of output data. Maybe somebody will find it useful.

  • Blind-App-Reviews

    Scraped reviews of over 25 companies from the Blind App ⚡️

  • airbnb-scraper

    Apify public actor for scraping Airbnb homes.

  • dozent

    Dozent is a powerful downloader that is used to collect large amounts of Twitter data from the internet archive.

  • bchydro-outages

    Track BCHydro Outages via Git history

  • Project mention: Git scraping: track changes over time by scraping to a Git repository | news.ycombinator.com | 2023-08-10

    I've been promoting this idea for a few years now, and I've seen an increasing number of people put it into action.

    A fun way to track how people are using this is with the git-scraping topic on GitHub:

    https://github.com/topics/git-scraping?o=desc&s=updated

    That page orders repos tagged git-scraping by most-recently-updated, which shows which scrapers have run most recently.

    As I write this, just in the last minute repos that updated include:

    https://github.com/drzax/queensland-traffic-conditions

    https://github.com/jasoncartwright/bbcrss

    https://github.com/jackharrhy/metrobus-timetrack-history

    https://github.com/outages/bchydro-outages
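
    The Git scraping idea described in the mention above can be sketched roughly as follows. This is a minimal illustration, not the code used by any of the listed repositories: the helper names (save_if_changed, commit_snapshot, scrape_once) are hypothetical, and it assumes the target directory is already an initialized Git repository with a configured committer identity.

```python
import subprocess
import urllib.request
from pathlib import Path


def save_if_changed(path: Path, content: bytes) -> bool:
    """Write content to path only when it differs from the stored copy.

    Returns True when the file was created or updated, False otherwise.
    """
    if path.exists() and path.read_bytes() == content:
        return False
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(content)
    return True


def commit_snapshot(repo_dir: Path, filename: str, message: str) -> None:
    """Stage and commit one file. Assumes repo_dir is an existing Git repository."""
    subprocess.run(["git", "add", filename], cwd=repo_dir, check=True)
    subprocess.run(["git", "commit", "-m", message], cwd=repo_dir, check=True)


def scrape_once(url: str, repo_dir: Path, filename: str) -> bool:
    """Fetch url and commit the response body only if it changed since the last run."""
    with urllib.request.urlopen(url, timeout=30) as response:
        content = response.read()
    if save_if_changed(repo_dir / filename, content):
        commit_snapshot(repo_dir, filename, f"Update {filename}")
        return True
    return False
```

    Running something like scrape_once on a schedule (cron, CI, or similar) would leave a commit history in which each commit marks a point where the scraped data changed. Skipping the write when nothing changed keeps that history free of empty commits.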

  • weheartpy

    A fast, reliable API wrapper for weheartit.com

  • real_estate_hungary

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Index

What are some of the best open-source Scrape projects? This list will help you:

 #  Project               Stars
 1  autoscraper           5,937
 2  cloudflare-scrape     3,291
 3  metascraper           2,230
 4  twitter-api-client    1,334
 5  Scweet                  966
 6  stweet                  568
 7  scrape                  326
 8  goq                     251
 9  raise                   155
10  html2rss                111
11  visdom                  102
12  extract-css-core         36
13  imgur-scraper            35
14  squirm                   31
15  FONTS_DOT_COM_RIPPER     23
16  scrapyteer               16
17  Blind-App-Reviews        12
18  airbnb-scraper            9
19  dozent                    7
20  bchydro-outages           5
21  weheartpy                 4
22  real_estate_hungary       3
