Skyscraper Alternatives
Similar projects and alternatives to skyscraper
-
Playwright
Playwright is a framework for Web Testing and Automation. It allows testing Chromium, Firefox and WebKit with a single API.
-
ChromeController
Comprehensive wrapper and execution manager for the Chrome browser using the Chrome Debugging Protocol.
-
backup-scripts
The various scripts I use to back up my home computers using ssh and rsync (by eamonnsullivan)
-
bootleg
Simple template processing command line tool to help build static websites (by retrogradeorbit)
skyscraper reviews and mentions
-
Web Scraping in Python – The Complete Guide
Yes!
My Clojure scraping framework [0] facilitates that kind of workflow, and I’ve been using it to scrape/restructure massive sites (millions of pages). I guess I’m going to write a blog post about scraping with it at scale. Although it doesn’t really scale much above that – it’s meant for single-machine loads at the moment – it could be enhanced to support that kind of workflow rather easily.
[0]: https://github.com/nathell/skyscraper
-
Babashka: GraalVM Helped Create a Scripting Environment for Clojure
I plan to port my scraping framework (Skyscraper, https://github.com/nathell/skyscraper) to babashka one day. I’m not sure how easy it will be, though, since it uses core.async (which I believe bb has limited support for) and SQLite via clojure.java.jdbc.
-
Mastering Web Scraping in Python: Crawling from Scratch
I’ve done a fair share of scraping, and I learned that on a large scale, there are a lot of cross-cutting repetitive concerns. Things like caching, fetching HTML (preferably in parallel), throttling, retries, navigation, emitting the output as a dataset…
My library, Skyscraper [0], attempts to help with these. It’s written in Clojure (based on Enlive or Reaver, both counterparts to Beautiful Soup), but the principles should be readily transferable everywhere.
[0]: https://github.com/nathell/skyscraper
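As a rough illustration of the cross-cutting concerns listed in the comment above (caching, throttling, retries), here is a minimal Python sketch. It is not Skyscraper's API: `make_fetcher` and its parameters are hypothetical names chosen for this example, and the raw `fetch` function is passed in by the caller so the wrapper stays independent of any particular HTTP library.

```python
import time
from typing import Callable, Dict, List, Optional


def make_fetcher(
    fetch: Callable[[str], str],
    retries: int = 3,
    min_interval: float = 0.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Callable[[str], str]:
    """Wrap a raw fetch function with an in-memory cache, throttling and retries.

    All names here are hypothetical; this is a sketch, not a library API.
    """
    cache: Dict[str, str] = {}
    last_request: List[Optional[float]] = [None]  # time of the previous request

    def wrapped(url: str) -> str:
        # Caching: a URL that was fetched successfully is never requested again.
        if url in cache:
            return cache[url]

        last_error: Optional[Exception] = None
        for _ in range(retries):
            # Throttling: keep at least min_interval seconds between requests.
            if last_request[0] is not None:
                wait = min_interval - (clock() - last_request[0])
                if wait > 0:
                    sleep(wait)
            last_request[0] = clock()

            # Retries: on failure, remember the error and try again.
            try:
                body = fetch(url)
            except Exception as exc:
                last_error = exc
                continue

            cache[url] = body
            return body

        raise RuntimeError(
            f"giving up on {url} after {retries} attempts"
        ) from last_error

    return wrapped
```

A caller would pass in whatever function performs the actual HTTP request, for example `make_fetcher(my_download, retries=5, min_interval=1.0)`, where `my_download` is a placeholder for that function. Parallel fetching, navigation and dataset output, which the comment also mentions, are left out of this sketch.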
Stats
The primary programming language of skyscraper is Clojure.