babashka-sql-pods vs skyscraper
Metric | babashka-sql-pods | skyscraper
---|---|---
Mentions | 2 | 3
Stars | 77 | 401
Growth | - | -
Activity | 4.8 | 4.9
Latest commit | 8 months ago | 10 months ago
Language | Clojure | Clojure
License | Eclipse Public License 1.0 | -
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
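The page does not publish the exact activity formula. As a purely hypothetical illustration of the two ideas above (recent commits weigh more, and 9.0 means top 10%), here is a minimal Python sketch. It assumes exponential decay with a 90-day half-life and a percentile-rank mapping onto a 0 to 10 scale; both choices are assumptions, not LibHunt's actual method.

```python
from bisect import bisect_left
from typing import Dict, List


def raw_activity(commit_ages_days: List[float], half_life_days: float = 90.0) -> float:
    # Hypothetical weighting: each commit contributes 0.5 ** (age / half_life),
    # so a commit made today counts 1.0 and one made half_life_days ago counts 0.5.
    return sum(0.5 ** (age / half_life_days) for age in commit_ages_days)


def activity_scores(raw: Dict[str, float]) -> Dict[str, float]:
    # Map each project's raw score onto 0-10 by percentile rank among all tracked
    # projects: 10 * (share of projects with a strictly lower raw score).
    ordered = sorted(raw.values())
    n = len(ordered)
    return {name: round(10 * bisect_left(ordered, value) / n, 1)
            for name, value in raw.items()}
```

With ten tracked projects, the most active one has nine projects below it and scores 9.0, which matches the "top 10%" reading of a 9.0.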
babashka-sql-pods
- Babashka: GraalVM Helped Create a Scripting Environment for Clojure
-
GraalVM at Facebook
I've used native-image both directly and indirectly: directly, to implement a Clojure authentication server that needed a small memory footprint; indirectly, through @borkdude's https://github.com/babashka/babashka, which provides a native-image binary that can run much of the Clojure language.
We open-sourced some babashka code at https://github.com/staticweb-io/staticweb-open-wp/tree/maste... One major caveat: when I wrote that code, babashka didn't have any MySQL support, so I shelled out to the MySQL CLI. Later, I figured out how to compile the MySQL JDBC drivers with native-image and it's now available at https://github.com/babashka/babashka-sql-pods along with HSQLDB, SQL Server, Oracle, and Postgres drivers.
skyscraper
-
Web Scraping in Python – The Complete Guide
Yes!
My Clojure scraping framework [0] facilitates that kind of workflow, and I’ve been using it to scrape and restructure massive sites (millions of pages). I’ll probably write a blog post about scraping with it at scale. It doesn’t really scale much beyond that, since it’s meant for single-machine loads at the moment, but it could be extended to support that kind of workflow rather easily.
[0]: https://github.com/nathell/skyscraper
-
Babashka: GraalVM Helped Create a Scripting Environment for Clojure
I plan to port my scraping framework (Skyscraper, https://github.com/nathell/skyscraper) to babashka one day. I’m not sure how easy it will be, though, since it uses core.async (which I believe bb has limited support for) and SQLite via clojure.java.jdbc.
-
Mastering Web Scraping in Python: Crawling from Scratch
I’ve done my fair share of scraping, and I’ve learned that at large scale there are a lot of cross-cutting, repetitive concerns: caching, fetching HTML (preferably in parallel), throttling, retries, navigation, emitting the output as a dataset…
My library, Skyscraper [0], attempts to help with these. It’s written in Clojure (based on Enlive or Reaver, both counterparts to Beautiful Soup), but the principles should be readily transferable everywhere.
[0]: https://github.com/nathell/skyscraper
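Several of the cross-cutting concerns listed above can be handled independently of any HTML parser. The Python below is a minimal sketch under stated assumptions, not Skyscraper's API: `PoliteFetcher`, its parameters, and `fetch_url` are hypothetical names. It wraps any fetch function with an in-memory cache, a global minimum interval between request starts (throttling), retries with exponential backoff on `OSError`, and parallel fetching through a thread pool. Navigation and dataset output are left out.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List
from urllib.request import urlopen


def fetch_url(url: str) -> str:
    # Plain stdlib HTTP GET; urllib's URLError and HTTPError both subclass OSError.
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


class PoliteFetcher:
    """Hypothetical helper: caching, throttling and retries around a fetch function."""

    def __init__(self, fetch: Callable[[str], str], min_interval: float = 0.5,
                 max_retries: int = 3, backoff: float = 1.0) -> None:
        self.fetch = fetch                # any callable mapping a URL to HTML text
        self.min_interval = min_interval  # minimum seconds between request starts
        self.max_retries = max_retries    # extra attempts after the first failure
        self.backoff = backoff            # base delay (seconds) for exponential backoff
        self.cache: Dict[str, str] = {}   # in-memory only; a long crawl would persist it
        self._lock = threading.Lock()
        self._next_slot = 0.0             # monotonic time at which the next request may start

    def _throttle(self) -> None:
        # Reserve the next start slot under the lock, then sleep outside it,
        # so concurrent workers are spaced at least min_interval apart.
        with self._lock:
            start = max(time.monotonic(), self._next_slot)
            self._next_slot = start + self.min_interval
        delay = start - time.monotonic()
        if delay > 0:
            time.sleep(delay)

    def get(self, url: str) -> str:
        with self._lock:
            if url in self.cache:
                return self.cache[url]
        # Note: two workers requesting the same uncached URL at once may both fetch it.
        attempt = 0
        while True:
            self._throttle()
            try:
                html = self.fetch(url)
                break
            except OSError:
                if attempt >= self.max_retries:
                    raise
                time.sleep(self.backoff * 2 ** attempt)  # 1x, 2x, 4x ... the base delay
                attempt += 1
        with self._lock:
            self.cache[url] = html
        return html

    def get_many(self, urls: Iterable[str], workers: int = 4) -> List[str]:
        # Results come back in the same order as the input URLs.
        with ThreadPoolExecutor(max_workers=workers) as pool:
            return list(pool.map(self.get, urls))
```

For real HTTP, `PoliteFetcher(fetch_url)` would be the starting point. Keeping the fetch function pluggable is what makes the policy layer testable without a network. Note that this sketch also retries on HTTP errors such as 404, which a production scraper would probably want to treat as permanent.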
What are some alternatives?
nbb - Scripting in Clojure on Node.js using SCI
WebDumper - A tool for scraping, dumping and unpacking (webpacked) javascript source files.
quickdoc - Quick and minimal API doc generation for Clojure
grub-2.0 - Grub is an AI powered Web crawler.
pod-babashka-aws - Deprecated, use https://github.com/grzm/awyeah-api
ChromeController - Comprehensive wrapper and execution manager for the Chrome browser using the Chrome Debugging Protocol.
babashka-tools - A collection of Babashka tools
reaver - A Clojure library for extracting data from HTML.
babashka - A Clojure babushka for the grey areas of Bash (native fast-starting Clojure scripting environment) [Moved to: https://github.com/babashka/babashka]
hickory - HTML as data
.dotfiles - My dotfiles
colly - Elegant Scraper and Crawler Framework for Golang