-
bootleg
Simple template processing command line tool to help build static websites (by retrogradeorbit)
-
backup-scripts
The various scripts I use to back up my home computers using ssh and rsync (by eamonnsullivan)
Not OP but I use Reaver with good results. It supports all of JSoup's selectors, and makes it very clean to extract data from HTML.
The documentation is a little lacking, though; I had to look up other examples on GitHub to figure out how to use all the features.
https://github.com/mischov/reaver
babashka supports HTML parsing through pods:
https://github.com/babashka/pod-registry
Pods can be written in any language and they can expose functions to babashka by implementing a protocol.
One pod exposing HTML parsing is:
https://github.com/retrogradeorbit/bootleg
Here is an example of how to use that:
https://github.com/babashka/pod-registry/blob/master/example...
https://github.com/clj-commons/hickory
I'm a former BeautifulSoup user and have found the combination of (1) having the scraped data presented in plain Clojure data structures, and (2) Hickory's built-in selectors, to be a very nice experience.
Happy scraping!
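For anyone who hasn't tried it, here is a minimal sketch of what "plain Clojure data structures plus selectors" looks like in practice. It assumes JVM Clojure with Hickory on the classpath; the HTML snippet, the "story" class name, and the `links` var are made up for illustration.

```clojure
;; Assumes the hickory library is available (JVM Clojure, not plain babashka).
(require '[hickory.core :as h]
         '[hickory.select :as s])

;; Hypothetical page content, inlined so the example needs no network access.
(def html
  (str "<html><body>"
       "<a class=\"story\" href=\"https://example.com/a\">First</a>"
       "<a class=\"story\" href=\"https://example.com/b\">Second</a>"
       "<a href=\"https://example.com/c\">Other</a>"
       "</body></html>"))

;; Parse the HTML and convert it to Hickory's map representation:
;; nested maps with :type, :tag, :attrs and :content keys.
(def tree (-> html h/parse h/as-hickory))

;; Select every <a> element carrying the "story" class, then pull out
;; the fields we care about with ordinary map functions.
(def links
  (->> (s/select (s/and (s/tag :a) (s/class "story")) tree)
       (mapv (fn [node]
               {:href (get-in node [:attrs :href])
                :text (first (:content node))}))))

(println links)
```

Because each selected node is an ordinary map, the extraction step is just `get-in` and `first`; no special accessor API is needed once the selector has run.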
I plan to port my scraping framework (Skyscraper, https://github.com/nathell/skyscraper) to babashka one day. I’m not sure how easy it will be, though, since it uses core.async (which I believe bb has limited support for) and SQLite via clojure.java.jdbc.
As other people have said: Bootleg + Hickory. Here is an admittedly not very clean example[0] that grabs stream URLs from hltv.org.
Also a basic RSS reader using the Clojure XML lib[1]
[0] https://github.com/TimDeve/.dotfiles/blob/master/scripts/gen...
I used this for my backup system: https://github.com/eamonnsullivan/backup-scripts
The server runs on a Raspberry Pi with a 1-2TB USB disk attached.
No, we have to build a binary, which starts up super quickly.
I began to put together a "distribution" of useful CL libraries for everyday tasks: https://github.com/ciel-lang/CIEL/ It comes as:
- a Lisp core, which you can use in your editor setup instead of sbcl or ccl. The advantage is that it loads instantly with all these libraries built in (instead of quickloading all of them when needed)
BTW, Roswell makes it easier to run scripts: https://github.com/roswell/roswell/wiki/Roswell-as-a-Scripti...
It is also a tool to install various CL implementations, and to install software.
It doesn't come with a choice of built-in libraries.