The current offerings for reading spreadsheets all depend on Apache POI, which has trouble with spreadsheets of a certain size, even ones that are by no means really large. I have been using clj-xlsxio with great success.
I really like geni: its approach to Apache Spark is very idiomatic. There are some gaps (no UDFs), and I am not sure the project is as active as it used to be, but I still use it and find it very nice (though I already have an Apache Spark background). tablecloth is an alternative dataframe library used by a lot of folks in the Clojure data science world. For that matter, you should check out scicloj and hang out in the data channel on Zulip.
I can't help you with the database part, but I think you have enough grist for your mill, so I would use whatever database you already know and connect to it via next-jdbc.
For 1: this namespace of tech.ml.dataset supports reading multiple worksheets per file: https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/libs/fastexcel.clj