Best Data Tools for my use case

Our great sponsors

InfluxDB - Power Real-Time Data Analytics at Scale

WorkOS - The modern identity platform for B2B SaaS

SaaSHub - Software Alternatives and Reviews


clj-xlsxio

Mentions: 1, Stars: 16, Activity: 10.0, Language: Clojure

XLSXIO bindings for Clojure

The current offerings for reading spreadsheets all depend on Apache libraries, which have trouble with spreadsheets above a certain size, even ones that are by no means really large. I have been using clj-xlsxio with great success.

tablecloth

Mentions: 10, Stars: 264, Activity: 9.1, Language: HTML

Dataset manipulation library built on top of tech.ml.dataset (by scicloj)

I really like geni: its approach to Apache Spark is very idiomatic. There are some gaps (no UDFs), and I am not sure the project is as active as it used to be. But I still use it and find it very nice (though I already have an Apache Spark background). tablecloth is an alternative dataframe library used by a lot of folks in the Clojure data science world. For that matter, you should check out scicloj, and also hang out in the data channel on Zulip.

next-jdbc

Mentions: 3, Stars: 727, Activity: 7.8, Language: Clojure

A modern low-level Clojure wrapper for JDBC-based access to databases.

I can't help you with the database part, but I think you have enough grist for your mill, so I would use whatever database you know and connect to it via next-jdbc.

tech.ml.dataset

Mentions: 15, Stars: 633, Activity: 8.8, Language: Clojure

A Clojure high performance data processing system

For point 1: this namespace of tech.ml.dataset supports reading multiple worksheets from a single file: https://github.com/techascent/tech.ml.dataset/blob/master/src/tech/v3/libs/fastexcel.clj

NOTE: The number of mentions on this list counts mentions in common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.
