The easiest place to start, in my super humble opinion, is the MOTIF dataset. It is "small" in total number of samples, but it is one of the only high-quality labeled malware family datasets. From there you can look at a number of problems related to malware family detection/classification, on something small enough to work within school project resources, with real, high-quality labels.
Your other option is the EMBER dataset, which has pre-vectorized features available to use. Unfortunately, that is also quite limiting: there isn't much you can do with the vectorized data that hasn't already been done. But it would let you work at a much bigger scale for free.