-
InfluxDB
Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.
I'd give bonus points for evidence of attempts to package up models as command line scripts (text goes in, predictions come out) but I'd give out double bonus if someone were to use FastAPI or Flask to make the model accessible via API and do that in a way that is compute and memory efficient (for example have the program load the model once on startup rather than every time a request is received).
For the data cleaning and training parts, you might have projects where you've used kaggle datasets to train models and you've done appropriate feature engineering and data exploration to help you to understand whether data might need to be under or over sampled or cleaned in some other way. I'd give bonus points to someone who has thoughts about how training pipelines might be semi or fully automated in a production environment (e.g. use of scripts and tools like dvc to make things easy to reproduce. I'd want to see evidence of appropriate metrics (e.g. I know its 99% accurate and that might be great but if its a 10-way classification on a very unbalanced dataset, what can you tell me about performance on the smallest class?).