If writing your own CUDA code is hard (as I understand it, each implementation has to be architecture-specific, and learning that many architectures just isn't feasible), are there alternatives to writing CUDA that the community commonly uses? I read about openai/triton. Or is there a compiler that does this automatically? Or do I have to go the long route and learn CUDA for each architecture?
cuDNN is already heavily optimized, but if you want to read up on optimization, here you go (Maxwell-specific): https://github.com/NervanaSystems/maxas/wiki/SGEMM. There is an accompanying paper, or you can read NVIDIA CUTLASS.
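To give a rough feel for what those SGEMM write-ups are optimizing, here is a minimal CPU-side sketch of tiling (blocking), the basic idea hand-tuned GEMM kernels build on. It is plain Python, not GPU code, and it is not taken from maxas or CUTLASS; the function names and the `tile=4` default are made up for illustration. On a GPU the tile would typically live in shared memory or registers, and the right tile size depends on the architecture, which is a large part of why tuned kernels end up architecture-specific.

```python
def matmul_naive(a, b):
    """Reference triple loop: C[i][j] = sum over p of A[i][p] * B[p][j]."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            acc = 0.0
            for p in range(k):
                acc += a[i][p] * b[p][j]
            c[i][j] = acc
    return c


def matmul_tiled(a, b, tile=4):
    """Same product, computed one tile x tile block at a time.

    The three outer loops pick a block of C and a matching slab of A and B;
    the three inner loops do the work inside that block. `tile` is an
    illustrative value, not a tuned one.
    """
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for p0 in range(0, k, tile):
                # min(...) handles edge blocks when sizes are not multiples of tile
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        acc = c[i][j]
                        for p in range(p0, min(p0 + tile, k)):
                            acc += a[i][p] * b[p][j]
                        c[i][j] = acc
    return c
```

Both functions return the same matrix; the tiled version only changes the order in which the data is visited so that a small working set gets reused before moving on.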