> How has the architecture evolved to make the A100 significantly faster?
Oh, very much so. By way more than an order of magnitude. For a deeper read, have a look at the "architecture white papers" for Kepler, Pascal, Volta/Turing, and Ampere:
https://duckduckgo.com/?t=ffab&q=NVIDIA+architecture+white+p...
or check out the archive of NVIDIA's parallel4all blog ... hmm, that's weird, it seems like they've retired it. They used to have really good blog posts explaining what's new in each architecture.
You could also have a look here:
https://docs.nvidia.com/cuda/cuda-c-programming-guide/index....
for the table of various numeric sizes and limits which change with different architectures. But that's not a very useful resource in and of itself.
For NVIDIA GPUs, Nsight systems is wildly detailed and has both GUI and CLI options: https://developer.nvidia.com/nsight-systems
For DL specifically, this article covers a couple of options that actually plug into the framework: https://developer.nvidia.com/blog/profiling-and-optimizing-d...
nvidia-smi is the core tool most folks use for quick "top"-like output, but there is also an htop equivalent: https://github.com/shunk031/nvhtop
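If you want the nvidia-smi numbers in a script rather than on screen, one option is its CSV query mode. Below is a rough sketch, not a definitive implementation: `GpuSample`, `parse_smi_csv`, and `query_gpus` are names made up for this example, and the field list is just one reasonable choice. It assumes every queried field comes back numeric; some GPUs report `[N/A]` for certain fields, which this sketch does not handle.

```python
import csv
import io
import subprocess
from typing import NamedTuple

# Fields requested from nvidia-smi; with "nounits", memory values are in MiB.
QUERY_FIELDS = "index,name,utilization.gpu,memory.used,memory.total"


class GpuSample(NamedTuple):
    """One row of nvidia-smi output (hypothetical helper type)."""
    index: int
    name: str
    util_pct: int
    mem_used_mib: int
    mem_total_mib: int


def parse_smi_csv(text: str) -> list[GpuSample]:
    """Parse the output of --format=csv,noheader,nounits into samples."""
    samples = []
    for row in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not row:  # skip blank lines
            continue
        idx, name, util, used, total = (field.strip() for field in row)
        samples.append(GpuSample(int(idx), name, int(util), int(used), int(total)))
    return samples


def query_gpus() -> list[GpuSample]:
    """Run nvidia-smi once and return one sample per GPU."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY_FIELDS}", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,  # raise if nvidia-smi exits with an error
    )
    return parse_smi_csv(result.stdout)
```

On a machine with the NVIDIA driver installed, `for gpu in query_gpus(): print(gpu)` would print one line per device. Keeping the parsing separate from the subprocess call means the parser can be tried on captured output without a GPU present.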
A lot of other tools are built on top of the low-level NVML library (https://developer.nvidia.com/nvidia-management-library-nvml). There are also Python NVML bindings if you need to write your own monitoring tools.
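As a rough idea of what a home-grown monitor on the Python bindings might look like, here is a minimal sketch. It assumes the `pynvml` module is installed (for example via the `nvidia-ml-py` package) and that an NVIDIA driver is present; `format_gpu_line` and `read_gpus` are names invented for this example.

```python
def format_gpu_line(index, name, util_pct, mem_used, mem_total):
    """Format one status line; memory arguments are in bytes."""
    gib = 1024 ** 3
    return (
        f"[{index}] {name} | {util_pct:3d}% | "
        f"{mem_used / gib:.1f} / {mem_total / gib:.1f} GiB"
    )


def read_gpus():
    """Return (index, name, util %, used bytes, total bytes) per GPU via NVML."""
    import pynvml  # assumption: installed separately; imported lazily on purpose

    pynvml.nvmlInit()
    try:
        rows = []
        for i in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(i)
            name = pynvml.nvmlDeviceGetName(handle)
            if isinstance(name, bytes):  # older bindings return bytes
                name = name.decode()
            util = pynvml.nvmlDeviceGetUtilizationRates(handle)
            mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
            rows.append((i, name, util.gpu, mem.used, mem.total))
        return rows
    finally:
        pynvml.nvmlShutdown()  # always release NVML, even on errors
```

Something like `for row in read_gpus(): print(format_gpu_line(*row))` in a loop with a sleep gives a bare-bones "top" for GPUs. The import sits inside `read_gpus` so the formatting helper still works on a machine without the bindings.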
I personally like gpustat -- it's an nvidia-smi wrapper, but it has colors...
It also now seems to have a web server plugged into it, which seems pretty cool:
https://github.com/wookayin/gpustat