The general consensus is that pam_slurm_adopt is the better module (that's just one dude's opinion, but his citations are good). The advantage is that it not only gatekeeps SSH access, turning away users who have no job on the node, but also drops their SSH session into the cgroup that enforces the job's resource limits. That also means the session's CPU usage shows up in sacct for the job. One caveat: if the user has multiple jobs running on a node, their SSH session may get dropped into the wrong one, and there's no help for that.
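To check whether an SSH session was actually adopted, one rough approach is to look at which cgroup the shell ended up in. The sketch below assumes that Slurm's cgroup plugin places job processes under a path containing `job_<jobid>` (for example `.../job_1234/step_extern`, the step that adopted sessions typically land in); the exact layout differs between cgroup v1 and v2 and between sites, so treat the pattern as an assumption to verify on your cluster. The function name `slurm_job_from_cgroup` is hypothetical and not part of Slurm.

```python
import re
from pathlib import Path
from typing import Optional

# Assumption: job cgroup directories are named "job_<jobid>",
# e.g. /slurm/uid_1000/job_42/step_extern (v1) or
# /system.slice/slurmstepd.scope/job_42/step_extern (v2).
JOB_RE = re.compile(r"/job_(\d+)(?:/|$)")


def slurm_job_from_cgroup(cgroup_text: str) -> Optional[int]:
    """Return the Slurm job id found in /proc/<pid>/cgroup content, or None."""
    for line in cgroup_text.splitlines():
        # Each line is "hierarchy-id:controllers:path"; the path is the last field.
        path = line.split(":", 2)[-1]
        match = JOB_RE.search(path)
        if match:
            return int(match.group(1))
    return None


if __name__ == "__main__":
    cgroup_file = Path("/proc/self/cgroup")  # Linux only
    if cgroup_file.exists():
        job = slurm_job_from_cgroup(cgroup_file.read_text())
        if job is not None:
            print(f"this shell was adopted into job {job}")
        else:
            print("this shell is not inside a Slurm job cgroup")
```

Running it from an SSH session on a compute node is a quick way to see whether pam_slurm_adopt did its job, and which job it picked when the user has several.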
As a result, you need to run a pre-check in your task to see whether the node is actually suitable for it. There are dedicated systems for node health checks (NHC is a good one), but just reading the load (w) and memory pressure (free) goes a long way. Clearly, this is less helpful for long-running tasks, since conditions on the node can change after the check has passed.
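A minimal pre-check along those lines might read the same numbers w and free report: the 1-minute load average (via `os.getloadavg()`) and the MemAvailable field of /proc/meminfo, which is what free shows in its "available" column. This is a sketch assuming a Linux node; the thresholds `MAX_LOAD_PER_CPU` and `MIN_AVAILABLE_GIB` and the helper names are hypothetical and would need tuning for your cluster and task. It is not a substitute for a real health-check system like NHC.

```python
import os
from pathlib import Path

# Hypothetical thresholds: tune for your cluster and your task.
MAX_LOAD_PER_CPU = 0.8   # 1-minute load average divided by CPU count
MIN_AVAILABLE_GIB = 4.0  # memory the task expects to need, in GiB


def mem_available_gib(meminfo_text: str) -> float:
    """Parse MemAvailable (reported in kB, i.e. KiB) from /proc/meminfo content."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            kib = int(line.split()[1])
            return kib / (1024 * 1024)
    raise ValueError("MemAvailable not found in meminfo")


def node_is_suitable(load_1min: float, cpus: int, avail_gib: float,
                     max_load_per_cpu: float = MAX_LOAD_PER_CPU,
                     min_avail_gib: float = MIN_AVAILABLE_GIB) -> bool:
    """Return True if the node looks idle enough and has enough free memory."""
    return load_1min / cpus <= max_load_per_cpu and avail_gib >= min_avail_gib


if __name__ == "__main__":
    meminfo = Path("/proc/meminfo")
    if meminfo.exists():  # Linux only
        load_1min, _, _ = os.getloadavg()  # same figures `w` prints
        cpus = os.cpu_count() or 1         # note: node CPUs, not the job's allocation
        avail = mem_available_gib(meminfo.read_text())
        ok = node_is_suitable(load_1min, cpus, avail)
        verdict = "OK" if ok else "SKIP"
        print(f"load/cpu={load_1min / cpus:.2f} available={avail:.1f} GiB -> {verdict}")
```

In a job script you could run this first and bail out (or requeue) on SKIP. Because it only samples the node once at startup, it shares the limitation mentioned above for long-running tasks.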