Better detection of offline login nodes
I think SSH is the best way. SSH needs a valid key to log in, but not to check whether the node is working.
Example during production with a fake key:
$ ssh -i .ssh/id_rsa juwels11.fz-juelich.de
dalvarez@juwels11.fz-juelich.de: Permission denied (publickey).
Example during maintenance with a fake key:
$ ssh -i .ssh/id_rsa juwels11.fz-juelich.de
[ASCII-art login banner: Welcome to JUWELS, "Juelich Wizard for European Leadership Science"]
2020-11-19T14:00+0200
Status information JUWELS
Known issues: https://apps.fz-juelich.de/jsc/hps/juwels/known-issues.html
2021-04-15T18:00+0200
UCX as default on ParaStationMPI
During the next maintenance, the default communication library used by ParaStationMPI will change from verbs to UCX. To enable that changes already please load "mpi-settings/UCX-plain" after loading ParaStationMPI
2021-03-02T08:00+0200
RW access to $DATA is enabled again after incident 2021-01-26
Please validate your data on the recovered file system. The restored data from tape can be found at /p/largedata_restore. Please note that the retrieval is on-going and the accessible data is hence limited. The /p/largedata_restore file system is mounted read-only.
More information is communicated to data project members via e-mail.
+------------------------------------------------------------------------------+
| System is in maintenance. Sorry, not available for production. |
+------------------------------------------------------------------------------+
dalvarez@juwels11.fz-juelich.de: Permission denied (publickey).
So using a fake key and a fake user allows you to check that:
the node is reachable
the node is in maintenance (or not)
There is a third case: a node is in production but in a faulty state (e.g. GPFS not mounted). That should happen very rarely, and I would not bother with it.
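The two checks above could be scripted roughly as follows. This is only a sketch under some assumptions: the function names `classify_probe` and `probe_node` and the placeholder user `probe` are made up for illustration, the maintenance text is matched exactly as it appears in the banner quoted above, and `StrictHostKeyChecking=accept-new` needs OpenSSH 7.6 or newer. Anything that is neither a maintenance banner nor a "Permission denied" is treated as unreachable.

```shell
#!/bin/sh
# Classify the text an SSH probe printed (banner plus error message).
# Prints one of: maintenance, production, unreachable.
classify_probe() {
    # $1: combined stdout/stderr of the ssh attempt
    case "$1" in
        # Maintenance banner is checked first, because "Permission denied"
        # is printed in both the maintenance and the production case.
        *"System is in maintenance"*) echo maintenance ;;
        *"Permission denied"*)        echo production ;;
        *)                            echo unreachable ;;
    esac
}

# Probe a login node with a user that is not expected to be able to log in.
# BatchMode avoids hanging on a prompt; ConnectTimeout bounds the wait.
probe_node() {
    # $1: host name, $2: placeholder user (hypothetical default: "probe")
    out=$(ssh -o BatchMode=yes -o ConnectTimeout=10 \
              -o StrictHostKeyChecking=accept-new \
              "${2:-probe}@$1" true 2>&1)
    classify_probe "$out"
}
```

Calling `probe_node juwels11.fz-juelich.de` would then print one of the three states. The faulty-but-in-production case is deliberately not covered.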
To double-check, I would also suggest cross-checking the DNS RR, to see whether the login node you are connecting to is indeed supposed to be user-reachable:
$ dig +short juwels.fz-juelich.de | xargs -n1 dig +short -x | sed s/'.$'//g | sort
juwels02.fz-juelich.de
juwels03.fz-juelich.de
juwels05.fz-juelich.de
juwels06.fz-juelich.de
juwels07.fz-juelich.de
juwels08.fz-juelich.de
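A possible way to turn that cross-check into a test, again only as a sketch: the helper names `rotation_hosts` and `in_rotation` are hypothetical, and the pipeline is the same `dig` lookup as above, with the trailing dot of each reverse-lookup result stripped.

```shell
#!/bin/sh
# Resolve a login alias to the node names currently behind it.
rotation_hosts() {
    # $1: alias, e.g. juwels.fz-juelich.de
    dig +short "$1" | xargs -n1 dig +short -x | sed 's/\.$//' | sort
}

# Succeed if host $1 appears as a whole line in the newline-separated list $2.
in_rotation() {
    printf '%s\n' "$2" | grep -Fqx -- "$1"
}
```

For example, `in_rotation juwels11.fz-juelich.de "$(rotation_hosts juwels.fz-juelich.de)"` returns a non-zero exit status when the node is not listed under the alias, in which case it is probably not meant to be user-reachable at that moment.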
Best, Damian