How to manage nodes and users (3)
Once-a-day e-mail report: daily_watch (home-grown Perl scripts). Reports on, e.g.:
- Today's backup OK/in error/not done/not ready
- New memory/disk errors (DRAM Error Reporting/smartctl)
- File systems unmounted/with wrong permissions/out of space
- Server processes not running (Maui/PBS/NTPD)
- Vital configuration files missing
- Load average on node higher than twice the number of CPUs
- Compute node lost or not used by PBS
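
Two of the checks above (load average and file systems out of space) could be sketched roughly as follows. This is not the daily_watch code, which is written in Perl and not shown on the slide; it is a minimal Python illustration. The function names, the 95% disk threshold, and the use of the 15-minute load average are all assumptions made for the example.

```python
import os
import shutil


def load_too_high(load_avg, n_cpus, factor=2.0):
    """True when the load average is higher than factor * number of CPUs."""
    return load_avg > factor * n_cpus


def disk_nearly_full(used, total, threshold=0.95):
    """True when used/total reaches the threshold (95% is an assumed value)."""
    return total > 0 and used / total >= threshold


def daily_report(mount_points=("/",)):
    """Collect warning lines for the local node (hypothetical report format)."""
    warnings = []

    # os.getloadavg() returns the 1-, 5- and 15-minute load averages (Unix only).
    # Using the 15-minute value here is an assumption, not taken from the slide.
    _, _, load15 = os.getloadavg()
    n_cpus = os.cpu_count() or 1
    if load_too_high(load15, n_cpus):
        warnings.append(f"load average {load15:.2f} > 2 x {n_cpus} CPUs")

    for mount_point in mount_points:
        usage = shutil.disk_usage(mount_point)
        if disk_nearly_full(usage.used, usage.total):
            percent = 100.0 * usage.used / usage.total
            warnings.append(f"{mount_point} is {percent:.0f}% full")

    return warnings
```

Calling daily_report() on a node returns a list of warning lines, which a cron job could then collect from all nodes and mail once a day. Keeping the threshold tests as small pure functions makes them easy to check without touching a real machine.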