XLotto Home Page

Introduction

XLotto is a distributed X Window tool for monitoring batch queues, processes, and PE usage on a Cray T3E computer. With XLotto, there is less need to log in to the T3E just to check its status. The client part is written in Tcl/Tk and is easily installed on any UNIX machine. The gathering program, getlotto, is written in C and runs on the T3E.

Requirements

To run XLotto you need
Download

Download the latest XLotto from ftp://ftp.nsc.liu.se/pub/tools/xlotto/xlotto.

Installing

If you have the Tcl/Tk windowing shell, wish, in your path, you can run xlotto as it is. Just change the execute mode bits:
chmod +x xlotto
./xlotto
If you do not have wish in your path, you must modify the first line of xlotto and insert the path to the wish binary on your computer. Tcl/Tk can be downloaded from http://www.scriptics.com/. Alternatively, if you have at least the client part of Cray NQE installed on your computer (downloadable from ftp://ftp.cray.com/pub/nqe/software/), change the first line of xlotto to
#!/bin/sh
and xlotto will, by itself, find nqe_wish.
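The first-line change described above can also be scripted. The following is a minimal sketch, not part of the XLotto distribution: the helper name `set_interpreter` and the path `/usr/local/bin/wish` used in the usage note are hypothetical, so substitute the location of wish on your own machine.

```shell
#!/bin/sh
# Sketch: replace the first line of a script with "#!<interpreter>".
# set_interpreter is a hypothetical helper name, not an XLotto command.
set_interpreter() {
    script=$1
    interp=$2
    tmp="$script.tmp.$$"
    # Write the new first line, then the rest of the script unchanged.
    { printf '#!%s\n' "$interp"; tail -n +2 "$script"; } > "$tmp" || return 1
    # Copy the content back into the original file so that its
    # permission bits (for example the execute bit) are kept.
    cat "$tmp" > "$script" || return 1
    rm -f "$tmp"
}
```

For example, `set_interpreter xlotto /usr/local/bin/wish` rewrites only the first line of xlotto and leaves the rest of the file untouched.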
How it Works

On the T3E, getlotto gathers status information from various actors and daemons and forwards it to a server. Periodically, XLotto contacts the server, downloads the status, and updates the display. If XLotto is executed on the T3E itself, no intermediate server is used.

The type of channel (ports, servers, programs) used for distributing the status could actually be anything. The software directly supports a configuration where the status is copied with rcp from the T3E to another host and then downloaded by the clients using HTTP. Since the status is a simple ASCII file, other distribution methods (e.g. FTP) are easy to adapt. Moreover, the status information is easy to parse and extend, enabling the creation of other tools.

The Look

This section is based on xlotto.2.0.

Example: XLotto screendump (112 Kb)
XLotto's display consists of four parts: the menubar, the lotto view, the job view, and the status line.

Menubar

The following menu options exist:

The sort order option "as received" results in the same order in which NQS considers jobs for execution (I think...).

Lotto View
Each processing element (PE) in the T3E is represented by a small widget. The color of the widget represents the type of PE:

The digit inside represents the number of processes running on the PE (according to the Global Resource Manager, GRM, on the T3E). Shells, daemons, and other small processes are ignored. Whenever the mouse is placed over a PE, its logical number (in both decimal and hex), the memory available to the user, and its free memory are displayed in the status line.

Job View (merged)
This view contains information about the jobs on the T3E. Each black line (black refers to the color of the characters) represents a batch (NQS) job while each blue line represents a process (serial or parallel).
Idle batch jobs occupy one line in the view, while active (running) batch jobs can have one, two, or more lines depending on how many processes they have. Interactive jobs have, of course, no batch information to display. Example:
Whenever the mouse is placed over a job, its status and sub-status are explained in the status line. Whenever the mouse is placed over a process, its memory size and start time are displayed in the status line, and the PEs it uses are highlighted in the lotto view.

Sometimes processes are queued in the GRM due to lack of resources. A common situation is jobs that require large-memory PEs. (NQS has no knowledge about PE memory requirements.) When a process is placed in the GRM waiting queue, the reason is highlighted. Example:

Other GRM messages can also be displayed, e.g.
Another situation that calls for the administrator's attention is when a user requests more resources than he/she uses. When the resource in question is the number of PEs, XLotto highlights the PE fields of the job. Example:
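The over-request condition described above reduces to a per-job comparison of requested and used PEs. The following is a minimal sketch of such a check, not XLotto's own code. It assumes a hypothetical input format of one job per line with three whitespace-separated fields (job id, requested PEs, used PEs); the real getlotto status format is not described here and may differ.

```shell
#!/bin/sh
# Sketch: list jobs that requested more PEs than they are using.
# Hypothetical input format (NOT getlotto's actual format):
#   <job-id> <requested-PEs> <used-PEs>
# over_requested is a hypothetical helper name.
over_requested() {
    # Print the job id of every line whose second field (requested)
    # is numerically greater than its third field (used).
    awk 'NF >= 3 && $2 > $3 { print $1 }' "$@"
}
```

The function reads the named files, or standard input when called without arguments, and prints one job id per line.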
For more information about various NQS attributes, see the Cray manual NQE User's Guide, SG-2148.

Status Line
The status line is divided into two parts. The left part shows various status and information messages. The right part keeps track of when the last status sample was taken on the T3E. If the database, for some reason, is not updated regularly, the display becomes brownish and the title of the window changes to DOWN. The reason may be that the T3E is down, but it can also be a communication problem.

To Do
Author

XLotto was written by Niclas Andersson <nican@nsc.liu.se>, National Supercomputer Centre, Linköping University, Sweden.

Please let me know if you have any problems, questions, corrections, additions, or suggestions.