
XLotto Home Page


Introduction

XLotto is a distributed X window tool for monitoring batch queues, processes, and PE usage on a Cray T3E computer. With XLotto, there is less need to log into the T3E just to check its status.

The client part is written in Tcl/Tk and is easily installed on any UNIX box.

The gathering program, getlotto, is written in C and runs on the T3E.


Requirements

To run XLotto you need
  • A UNIX box.
  • Tcl/Tk window shell (wish). Version of Tk >= 4.0.
    Get the latest version from http://sunscript.sun.com/.
  • An Internet connection. The status is periodically downloaded from a server.
  • A colour monitor is preferred.
  • ... and some more.

Download

Download the latest XLotto from ftp://ftp.nsc.liu.se/pub/tools/xlotto/xlotto.

Installing

If you have the Tcl/Tk windowing shell, wish, in your path, you can run xlotto as it is. Just set the execute permission bits:

    chmod +x xlotto
    ./xlotto

If you do not have wish in your path, you must modify the first line in xlotto and insert the path to the wish binary on your computer.

Tcl/Tk can be downloaded from http://www.scriptics.com/.

Alternatively, if you have at least the client part of Cray NQE installed on your computer (downloadable from ftp://ftp.cray.com/pub/nqe/software/), change the first line of xlotto to

    #!/bin/sh

and xlotto will, by itself, find nqe_wish.

How it Works

On the T3E, getlotto gathers status information from various actors and daemons and forwards it to a server.

Periodically, XLotto contacts the server, downloads the status, and updates the display.

If XLotto is executed on the T3E, no intermediate server is used.

The type of channel (ports, servers, programs) used for distributing the status could actually be anything. The software directly supports a configuration where the status is copied with rcp from the T3E to another host and then downloaded by the clients using HTTP. Since the status is a simple ASCII file, other distribution methods (e.g. FTP) are easy to adapt.

Moreover, the status information is easy to parse and extend, enabling the creation of other tools.
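To illustrate how another tool might consume the status, here is a minimal Python sketch. It is not XLotto's own code. The URL is a placeholder, and the line format (a record keyword followed by whitespace-separated fields, with # starting a comment) is purely an assumption, since the real layout of the status file is not described here.

```python
from urllib.request import urlopen

# Hypothetical URL; the real server location is site-specific.
STATUS_URL = "http://status.example.org/t3e/status.txt"


def parse_status(text):
    """Parse an ASSUMED 'keyword field field ...' line format into records."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        kind, *fields = line.split()
        records.append((kind, fields))
    return records


def fetch_status(url=STATUS_URL, timeout=10):
    """Download the ASCII status file over HTTP and parse it."""
    with urlopen(url, timeout=timeout) as response:
        text = response.read().decode("ascii", errors="replace")
    return parse_status(text)
```

A client would call fetch_status() at a regular interval and redraw its display from the returned records. Switching to another distribution method, such as FTP, would only change fetch_status().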


The Look

This section is based on xlotto.2.0

Example: XLotto screendump (112 Kb)
Ooops, got a 2.2 screendump here...

XLotto's display consists of four parts:

Menubar

The following menu options exist:

File->quit (<q>): Terminate XLotto.
View->Lottoview->vertical: If this option is set, the PEs in the Lotto view are arranged in columns from left to right; otherwise they are arranged in rows from top to bottom.
View->Lottoview->2 columns/rows
View->Lottoview->4 columns/rows
View->Lottoview->8 columns/rows (default)
View->Lottoview->16 columns/rows
View->Lottoview->32 columns/rows
Selects how many PEs are packed into each column (if vertical is selected) or row.
View->Jobview->split: The Job view is split into two parts. The upper part shows running applications (as presented by the Global Resource Manager). The lower part shows batch jobs.
View->Jobview->merged (default): The information from NQS and GRM is merged into one view.
View->Sortorder->as received (default)
View->Sortorder->age
View->Sortorder->priority+age
View->Sortorder->status+age
View->Sortorder->status+priority+age
View->Sortorder->status+queue+age
View->Sortorder->status+queue+priority+age
View->Sortorder->queue+age
View->Sortorder->queue+priority+age
View->Sortorder->queue+status+age
Selects the order in which the batch jobs are sorted in the job display. The sort criteria are:
as received: Jobs are listed in the order they are received.
age: Jobs are sorted numerically by NQS id. In all options this is used as the last sort criterion.
priority: Jobs are sorted by NQS intra-queue priority.
status: Jobs are sorted in the order specified by the X option nqs_statusorder.
queue: Jobs are sorted in the order specified by the X option nqe_queueorder.
View->Update (<u>): Get the status from the database and update the view now.
Help->About...: Program name, version, and copyright information.

The sort order option "as received" results in the same order in which NQS considers jobs for execution (I think...).
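The combined sort orders in the menu can be thought of as tuple sort keys built from the individual criteria. The following Python sketch shows the idea. It is not taken from the XLotto source. The Job fields, the status codes other than R, the queue name q12h, and the two order lists (stand-ins for the X options nqs_statusorder and nqe_queueorder) are all hypothetical. It also assumes that a higher priority value sorts first, which may not match NQS.

```python
from dataclasses import dataclass


@dataclass
class Job:
    nqs_id: int     # NQS id; sorting by it gives the "age" order
    queue: str
    status: str
    priority: int   # intra-queue priority; higher sorts first (assumption)


# Hypothetical stand-ins for the X options nqs_statusorder and nqe_queueorder.
STATUS_ORDER = ["R", "Q", "H"]
QUEUE_ORDER = ["q3h", "q12h"]


def rank(value, order):
    """Position of value in order; unknown values sort last."""
    return order.index(value) if value in order else len(order)


CRITERIA = {
    "age": lambda job: job.nqs_id,
    "priority": lambda job: -job.priority,
    "status": lambda job: rank(job.status, STATUS_ORDER),
    "queue": lambda job: rank(job.queue, QUEUE_ORDER),
}


def sort_jobs(jobs, sortorder):
    """Sort by a menu-style name such as 'status+queue+priority+age'."""
    if sortorder == "as received":
        return list(jobs)  # keep the order in which the jobs arrived
    names = sortorder.split("+")
    return sorted(jobs, key=lambda job: tuple(CRITERIA[n](job) for n in names))
```

Because every menu entry ends in age, ties on the earlier criteria are always broken by NQS id.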

Lotto View

Each processing element (PE) in the T3E is represented by a small widget. The color of the widget represents the type of PE:

Application PE
Command PE
OS PE
Unconfigured or downed PE

The digit inside represents the number of processes running on the PE (according to the Global Resource Manager on the T3E). Shells, daemons and other small processes are ignored.

Whenever the mouse is placed over a PE, its logical number (in both decimal and hex), the memory available to the user, and the free memory are displayed in the status line.
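The effect of the View->Lottoview options on the placement of the PE widgets can be sketched in a few lines of Python. This is only an illustration of the layout rule described for those menu options, assuming PEs are numbered from 0. The function names and the exact label text are hypothetical.

```python
def pe_position(pe, per_line=8, vertical=False):
    """Return the (row, column) grid cell for logical PE number `pe`.

    per_line is the number of PEs packed into each row, or into each
    column when vertical is set (8 is the menu default).
    """
    line, offset = divmod(pe, per_line)
    if vertical:
        # Columns run left to right; each column is filled top to bottom.
        return (offset, line)
    # Rows run top to bottom; each row is filled left to right.
    return (line, offset)


def pe_label(pe):
    """Logical PE number in decimal and hex (label format is an assumption)."""
    return f"PE {pe} (0x{pe:x})"
```

For example, with 8 PEs per line, PE 10 lands in row 1, column 2 in the default layout and in row 2, column 1 in the vertical layout.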

Job View (merged)

This view contains information about the jobs on the T3E. Each black line (black refers to the color of the characters) represents a batch (NQS) job while each blue line represents a process (serial or parallel).

For parallel processes, only information from the first (sub)process is shown.

An idle batch job occupies one line in the view, while an active (running) batch job can have one, two or more lines depending on how many processes it has.

Interactive jobs have, of course, no batch information to display.

Example:

User x_nscmos has a batch job with ID 21164 and name ODELIUS. The job resides in the queue q3h and is running (Status: R). It has a PE limit of 16 processors, of which 16 are used, and an MPP time limit of 2 hours 13 minutes and 20 seconds, of which only 53 minutes and 49 seconds have been consumed.

The job has one process running with the PID 47559 and the command is cpmd.x. The binary is not labeled (-). The process occupies 32 PEs (PE 120-151) and has an execution time of 21 minutes and 14 seconds.

Whenever the mouse is placed over a job, its status and sub-status is explained in the status line.

Whenever the mouse is placed over a process, its memory size and start time will be displayed in the status line. Also, the PEs that are used are highlighted in the lottoview.

Sometimes processes are queued in the GRM due to lack of resources. A common situation is jobs that require large-memory PEs. (NQS has no knowledge about PE memory requirements.) When a process is placed in the GRM waiting queue, the reason is highlighted.

Example:

Other GRM messages can also be displayed, e.g. Launching, Migrating, ...

This is not an error situation; it is merely an indicator for T3E administrators.

Another situation that calls for an administrator's attention is when a user requests more resources than he/she uses. When the resource in question is the number of PEs, XLotto highlights the PE fields of the job:

Example:

This may be an allowed behavior for some jobs.
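The PE check described above amounts to comparing a job's PE limit with the number of PEs its processes actually occupy. The Python sketch below shows one way to express it. How XLotto itself makes the comparison is not documented here, so the function names and the "first-last" range notation (modelled on "PE 120-151" from the job view example) are assumptions.

```python
def pes_in_range(pe_range):
    """Count the PEs in an inclusive 'first-last' range such as '120-151'."""
    first, last = (int(part) for part in pe_range.split("-"))
    return last - first + 1


def should_highlight_pe_fields(pe_limit, pe_ranges):
    """True if the job requested more PEs than its processes occupy."""
    used = sum(pes_in_range(r) for r in pe_ranges)
    return used < pe_limit
```

A job with a PE limit of 64 whose only process occupies PEs 120-151 (32 PEs) would be highlighted, while the same process under a limit of 32 would not.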

For more information about various NQS attributes, see the Cray manual NQE User's Guide, SG-2148.

Status Line

The status line is divided into two parts. The left part shows various status and information messages. The right part keeps track of when the last status sample was taken on the T3E.

If the database, for some reason, is not updated regularly, the display becomes brownish. Also, the title of the window is changed to DOWN. The reason may be that the T3E is down but it can also be a communication problem.
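The staleness check behind the DOWN indication can be sketched as a comparison between the time of the last sample and the current time. The Python below is a hedged illustration only. The 10-minute threshold and the function name are hypothetical, and XLotto's actual rule may differ.

```python
from datetime import datetime, timedelta

# Hypothetical threshold; the real value used by XLotto is not stated.
MAX_AGE = timedelta(minutes=10)


def display_state(last_sample, now, max_age=MAX_AGE):
    """Return 'DOWN' when the last status sample is older than max_age."""
    return "DOWN" if now - last_sample > max_age else "OK"
```

A client would evaluate this on every update and, on DOWN, change the window title and display color. As noted above, DOWN only means the status is stale: the cause may be the T3E itself or the communication path.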


To Do

  • Handle user defaults better.
  • If Tcl version >= 8.0, the existing http package should be used for downloading the status when using the HTTP getmethod.
  • Create a JavaScript-based WWW page to present similar information.
  • Port it to JAVA?
  • Port it to PC and MAC. What kind of dependencies are there?

Author

XLotto was written by Niclas Andersson <nican@nsc.liu.se>, National Supercomputer Centre, Linköping University, Sweden.

Please let me know if you have any problems, questions, corrections, additions or suggestions.




Page last modified: 2003-04-09 13:36
For more information contact us at info@nsc.liu.se.