
The Maui Scheduler

Overview

The Maui Scheduler is an advanced job scheduler for cluster systems. It gives site administrators extensive control over which jobs are considered eligible for scheduling, how the jobs are prioritized, and where they are run. Maui supports advance reservations, QoS levels, backfill, and allocation management.

Its scheduling scheme is based on advance wall-time reservations with backfill. The main difference from other common batch queue schedulers (e.g. NQS, DQS) is that Maui lets a job overtake a higher-priority job only if doing so does not delay the start of the higher-priority job (i.e. backfill).
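
As an illustration of the backfill rule, the following Python sketch decides whether a lower-priority job may start ahead of a job that already holds a reservation. It is a toy model and not Maui's implementation: it assumes one pool of identical processors and a single reservation, and the names `Job`, `can_backfill`, and all parameters are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    procs: int      # processors requested
    walltime: int   # wall-time limit in minutes


def can_backfill(job, now, free_now, reserved_start, reserved_procs,
                 free_at_reserved_start):
    """Return True if `job` may start now without delaying the reserved job.

    free_at_reserved_start is the number of processors that would be free
    when the reservation begins if `job` were not started.
    """
    if job.procs > free_now:
        return False  # does not fit on the idle processors
    if now + job.walltime <= reserved_start:
        return True   # finishes before the reservation begins
    # Still running when the reservation begins: allowed only if enough
    # processors remain for the reserved job.
    return free_at_reserved_start - job.procs >= reserved_procs


# Assumed scenario: 8 processors in total, 4 busy until minute 60, and a
# high-priority job holding a reservation for all 8 starting at minute 60.
short = Job("short", procs=4, walltime=30)
long_ = Job("long", procs=4, walltime=120)
print(can_backfill(short, 0, 4, 60, 8, 8))  # True
print(can_backfill(long_, 0, 4, 60, 8, 8))  # False
```

The wall-time limit is what makes the test possible: without it the scheduler could not tell whether the short job will be gone before the reservation starts.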

Opportunistic Scheduling versus Advance Reservation

In theory, unless resources can be reserved in advance, any job that needs more than a single piece of allocatable resource runs the risk of starvation[1,2]. In practice, because the workload is limited, this is a danger only for jobs that require extensive resources. A common remedy is to periodically put small, less demanding jobs on hold until the large, starving job has started. This is a rather intrusive operation, since it also affects jobs that are not involved in the starvation.

An advance reservation scheme, as in the Maui scheduler, makes it possible to allocate resources in the future. (Compare scheduling a meeting with several participants. It is almost impossible unless there is a calendar available.)

Starvation can also be resolved by using a preemptive job scheduler. Unfortunately, not all computer systems can handle preemption. Moreover, if jobs can be preempted, it is more difficult to predict when a job will finish.

Queues versus Quality of Service

When an idle job becomes eligible to run, it is assigned a priority. This priority is used to sort the jobs before the scheduler selects a job to start.

Many batch systems use queues to divide and classify the workload. Each queue is then assigned a priority, and sometimes each job is assigned a second priority that orders the jobs within the queue. This classification scheme is often too coarse. To take into account all the parameters that make up a batch job policy, you may end up with more queues than jobs.

In Maui, "queues" have lost their importance in classification and priority calculations. Instead, a Quality-of-Service (QoS) attribute can be used to classify the jobs. However, QoS is not a hierarchical scheme. It is merely a method of setting the parameters of a job when it enters the scheduler. All jobs eligible to run remain in one common idle queue, and the priority of each job is compared with that of all the others.

Job State

Jobs in Maui can be in one of three major states:

Running
A job that has been allotted its required resources and has started its computation is considered running until it finishes.
Queued (idle)
Jobs that are eligible to run. The priority is calculated in this state, and the jobs are sorted by it. Advance reservations are made starting with the job at the front of the queue.
Non-queued
Jobs that, for some reason, are not allowed to start. Jobs in this state do not gain any queue-time priority.

There is a limit on the number of jobs a group or user can have in the Queued state. This prevents users from accumulating more queue time than they deserve by submitting a large number of jobs.
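
The effect of such a limit can be sketched in Python. This is a simplified illustration, not Maui's logic: it assumes the limit applies per user, that jobs are counted in submission order, and that exceeding the limit is the only reason a job becomes non-queued. The function name `classify` and the limit value are hypothetical.

```python
from collections import Counter


def classify(jobs, max_idle_per_user):
    """Assign a state to each job.

    jobs: list of (job_id, user, is_running) tuples in submission order.
    """
    idle_count = Counter()
    states = {}
    for job_id, user, is_running in jobs:
        if is_running:
            states[job_id] = "Running"
        elif idle_count[user] < max_idle_per_user:
            idle_count[user] += 1
            states[job_id] = "Queued"      # eligible, gains queue time
        else:
            states[job_id] = "Non-queued"  # over the limit, gains nothing
    return states


jobs = [
    ("101", "anna", True),
    ("102", "anna", False),
    ("103", "anna", False),
    ("104", "anna", False),
    ("105", "bo", False),
]
for job_id, state in classify(jobs, max_idle_per_user=2).items():
    print(job_id, state)
```

With a limit of two, the third waiting job of user `anna` ends up non-queued, while the single job of user `bo` is queued normally.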

Job Priority

Maui provides numerous factors in the expression used to calculate job priority, so that a site can reach its goals for fairness and utilization. Each factor is weighted according to its importance, and the weighted sum is the total priority of the job. The most important factors are described below, together with the importance they have in the current configuration of Maui on Ingvar.
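
The weighted sum can be written out as a short Python sketch. The factor names, factor values, and weights below are invented for illustration and are not the actual Ingvar configuration; they only mirror the rough ordering of importance described in this section.

```python
def priority(factors, weights):
    """Total priority as the weighted sum of the individual factors.

    A factor without a weight contributes nothing.
    """
    return sum(weights.get(name, 0) * value for name, value in factors.items())


# Hypothetical weights: QoS dominates, expansion comes next, resource and
# queue time are minor, and fair share is excluded by a weight of zero.
weights = {
    "qos": 1000,
    "xfactor": 100,
    "resource": 10,
    "queue_time": 1,
    "fair_share": 0,
}

# Hypothetical factor values for one job.
job_factors = {
    "qos": 10,
    "xfactor": 1.5,
    "resource": 4,
    "queue_time": 30,
    "fair_share": 7,
}

print(priority(job_factors, weights))  # 10220.0
```

Setting a weight to zero removes a factor from the calculation without touching the rest of the expression.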

Resource

The resource factor consists of several terms that describe the resources required to run the job: number of processors, amount of memory, amount of free disk space, and swap size. Depending on which type of job is favored, jobs can be pushed toward the front of the queue. Experience shows that favoring large jobs often improves system utilization.

Ingvar: Fairly low rating. High utilization is desired, but fairness between users should not be affected.

Queue Time

This factor is based on the time the job has been eligible to run. It often has a very low weight in the priority calculation; the expansion factor is more important.

Ingvar: Low rating. A fall-back.

Expansion

The expansion factor or XFactor is calculated using the equation:

  XFactor = (Queue_Time + Job_Time_Limit) / Job_Time_Limit

This relates the time limit requested by the user to the sum of the queue time and the expected run time. A job with a short time limit increases its priority more quickly than a long job, which pushes it toward the front of the queue.

Ingvar: The most important factor after QoS. It expresses the general job scheduling policy.
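
A small worked example of the XFactor equation, written in Python, shows why short jobs move forward faster. The function name `xfactor` and the sample values are chosen for illustration only; any time unit works as long as both arguments use the same one.

```python
def xfactor(queue_time, time_limit):
    """Expansion factor: (Queue_Time + Job_Time_Limit) / Job_Time_Limit."""
    if time_limit <= 0:
        raise ValueError("time_limit must be positive")
    return (queue_time + time_limit) / time_limit


# Two jobs that have both waited 2 hours in the queue.
print(xfactor(queue_time=2, time_limit=1))   # 3.0  (1-hour job)
print(xfactor(queue_time=2, time_limit=10))  # 1.2  (10-hour job)
```

After the same wait, the one-hour job has an expansion factor of 3.0 and the ten-hour job only 1.2, so the short job gains priority much faster.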

Target

If the expansion factor is not enough to meet the scheduling goals, there is a Target factor that increases exponentially as the actual queue time approaches the target queue time.

Ingvar: Not used ...yet.

Fair Share

The fair share value is based on historical usage. It is divided into components for the user, group, and account associated with the job. Fair share is a contentious factor: although the intention is good, its effect is hard to understand, and it is hard to weight so that fairness is actually achieved[3].

Ingvar: Excluded from any priority calculation.

Quality of Service

The QoS factor is a fixed number used to offset the priority of jobs with a high quality of service.

Ingvar: Three QoS levels exist: Normal, Bonus, and Disabled. Normal has a QoS factor ten times higher than Bonus, which always pushes Bonus jobs to the back of the queue. Another Maui feature prevents Disabled jobs from making any reservations.

Reservation

There are two types of reservations in Maui:

Job Reservation
In every scheduling cycle, after the job priorities have been calculated, Maui examines the jobs in the queued state and schedules advance reservations.
User Reservation
If a number of nodes are needed for a certain purpose at a certain time, the Maui administrator can make a user reservation. This reservation is permanent and is kept between scheduling cycles.

There are also standing reservations. These are user reservations that are scheduled automatically and repeatedly.
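
How job reservations might be placed during one scheduling cycle can be sketched in Python. This is a strongly simplified model and not Maui's algorithm: it assumes one pool of identical processors, treats running jobs as reservations that end at their wall-time limit, and ignores nodes, memory, and user reservations. All names (`used_at`, `earliest_start`, `reserve_in_priority_order`) are hypothetical.

```python
def used_at(reservations, t):
    """Processors in use at time t; each reservation is (start, end, procs)."""
    return sum(p for start, end, p in reservations if start <= t < end)


def earliest_start(reservations, total_procs, procs, walltime, now=0):
    """Earliest time at which `procs` processors are free for `walltime`."""
    # Usage only drops when a reservation ends, so those are the only
    # start times worth trying besides `now`.
    candidates = sorted({now} | {end for _, end, _ in reservations if end > now})
    for t in candidates:
        # Usage only rises when a reservation starts, so checking t and
        # every reservation start inside the window is sufficient.
        checkpoints = [t] + [s for s, _, _ in reservations if t < s < t + walltime]
        if all(used_at(reservations, c) + procs <= total_procs for c in checkpoints):
            return t
    raise ValueError("job requests more processors than the system has")


def reserve_in_priority_order(queued, running, total_procs, now=0):
    """queued: (name, procs, walltime) tuples, highest priority first."""
    reservations = list(running)
    starts = {}
    for name, procs, walltime in queued:
        t = earliest_start(reservations, total_procs, procs, walltime, now)
        reservations.append((t, t + walltime, procs))
        starts[name] = t
    return starts


# Assumed scenario: 8 processors, one running job using 6 until minute 60.
running = [(0, 60, 6)]
queued = [("big", 8, 120), ("small", 2, 30), ("medium", 2, 90)]
print(reserve_in_priority_order(queued, running, total_procs=8))
# {'big': 60, 'small': 0, 'medium': 180}
```

In this scenario the big job is reserved first, the small job fits in front of it without delaying it, and the medium job has to wait because starting it earlier would overlap the big job's reservation.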

Links

http://supercluster.org
http://www.openpbs.org

User Commands

The following commands are available for users:

showq
Shows information about 1) running, 2) idle, and 3) non-queued jobs. Non-queued jobs are jobs that have been submitted to the batch queue system but are not considered eligible to run. These jobs do not gain any queue-time priority.
showbf
Shows what resources are available for immediate use.
showstart <jobid>
Shows the earliest time the queued job <jobid> is eligible to start.
checkjob <jobid>
Shows various details about a submitted job. Users are only permitted to see details of jobs they own.

References

[1] M. Ben-Ari, Principles of Concurrent Programming, Prentice-Hall International (1982)
[2] Edward G. Coffman, Jr., Peter J. Denning, Operating Systems Theory, Prentice-Hall (1973)
[3] Richard Klamann, Opportunity Scheduling: An Unfair CPU Scheduler for UNICOS, Cray User Group CD-ROM (1997)

Support

If you have any questions, corrections, additions, or suggestions regarding Maui or this web page, please contact NSC's helpdesk: support@nsc.liu.se.



Niclas Andersson





Page last modified: 2006-03-27 13:50
For more information contact us at info@nsc.liu.se.