Monolith FAQ
The message "job 83609 violates active HARD MAXPS limit of 1 for user x_nscnn" means that you can no longer run jobs longer than one second, i.e. you cannot run jobs on Monolith. Probably your project has ended and you no longer belong to any project on Monolith. You may still log in to Monolith to take home your files, but you may not run jobs there.
Performance of the global NFS-mounted file systems is much lower than that of the local file system, and heavy use of them also degrades overall system performance. To improve the situation, we urge you to use /disk/local for local files. This will improve the performance of your job and relieve the system. The rules for the /disk/local file system are simple:

In addition, we have modified the G98 execution scripts to set the environment variable "GAUSS_SCRDIR" to /disk/local. This means that G98 jobs are now forced to use /disk/local for scratch files (Gau-xxx.rwf, Gau-xxx.scr, Gau-xxx.int, Gau-xxx.d2e and Gau-xxx.inp).
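As an illustration of using /disk/local for a job's own files, the following minimal sketch works in a private directory on the local disk and copies the result to permanent storage at the end. The function name run_in_scratch, the file name result.dat and the fallback to $TMPDIR or /tmp are hypothetical; the fallback is only there so the sketch also runs on machines without /disk/local.

```shell
#!/bin/sh
# Sketch only: run_in_scratch, result.dat and the fallback directory are
# hypothetical; adapt them to your own job.

# run_in_scratch RESULT_DIR
# Works in a private directory on the local disk, then copies the output
# file to RESULT_DIR and removes the work directory.
run_in_scratch() {
    result_dir=$1

    # Prefer /disk/local; fall back to $TMPDIR or /tmp so the sketch also
    # runs on machines that lack /disk/local.
    scratch_root=/disk/local
    if [ ! -d "$scratch_root" ] || [ ! -w "$scratch_root" ]; then
        scratch_root=${TMPDIR:-/tmp}
    fi

    work_dir=$scratch_root/job.$$
    mkdir -p "$work_dir" "$result_dir" || return 1

    # Stand-in for the real computation, run in a subshell so the caller's
    # working directory is left untouched.
    ( cd "$work_dir" && echo "This is the job output" > result.dat ) || return 1

    # Save the result on permanent storage and free the local disk.
    cp "$work_dir/result.dat" "$result_dir"/ && rm -rf "$work_dir"
}
```

In a job script the function would be called with a directory on a global file system as its argument, for example a directory under /disk/global/.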
If you have problems with scp due to an old version at the receiving end you may use scp -oProtocol=1.
/usr/local/intel/6.0_old/
/usr/local/intel/6.0/
/usr/local/intel/7.0/
/usr/local/intel/7.1/
/usr/local/pgi/4.0/
/etc/cmod/modulefiles

You switch between the Intel compilers with
As soon as you log out you will have Intel 7.1 as default when you log back in.
We have installed new versions of the Intel compilers. To use them or test them, please use the module system:

module unload intel; module load intel/8.0

The Fortran compiler has been profoundly changed, including the name of the compiler and the names of option flags. The new compiler name is 'ifort'. (The old name 'ifc' may still be used.) More information about the compiler is available at http://www.intel.com/software/products/compilers/flin/whatsnew.htm. Run-time error messages are listed at http://www.intel.com/software/products/compilers/flin/docs/f_ug1/ug1l_rt_errors.htm. Release notes and other documentation are located as files on Monolith:
The Intel® Fortran Compiler 8.0 allocates more temporaries on the stack than previous Intel Fortran compilers. If a program has inadequate stack space at runtime, it will terminate with a Segmentation fault or Signal 11. On Linux the stack space can be increased with the command ulimit -s unlimited (in the bash shell) or limit stacksize unlimited (in the csh shell).

When using ifort (Intel 8.0) and MKL to link the Lapack routine dgeev (with ifort test_dgeev.f90 -L$MKL_ROOT -lmkl_lapack -lmkl_p4 -lpthread -lguide -openmp) you may get the following error message:

"ilaenv.o(.text+0x52a): Undefined reference to 's_copy' "

This is because libg2c is not linked automatically. Add the following:

-L/usr/lib/gcc-lib/i386-redhat-linux/2.96/ -lg2c

See http://support.intel.com/support/performancetools/libraries/mkl/linux/link_error.htm for further information, including the reason.
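For the stack limit described above, a job script might raise the limit before starting the program. The following is a minimal sketch for sh or bash; the function name raise_stack_limit and the fallback to the hard limit are assumptions made for illustration and are not part of the Monolith setup.

```shell
#!/bin/sh
# Sketch only: raise_stack_limit is a hypothetical helper name.

# Raise the soft stack limit of the current shell as far as permitted and
# print the resulting value. Child processes inherit the new limit.
raise_stack_limit() {
    if ! ulimit -s unlimited 2>/dev/null; then
        # Assumption: if "unlimited" is refused, settle for the hard limit.
        ulimit -S -s "$(ulimit -H -s)" 2>/dev/null || true
    fi
    ulimit -s
}

raise_stack_limit
# Start the ifort-compiled program after this point, in the same script.
```

The limit has to be raised in the same shell that later starts the program, since it is inherited by child processes only.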
New Portland Group Compiler

For the Portland compilers pgf77, pgf90, and pghpf the default limit for file size is 2 GB. You can remove this limit by using the Large File Support routines: add the switch -Mlfs to your compilation command.

We have also installed a new version, 5.0-2, of the PGI compilers. To use them or test them, please use the module system:

module unload pgi; module load pgi/5.0

Release notes and other documentation (Fortran and C/C++) are located as files on Monolith: /usr/local/pgi/linux86/5.0/doc.

With the command "module add gcc/3.3" you get the new version of g77.

If you use scratch files from Fortran we recommend that these are assigned names via FILE=filename in the OPEN statement.
You get an interactive job on one node (two processors) for one hour with the command

qsub -I -lwalltime=1:00:00,nodes=1

If you start a job on several nodes, the environment variable PBS_NODEFILE will point to a file listing all the nodes in your job. You will be automatically logged into the node when your interactive batch job starts. If nothing happens you may run "showstart" (from another window) on the batch job to see when it will start. During the day there are 8 nodes reserved for interactive jobs with a maximum time limit of 1 hour.
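To see which nodes a job has been given, the file named by PBS_NODEFILE can be inspected from within the job. The sketch below is one way to do this; the helper name count_nodes is hypothetical, and the duplicate removal assumes that a host may be listed more than once in the file.

```shell
#!/bin/sh
# Sketch only: count_nodes is a hypothetical helper name.

# count_nodes NODEFILE
# Print the number of distinct hosts listed in a PBS node file. A host may
# appear on several lines, so duplicates are removed first.
count_nodes() {
    sort -u "$1" | wc -l | tr -d ' '
}

# PBS_NODEFILE is only set inside a batch job, so check before using it.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    echo "This job runs on $(count_nodes "$PBS_NODEFILE") node(s):"
    sort -u "$PBS_NODEFILE"
fi
```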
Octave - A high-level language for numerical computations. (Matlab like)
Grace (ACE/gr, Xmgr) - Numerical Data Processing and Visualization Tool
Gnuplot - A portable command-line driven interactive plotting utility
#PBS -l nodes=16:ppn=2

Nodes on Monolith are allocated on a node basis. This means that you will be accounted for two processors on a node, even if you use only one. If, for example, you run parameter variations on a problem as one-processor jobs, it would be a good idea to run two variations in each job, thus using both processors.
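Running two one-processor variations in the same job can be sketched as follows: both runs are started in the background and the script waits for them to finish. The function run_variation is a hypothetical stand-in for your own one-processor program, and the output file names are invented for the example.

```shell
#!/bin/sh
#PBS -l nodes=1:ppn=2
# Sketch only: run_variation stands in for a real one-processor program.

# run_variation PARAMETER OUTFILE
run_variation() {
    echo "result for parameter $1" > "$2"
}

# run_pair PARAM_A PARAM_B OUTDIR
# Start two variations at the same time, one per processor on the node.
run_pair() {
    run_variation "$1" "$3/out.$1" &
    run_variation "$2" "$3/out.$2" &
    wait    # block until both background runs have finished
}
```

A job script would then call, for instance, run_pair 1 2 followed by an output directory, so that neither processor on the node sits idle.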
We have also installed scp and sftp. In some environments you may have problems giving the password to ssh and scp. If that is the case, you may try to terminate the password character string with "Control-J" instead of the usual "Enter" or "Return". You may use the SSH crypto challenge authentication mechanism to avoid sending an unencrypted password.
A description is available in the FAQ for ScaMPI. We tested interactively with the following script:

qsub -I -l walltime=1:00:00 -l nodes=2:ppn=2
MPI_HOME=/opt/scali
export MPI_HOME
SCAMPI_TRACE='-b -v'
export SCAMPI_TRACE
cd mpi
/opt/scali/bin/mpimon a.out -- `cat $PBS_NODEFILE`
The final list of provided mathematics libraries is yet to be determined but SCALAPACK will soon be available:

If you need other libraries you can either build them yourself or ask for our assistance in building and/or installing them globally on the system.

Several chemistry application programs have now been installed on Monolith, including Gaussian 98 and Dalton 1.2; see the Chemistry page. For Gaussian you may use up to 850 MB with "shared memory" but only 512 MB with Linda.

We also have the FLUENT 6.1 software available on Monolith, but only for users within Linköping University. FLUENT is flow and heat transfer modeling software suited to a wide range of applications. For further information see the Software/Physics page.

The programs gv and xmgrace are now available on the Monolith frontend login-1.
Question: I call a script from a job file to run the code and, once it has finished, to copy the output file to the home directory. This works properly if the code finishes within the given time limit. However, if the job is killed by the queue system, not just the executing code but the whole script is killed, so the output files vanish because they were in the volatile /disk/local directory.

Reply: To copy out a file after the job has aborted or terminated you can use the PBS stageout facility. Example:

#!/bin/sh
#PBS -lwalltime=1
#PBS -lnodes=1:ppn=2
#PBS -W stageout=/disk/local/file1@localhost:/disk/global/x_nscnn/file1
#PBS -W stageout=/disk/local/file2@localhost:/disk/global/x_nscnn/file2
cat >/disk/local/file1 <<EOF
This is file one
EOF
cat >/disk/local/file2 <<EOF
This is file two
EOF
sleep 10000

In this example /disk/local/{file1,file2} will be copied to /disk/global/x_nscnn/ when the job is finished (or aborted because of the time limit). There is a corresponding stagein facility. There is more information in the man page for qsub on Monolith.
Totalview can be used to debug "live" programs as well as for postmortem debugging of core files:

totalview [ filename [ corefile ]] [ options ]

where "filename" specifies the name of an executable to be debugged and "corefile" specifies the name of a core file. The executable must be compiled with source line information (usually the -g compiler switch) in order to give full debug capabilities. On Monolith please note the following:
You can find out how much time you have used in the present month with the command projinfo. You run it on your service node on Monolith, normally the computer monolith.nsc.liu.se, and it tells you how many CPU hours you have been accounted for during the current month. An example:

[y_user@login-1 y_user]$ projinfo
Project      Used[h]   Allocated[h]
   User
-----------------------------------------
p2004099      2313.2           5000
   y_user     2313.2

This output means that the batch system has so far accounted your project 2313.2 CPU hours this month, out of 5000 allocated, all of them for the jobs of y_user. The accounting is accumulated at the end of each batch job, according to how many computing nodes you have asked for (please remember that each node has two CPUs, meaning that you are accounted for two CPUs for each node) and how long the job actually ran. For example, assume the following definitions in your job script:

#PBS -l nodes=4:ppn=2
#PBS -l walltime=10:00:00

If we also assume that the job completes after three of the requested ten hours, you are accounted for 24 CPU hours (4 * 2 * 3). It would be the same if you requested nodes=4:ppn=1, because the system makes reservations only down to the granularity of nodes, not CPUs. (This may change later.) It does not matter whether you really run your job on all those nodes and processors; you are still accounted for them all.

As you see from the example, you are not accounted for making long reservations, only for how long your job really runs. The downside of long reservations is that longer jobs get a lower priority in job scheduling, so you might wait longer for the job to start. The maximum wall clock run time for a job is 144 hours.
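The accounting rule can be written out as a small calculation. This is only a sketch of the arithmetic described above, with a hypothetical function name cpu_hours; it handles whole hours only and is not the batch system's own accounting code.

```shell
#!/bin/sh
# Sketch only: cpu_hours is a hypothetical helper, whole hours only.

# cpu_hours NODES HOURS
# CPU hours accounted for a job that held NODES nodes for HOURS hours.
cpu_hours() {
    cpus_per_node=2    # each Monolith node has two CPUs
    echo $(( $1 * cpus_per_node * $2 ))
}

cpu_hours 4 3    # the example above: prints 24
```

The ppn value does not appear in the calculation, because both CPUs of every reserved node are accounted for regardless of how many are used.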
The VAMPIR package has two parts:
The full user documentation can be found at /usr/local/tools/vampir/3.0/doc/ (Vampir-userguide.pdf for Vampir), at /usr/local/tools/vampir/4.0/lam/doc/ or at /usr/local/tools/vampir/4.0/mpich/doc/. There are also man pages for Vampir and the Vampir-trace library routines.

Follow these steps to start using Vampir:

A note for existing Vampir users: release 3.0 of Vampir has many new features; check out http://www.pallas.com/e/products/vampir/index.htm for a short description. If for any reason you wish to use Vampir 2.5 you can do so with:

module unload vampir; module load vampir/2.5

Version 4.0 has some new features:
The default for 4.0 is to have MPICH/ScaMPI applications as target. For LAM, do:

module unload vampir/4.0.mpich ; module load vampir/4.0.lam

If you want to use Vampir/Vampirtrace 3.0:

module unload vampir/4.0.mpich ; module load vampir/3.0