Command:     mpiexec.hydra -bootstrap slurm /home/cbasu/mpprun_tutorial/050416/jacobi_mpi_pr
Resources:   1 node (16 physical, 16 logical cores per node)
Memory:      31 GB per node
Tasks:       16 processes
Machine:     n919
Start time:  Tue Apr 5 13:54:34 2016
Total time:  55 seconds (1 minute)
Full path:   /home/cbasu/mpprun_tutorial/050416
Input file:
Notes:

Summary: jacobi_mpi_pr is Compute-bound in this configuration
Compute: 97.9%

Time spent running application code. High values are usually good.

This is very high; check the CPU performance section for advice.

MPI: 2.1%

Time spent in MPI calls. High values are usually bad.

This is very low; this code may benefit from a higher process count.

I/O: 0.0%

Time spent in filesystem I/O. High values are usually bad.

This is negligible; there's no need to investigate I/O performance.

This application run was Compute-bound. A breakdown of this time and advice for investigating further is in the CPU section below.

As very little time is spent in MPI calls, this code may also benefit from running at larger scales.


CPU
A breakdown of the 97.9% CPU time:
Scalar numeric ops: 13.7%
Vector numeric ops: 13.5%
Memory accesses: 72.8%
The per-core performance is memory-bound. Use a profiler to identify time-consuming loops and check their cache performance.
Little time is spent in vectorized instructions. Check the compiler's vectorization advice to see why key loops could not be vectorized.
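In a Jacobi-type stencil solver, the 72.8% of time in memory accesses typically comes from the main update loop streaming whole grids through the cache on every iteration. The sketch below is not the actual jacobi_mpi_pr source; the array names, layout, and update formula are assumptions used only to illustrate what to look for with a profiler. Keeping the inner loop unit-stride and telling the compiler the arrays do not alias gives it the best chance to vectorize, and compiler vectorization reports (for example GCC's -fopt-info-vec or the Intel compiler's -qopt-report) explain why a given loop was left scalar.

    /* Hypothetical sketch of one Jacobi sweep; illustrative only, not the
     * jacobi_mpi_pr source. Grids are row-major, nx*ny doubles each. */
    #include <stddef.h>

    void jacobi_sweep(size_t nx, size_t ny,
                      const double *restrict u,   /* current iterate */
                      double *restrict unew)      /* next iterate */
    {
        for (size_t i = 1; i < nx - 1; ++i) {
            /* The inner loop walks contiguously along a row: unit-stride
             * accesses keep cache lines hot and allow the compiler to
             * vectorize the four-point update. */
            for (size_t j = 1; j < ny - 1; ++j) {
                unew[i*ny + j] = 0.25 * (u[(i-1)*ny + j] + u[(i+1)*ny + j] +
                                         u[i*ny + j - 1] + u[i*ny + j + 1]);
            }
        }
    }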
MPI
A breakdown of the 2.1% MPI time:
Time in collective calls: 65.1%
Time in point-to-point calls: 34.9%
Effective process collective rate: 1.07e+04 bytes/s
Effective process point-to-point rate: 3.9e+08 bytes/s
Most of the time is spent in collective calls with a very low transfer rate. This suggests load imbalance is causing synchronization overhead; use an MPI profiler to investigate.
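One quick way to confirm that the collective time comes from ranks waiting for each other, rather than from moving data, is to time an explicit barrier just before the collective on every rank. The snippet below is a hypothetical illustration (the function and variable names are not from jacobi_mpi_pr): a large spread in the measured wait times points to load imbalance rather than a slow interconnect.

    /* Hypothetical imbalance check around a residual reduction; illustrative
     * only, not taken from jacobi_mpi_pr. */
    #include <mpi.h>
    #include <stdio.h>

    double reduce_residual(double local_residual, MPI_Comm comm)
    {
        int rank;
        MPI_Comm_rank(comm, &rank);

        /* Time spent in the barrier approximates how long this rank had to
         * wait for the slowest rank to arrive. */
        double t0 = MPI_Wtime();
        MPI_Barrier(comm);
        double wait = MPI_Wtime() - t0;

        double global_residual;
        MPI_Allreduce(&local_residual, &global_residual, 1,
                      MPI_DOUBLE, MPI_SUM, comm);

        double max_wait;
        MPI_Reduce(&wait, &max_wait, 1, MPI_DOUBLE, MPI_MAX, 0, comm);
        if (rank == 0)
            printf("max wait before reduction: %.6f s\n", max_wait);

        return global_residual;
    }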
I/O
A breakdown of the 0.0% I/O time:
Time in reads: 0.0%
Time in writes: 0.0%
Effective process read rate: 0 bytes/s
Effective process write rate: 0 bytes/s
No time is spent in I/O operations. There's nothing to optimize here!
Threads
A breakdown of how multiple threads were used:
Computation: 0.0%
Synchronization: 0.0%
Physical core utilization: 100.0%
System load: 101.1%
No measurable time is spent in multithreaded code.
Memory
Per-process memory usage may also affect scaling:
Mean process memory usage: 1.07e+08 bytes
Peak process memory usage: 1.07e+08 bytes
Peak node memory usage: 10.0%
The peak node memory usage is very low. Running with fewer MPI processes and more data on each process may be more efficient.
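For context, the report's own numbers give the scale: at about 1.07e+08 bytes (roughly 100 MiB) per process, the 16 processes together hold on the order of 1.7 GB, a small fraction of the 31 GB available on the node, so the same node could accommodate a much larger local problem per rank (or the same problem on fewer ranks) before memory became a constraint.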