next up previous contents
Next: 4.6 Restarting Up: 4 Performances Previous: 4.4 Parallelization issues   Contents

4.5 Understanding the time report

The time report printed at the end of a pw.x run contains a lot of useful information that can be used to identify bottlenecks and improve performance.
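As an illustration, the per-routine lines of the report can be extracted programmatically. The sketch below assumes the common `name : X.XXs CPU Y.YYs WALL ( N calls)` layout; the exact format may differ between versions, and the sample numbers are made up.

```python
import re

# Assumed layout of a pw.x timing line (may differ between QE versions):
#      electrons    :     12.34s CPU     13.45s WALL (       1 calls)
TIMING_RE = re.compile(
    r"^\s*(\S+)\s*:\s*([\d.]+)s CPU\s+([\d.]+)s WALL\s*\(\s*(\d+) calls\)"
)

def parse_time_report(text):
    """Return {routine: (cpu_s, wall_s, calls)} for every timing line found."""
    report = {}
    for line in text.splitlines():
        m = TIMING_RE.match(line)
        if m:
            name, cpu, wall, calls = m.groups()
            report[name] = (float(cpu), float(wall), int(calls))
    return report

# Illustrative excerpt with made-up numbers:
sample = """
     electrons    :     12.34s CPU     13.45s WALL (       1 calls)
     c_bands      :      8.00s CPU      8.50s WALL (       9 calls)
"""
timings = parse_time_report(sample)
```

Sorting the resulting dictionary by wall time quickly shows where a long run actually spends its time.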

4.5.1 Serial execution

The following applies to calculations taking a sizable amount of time (at least minutes): for short calculations (seconds), the time spent in the various initializations dominates. Any discrepancy with the following picture signals some anomaly.

For PAW and Ultrasoft pseudopotentials, you will see a larger contribution from "sum_band" and a non-negligible "newd" contribution to the time spent in "electrons", but the overall picture is unchanged. You may drastically reduce the overhead of Ultrasoft PPs by using the input option "tqr=.true.".
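To my understanding, "tqr" is set in the &ELECTRONS namelist of the pw.x input file; a minimal illustrative fragment (other settings omitted) would look like:

```
&ELECTRONS
   ! real-space treatment of the augmentation charges of
   ! Ultrasoft PPs: cuts the cost of newd and sum_band
   tqr = .true.
/
```

Since the real-space algorithm is an approximation, it is advisable to verify that total energies and forces are not significantly affected before using it in production.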

4.5.2 Parallel execution

The various parallelization levels should be used wisely in order to achieve good results. Let us summarize their effects on CPU time:

and on RAM:

In an ideally parallelized run, you should observe the following:

4.5.2.1 Quick estimate of parallelization parameters

You need to know

These data allow you to set bounds on the parallelization: you will need to experiment a bit to find the best compromise. For good load balancing among MPI processes, the number of k-point pools should be an integer divisor of Nk, and the number of processors for FFT parallelization should be an integer divisor of N3.
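The load-balancing rule above can be sketched as a small helper that enumerates the admissible splits; Nk, N3, and the process counts below are hypothetical inputs, not taken from any actual run:

```python
def divisors(n):
    """All positive integer divisors of n."""
    return [d for d in range(1, n + 1) if n % d == 0]

def balanced_choices(nk, n3, total_procs):
    """Pairs (npool, nfft) with npool dividing nk, nfft dividing n3,
    and npool * nfft == total_procs, so that both the k-point pools
    and the FFT planes are evenly distributed over MPI processes."""
    return [(p, total_procs // p)
            for p in divisors(nk)
            if total_procs % p == 0 and (total_procs // p) in divisors(n3)]

# Example: 8 k-points, FFT grid with N3 = 64, 16 MPI processes
choices = balanced_choices(8, 64, 16)
```

Each pair returned is a candidate (number of pools, processes per pool) worth timing; the list only narrows the search, it does not replace experimentation.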

4.5.2.2 Automatic guess of parallelization parameters

Since v.7.1, the code tries to guess a reasonable set of parameters for the k-point, linear-algebra, and task-group parallelizations, if they are not explicitly provided in the command line. The logic is as follows:
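The guess applies only when no parallelization flags are given: parameters passed explicitly on the command line take precedence. A typical invocation fixing them by hand (the process counts here are purely illustrative) might be:

```shell
# 32 MPI processes: 4 k-point pools of 8 processes each,
# and 16 processes (a 4x4 grid) devoted to parallel diagonalization
mpirun -np 32 pw.x -nk 4 -nd 16 -in pw.in > pw.out
```

Related flags include -nt for task groups and -nb for band groups; comparing a few such explicit choices against the automatic guess on a short test run is usually worthwhile.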

4.5.2.3 Typical symptoms of bad/inadequate parallelization

