4.1 Execution time


The following is a rough estimate of the complexity of a plain scf calculation with pw.x, for norm-conserving pseudopotentials (NCPP). USPP and PAW give rise to additional terms that may add from a few percent up to 30-40% to the execution time. For phonon calculations, each of the 3N_at modes requires a time of the same order of magnitude as a self-consistent calculation on the same system (possibly times a small factor). For cp.x, each time step takes a time of the order of T_h + T_orth + T_sub, defined below.

The time required for the self-consistent solution at fixed ionic positions, T_scf, is:

T_scf = N_iter T_iter + T_init

where N_iter = number of self-consistency iterations (niter), T_iter = time for a single iteration, T_init = initialization time (usually much smaller than the first term).

The time required for a single self-consistency iteration T_iter is:

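T_iter = N_k T_diag + T_rho + T_scf

where N_k = number of k-points, T_diag = time per Hamiltonian iterative diagonalization, T_rho = time for charge density calculation, and T_scf = time for Hartree and XC potential calculation. (In this formula the symbol T_scf is reused for the potential calculation alone; it is not the total time defined above.)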

The time for a Hamiltonian iterative diagonalization T_diag is:

T_diag = N_h T_h + T_orth + T_sub

where N_h = number of Hψ products needed by iterative diagonalization, T_h = time per Hψ product, T_orth = CPU time for orthonormalization, T_sub = CPU time for subspace diagonalization.
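
The nested estimates above can be combined mechanically. The following Python sketch is only an illustration: the function names and the sample timings are invented for this example and are not part of pw.x. The potential term of the per-iteration time is called t_pot here so that it cannot be confused with the total.

```python
def diag_time(n_h, t_h, t_orth, t_sub):
    """T_diag = N_h * T_h + T_orth + T_sub."""
    return n_h * t_h + t_orth + t_sub


def iter_time(n_k, t_diag, t_rho, t_pot):
    """T_iter = N_k * T_diag + T_rho + (Hartree and XC potential time)."""
    return n_k * t_diag + t_rho + t_pot


def scf_time(n_iter, t_iter, t_init=0.0):
    """T_scf = N_iter * T_iter + T_init."""
    return n_iter * t_iter + t_init


# Hypothetical timings in seconds, chosen only to exercise the formulas.
t_diag = diag_time(n_h=10, t_h=2.0, t_orth=3.0, t_sub=1.0)
t_iter = iter_time(n_k=4, t_diag=t_diag, t_rho=5.0, t_pot=1.0)
total = scf_time(n_iter=10, t_iter=t_iter, t_init=20.0)
print(t_diag, t_iter, total)  # 24.0 102.0 1040.0
```

With these made-up numbers the diagonalization accounts for 96 of the 102 seconds per iteration, which shows why T_diag is usually the quantity to look at first.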

The time T_h required for a Hψ product is

T_h = a₁ M N + a₂ M N₁N₂N₃ log(N₁N₂N₃) + a₃ M P N.

The first term comes from the kinetic term and is usually much smaller than the others. The second and third terms come from the local and nonlocal potential, respectively. a₁, a₂, a₃ are prefactors (i.e. small numbers of order 1), M = number of valence bands (nbnd), N = number of PW (basis set dimension: npw), N₁, N₂, N₃ = dimensions of the FFT grid for wavefunctions (nr1s, nr2s, nr3s; N₁N₂N₃∼8N), P = number of pseudopotential projectors, summed over all atoms, over all values of the angular momentum l, and over m = 1,..., 2l + 1.
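
To get a feeling for the relative weight of the three terms, the formula can be evaluated with all prefactors set to 1. This is a rough sketch under stated assumptions: the function name h_psi_cost is hypothetical, the prefactors are not measured values, the natural logarithm is used (a different base only rescales a₂), and the FFT grid size defaults to the 8N estimate quoted above. The results are operation counts in arbitrary units, not seconds.

```python
import math


def h_psi_cost(M, N, P, fft_points=None, a1=1.0, a2=1.0, a3=1.0):
    """Return the (kinetic, local, nonlocal) terms of the T_h estimate."""
    if fft_points is None:
        fft_points = 8 * N  # assumption: N1*N2*N3 ~ 8N
    kinetic = a1 * M * N
    local = a2 * M * fft_points * math.log(fft_points)
    nonlocal_term = a3 * M * P * N
    return kinetic, local, nonlocal_term


# Hypothetical system: 100 bands, 10000 plane waves, 200 projectors.
kinetic, local, nonlocal_term = h_psi_cost(M=100, N=10_000, P=200)
print(f"kinetic  : {kinetic:.3g}")
print(f"local    : {local:.3g}")
print(f"nonlocal : {nonlocal_term:.3g}")
```

For these sample sizes the kinetic term is about two orders of magnitude below the other two, consistent with the statement that it is usually negligible.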

The time T_orth required by orthonormalization is

T_orth = b₁NM_x²

and the time T_sub required by subspace diagonalization is

T_sub = b₂M_x³

where b₁ and b₂ are prefactors, M_x = number of trial wavefunctions (this varies between M and 2 to 4 times M, depending on the algorithm).
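
Because M_x enters quadratically in T_orth and cubically in T_sub, the choice of algorithm matters. The sketch below, with hypothetical function names and prefactors set to 1 purely as an assumption, compares M_x = M with M_x = 2M.

```python
def orth_cost(N, M_x, b1=1.0):
    """T_orth = b1 * N * M_x**2 (arbitrary units)."""
    return b1 * N * M_x**2


def sub_cost(M_x, b2=1.0):
    """T_sub = b2 * M_x**3 (arbitrary units)."""
    return b2 * M_x**3


N, M = 10_000, 100  # hypothetical basis size and number of bands
orth_ratio = orth_cost(N, 2 * M) / orth_cost(N, M)
sub_ratio = sub_cost(2 * M) / sub_cost(M)
print(orth_ratio, sub_ratio)  # 4.0 8.0
```

Doubling the number of trial wavefunctions therefore multiplies the orthonormalization estimate by 4 and the subspace diagonalization estimate by 8, independently of the prefactors.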

The time T_rho for the calculation of charge density from wavefunctions is

T_rho = c₁ M N_r1 N_r2 N_r3 log(N_r1 N_r2 N_r3) + c₂ M N_r1 N_r2 N_r3 + T_us

where c₁, c₂ are prefactors, N_r1, N_r2, N_r3 = dimensions of the FFT grid for charge density (nr1, nr2, nr3; N_r1 N_r2 N_r3∼8N_g, where N_g = number of G-vectors for the charge density, ngm), and T_us = time required by the PAW/USPP contribution (if any). Note that for NCPPs the FFT grids for charge and wavefunctions are the same.

The time T_scf for calculation of potential from charge density is

T_scf = d₂ N_r1 N_r2 N_r3 + d₃ N_r1 N_r2 N_r3 log(N_r1 N_r2 N_r3)

where d₂, d₃ are prefactors.
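
The charge density and potential estimates have the same functional form, except that the former carries an extra factor M. The following sketch makes this explicit; the function names, the 48×48×48 grid and the unit prefactors are assumptions made for illustration only, and the PAW/USPP term T_us is set to zero as for NCPPs.

```python
import math


def fft_term(points):
    """points * log(points), the FFT scaling used in the estimates."""
    return points * math.log(points)


def rho_cost(M, points, c_fft=1.0, c_lin=1.0, t_us=0.0):
    """Charge density estimate: one FFT-like and one linear term per band."""
    return c_fft * M * fft_term(points) + c_lin * M * points + t_us


def potential_cost(points, d_lin=1.0, d_fft=1.0):
    """Potential estimate: one linear and one FFT-like term, no band factor."""
    return d_lin * points + d_fft * fft_term(points)


points = 48 * 48 * 48  # hypothetical charge-density FFT grid
ratio = rho_cost(M=64, points=points) / potential_cost(points)
print(ratio)
```

With equal prefactors the ratio equals the number of bands (64 here, up to floating-point rounding), so the potential calculation is normally a minor contribution to each iteration.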

For hybrid DFT, the dominant term by far is the calculation of the nonlocal (V_x ψ) product, which takes as much as

T_exx = e N_k N_q M² N₁N₂N₃ log(N₁N₂N₃)

where N_q is the number of points in the k + q grid, determined by the options nqx1, nqx2, nqx3, and e is a prefactor.
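
The scaling of this estimate with the number of bands and with the q-point grid can be checked numerically. In the sketch below the function name exx_cost is hypothetical, the prefactor e is set to 1, and N_q is taken as the full product nqx1*nqx2*nqx3, which is an assumption (it ignores any reduction of the grid by symmetry).

```python
import math


def exx_cost(n_k, n_q, M, fft_points, e=1.0):
    """T_exx = e * N_k * N_q * M**2 * N1N2N3 * log(N1N2N3), arbitrary units."""
    return e * n_k * n_q * M**2 * fft_points * math.log(fft_points)


grid = 32 ** 3  # hypothetical wavefunction FFT grid
base = exx_cost(n_k=8, n_q=1, M=50, fft_points=grid)
more_bands = exx_cost(n_k=8, n_q=1, M=100, fft_points=grid)
denser_q = exx_cost(n_k=8, n_q=2 * 2 * 2, M=50, fft_points=grid)

print(more_bands / base)  # doubling M: factor of about 4
print(denser_q / base)    # nqx1 = nqx2 = nqx3 = 2: factor of about 8
```

The quadratic dependence on M is what distinguishes this term from the Hψ estimate, which is only linear in M.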

The above estimates are for serial execution. In parallel execution, each contribution may scale in a different manner with the number of processors (see below).
