- 5.1 Why is my job crashing with “segmentation fault”?
- 5.2 The code stops with a mysterious error in IOTK
- 5.3 The code stops with an error in davcio
- 5.4 Why is the code saying Wrong atomic coordinates?
- 5.5 The code stops with a wrong charge error
- 5.6 The code stops with an error in cdiaghg or rdiaghg
5.1 Why is my job crashing with “segmentation fault”?
Possible reasons: too much memory requested; executable or mathematical libraries compiled for different hardware; an incompatibility between the compiler and the mathematical libraries; flaky hardware; a bug in the compiler or in the mathematical libraries. The latter two are typically not reproducible across different architectures or compilers. Bugs in the code itself can be elusive, but they typically yield a more reproducible pattern of problems. Segmentation faults in the tests and examples almost invariably point to a problem in the compiler, in the mathematical libraries, or in their interaction.
Mysterious, unpredictable, erratic errors in parallel execution almost always come from bugs in the compiler and/or in the MPI libraries, and sometimes even from flaky hardware. Sorry, not our fault.
5.2 The code stops with a mysterious error in IOTK
IOTK is a toolkit that reads and writes XML files. There are frequent reports (especially when compiling with gfortran and MKL libraries) of mysterious errors in which IOTK does not find some variable in the XML data file. If this error has no obvious explanation (e.g. the file is properly written and read, the searched variable is present, etc.) and if it appears to be erratic or irreproducible (e.g. it occurs only with version X of compiler Y), it is almost certainly due to a compiler bug. Try reducing the optimization level, or use a different compiler. If you paid real money for your compiler, complain to the vendor.
5.3 The code stops with an error in davcio
davcio is a routine that reads from and writes to disk. The error number is what the I/O operation returns, so it means little more than “there was an error”. Possible reasons: the disk is full; outdir is not writable for any reason; you ran post-processing codes on a number of processors/pools different from the one used to produce the pw.x data (and did not set the variable wf_collect); you made a mess of your data files and directories; your data files are corrupted; you were running more than one instance of pw.x in the same temporary directory with the same file names.
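The first two causes in the list (full disk, outdir not writable) can be ruled out before launching a run. The following is a minimal sketch in Python, not part of the code distribution; the function name check_outdir and the min_free_bytes parameter are hypothetical and chosen only for illustration.

```python
import os
import shutil
import tempfile


def check_outdir(outdir, min_free_bytes=0):
    """Return a list of human-readable problems found with `outdir`.

    `check_outdir` and `min_free_bytes` are illustrative names, not part
    of any existing tool.
    """
    problems = []
    if not os.path.isdir(outdir):
        problems.append("outdir does not exist or is not a directory")
        return problems
    # Actually try to create a file: permission bits alone can be
    # misleading, for instance on network file systems.
    try:
        with tempfile.NamedTemporaryFile(dir=outdir):
            pass
    except OSError as exc:
        problems.append(f"outdir is not writable: {exc}")
    # Free space on the file system that holds outdir.
    free = shutil.disk_usage(outdir).free
    if free < min_free_bytes:
        problems.append(f"only {free} bytes free, {min_free_bytes} requested")
    return problems
```

Calling check_outdir on the directory given as outdir in the input returns an empty list when neither problem is detected. It says nothing about the remaining causes (mismatched processors/pools, corrupted files, concurrent runs in the same directory).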
5.4 Why is the code saying Wrong atomic coordinates?
Because they are: two or more atoms in the list have overlapping, or at any rate too close, positions. Can’t you see why? Look more carefully (or use a molecular viewer such as XCrySDen), and remember that the code checks periodic images as well.
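If a viewer is not at hand, the check on periodic images can be reproduced with a few lines of Python. This is a sketch under stated assumptions, not the test the code itself performs: positions are taken in crystal (fractional) coordinates, only the 27 nearest periodic images are searched (adequate for cells that are not strongly skewed), and the function name closest_pairs and the threshold value are hypothetical.

```python
import itertools
import math


def closest_pairs(cell, frac_positions, threshold):
    """Return (i, j, distance) for atom pairs closer than `threshold`.

    cell           -- three lattice vectors, each a 3-tuple
                      (same length unit as `threshold`)
    frac_positions -- atomic positions in crystal (fractional) coordinates
    threshold      -- illustrative cutoff; NOT the value used by the code
    """
    hits = []
    shifts = list(itertools.product((-1, 0, 1), repeat=3))
    for i, j in itertools.combinations(range(len(frac_positions)), 2):
        # Fractional separation, folded back into [-0.5, 0.5].
        base = [frac_positions[j][k] - frac_positions[i][k] for k in range(3)]
        base = [b - round(b) for b in base]
        best = math.inf
        for shift in shifts:
            d = [base[k] + shift[k] for k in range(3)]
            # Convert to Cartesian: r = d0*a1 + d1*a2 + d2*a3.
            cart = [sum(d[k] * cell[k][x] for k in range(3)) for x in range(3)]
            best = min(best, math.sqrt(sum(c * c for c in cart)))
        if best < threshold:
            hits.append((i, j, best))
    return hits


# Two atoms that look far apart (0.00 and 0.99 along the first axis of a
# cubic cell of side 10) are in fact close through the periodic boundary.
cell = [(10.0, 0.0, 0.0), (0.0, 10.0, 0.0), (0.0, 0.0, 10.0)]
print(closest_pairs(cell, [(0.0, 0.0, 0.0), (0.99, 0.0, 0.0)], threshold=0.5))
```

The example reports the pair (0, 1) at a distance of about 0.1, which is the kind of overlap that is easy to miss when looking only at the raw list of coordinates.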
5.5 The code stops with a wrong charge error
Typically, you are treating a metallic system as if it were insulating. Use Gaussian smearing.
5.6 The code stops with an error in cdiaghg or rdiaghg
This is a tough case. It signals that the Hamiltonian, or the overlap matrix, calculated in the subspace of occupied + correction states (used in iterative diagonalization), is singular. This should never happen, unless: 1) the atomic positions are seriously wrong (e.g. too close), or 2) the pseudopotentials are bad, or not so good. The latter case typically happens with Ultrasoft PP. When the error is erratic and irreproducible on other machines, it may be related to mathematical libraries of questionable accuracy. If you are out of ideas, try the option diagonalization='cg'.