


3.4 Tricks and problems

Many problems in parallel execution stem from mixing up different MPI libraries and runtime environments. There are two major MPI implementations, OpenMPI and MPICH, each available in various versions that are not necessarily compatible with one another, plus vendor-specific implementations (e.g. Intel MPI). A parallel machine may have multiple parallel compilers (typically, mpif90 scripts calling different serial compilers), multiple MPI libraries, and multiple launchers for parallel codes (different versions of mpirun and/or mpiexec). You have to figure out the proper combination of all of the above, which may require using the module command or manually setting environment variables and execution paths. What exactly has to be done depends upon the configuration of your machine. You should inquire with your system administrator or user support (if available; if not, YOU are the system administrator and user support and YOU have to solve your problems).
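As a first diagnostic, it can help to see which compiler wrapper and launchers your shell actually resolves. The following is only a minimal sketch: the function name report_mpi_toolchain is hypothetical, and the list of tools is simply the one mentioned above.

```shell
# Hypothetical helper: print where each MPI tool resolves on the current PATH.
# A wrapper and a launcher coming from different installations is a common
# sign of a mixed-up environment.
report_mpi_toolchain() {
    for tool in mpif90 mpirun mpiexec; do
        location=$(command -v "$tool" 2>/dev/null) || location="not found"
        printf '%s: %s\n' "$tool" "$location"
    done
}

# Usage:
#   report_mpi_toolchain
# To see which serial compiler a wrapper calls, Open MPI wrappers accept
# "mpif90 --showme" and MPICH wrappers accept "mpif90 -show".
```

If the reported paths belong to different MPI installations, fix the environment (for instance with the module command) before recompiling.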

Always verify whether your executable was actually compiled for parallel execution: this is declared in the first lines of the output. Running several instances of a serial code with mpirun or mpiexec produces strange crashes.
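This check can be scripted. The sketch below assumes that the banner near the top of the output contains the words "Parallel version" for a parallel build; the exact wording may differ between versions, so adapt the pattern to what your executable prints. The function name is_parallel_run is hypothetical.

```shell
# Hypothetical helper: succeed if the first lines of an output file
# declare a parallel build.
# Assumption: the banner contains "Parallel version" (case-insensitive).
is_parallel_run() {
    head -n 50 "$1" | grep -i 'parallel version' >/dev/null
}

# Usage:
#   is_parallel_run outputfile || echo "warning: serial executable?" >&2
```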

3.4.0.1 Trouble with input files

Some implementations of the MPI library have problems with input redirection in parallel. This typically shows up as mysterious errors when reading data. If this happens, use the option -i (or -in, -inp, -input), followed by the input file name. Example:
   pw.x -i inputfile -nk 4 > outputfile
Of course the input file must be accessible to the processor that reads it (only one processor reads the input file and subsequently broadcasts its contents to all other processors).
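A small wrapper can combine the -i option with a readability check on the input file. This is a sketch only: run_with_input and the LAUNCHER variable are hypothetical names, and the launcher command and process count are placeholders for whatever your machine requires.

```shell
# Hypothetical wrapper: pass the input file with -i instead of "< file".
# LAUNCHER is an assumed variable holding the parallel launcher,
# e.g. LAUNCHER="mpirun -np 4"; leave it unset for a serial run.
run_with_input() {
    qe_exe=$1
    qe_input=$2
    qe_output=$3
    shift 3
    if [ ! -r "$qe_input" ]; then
        echo "input file '$qe_input' is not readable" >&2
        return 1
    fi
    # LAUNCHER is deliberately unquoted so "mpirun -np 4" splits into words.
    ${LAUNCHER:-} "$qe_exe" -i "$qe_input" "$@" > "$qe_output"
}

# Usage:
#   LAUNCHER="mpirun -np 4"
#   run_with_input pw.x inputfile outputfile -nk 4
```

The check only verifies that the file is readable where the script runs; on a cluster the reading processor may see a different file system.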

Apparently the LSF implementation of MPI libraries manages to ignore or to confuse even the -i/-in/-inp/-input mechanism that is present in all QUANTUM ESPRESSO codes. In this case, use the -i option of mpirun.lsf to provide an input file.

3.4.0.2 Trouble with MKL and MPI parallelization

If you notice very bad parallel performance with MPI and MKL libraries, it is very likely that the OpenMP parallelization performed by MKL is colliding with MPI. Recent versions of MKL enable autoparallelization by default on multicore machines. You must set the environment variable OMP_NUM_THREADS to 1 to disable it. Note that if for some reason the correct setting of OMP_NUM_THREADS does not propagate to all processors, you may still run into trouble. Lorenzo Paulatto (Nov. 2008) suggests using the -x option of mpirun to propagate OMP_NUM_THREADS to all processors. Axel Kohlmeyer suggests the following (April 2008): "(I've) found that Intel is now turning on multithreading without any warning and that is for example why their FFT seems faster than FFTW. For serial and OpenMP based runs this makes no difference (in fact the multi-threaded FFT helps), but if you run MPI locally, you actually lose performance. Also if you use the 'numactl' tool on linux to bind a job to a specific cpu core, MKL will still try to use all available cores (and slow down badly). The cleanest way of avoiding this mess is to either link with
-lmkl_intel_lp64 -lmkl_sequential -lmkl_core (on 64-bit: x86_64, ia64)
-lmkl_intel -lmkl_sequential -lmkl_core (on 32-bit, i.e. ia32 )
or edit the libmkl_'platform'.a file. I'm using now a file libmkl10.a with:
  GROUP (libmkl_intel_lp64.a libmkl_sequential.a libmkl_core.a)
It works like a charm". UPDATE: Since v.4.2, configure by default links MKL without multithreading support.
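If you still need to control threading at run time, the OMP_NUM_THREADS setting described above can be sketched as follows. The function name require_single_thread is hypothetical; the -x option shown in the usage comment is the Open MPI mpirun option that exports a variable to all processes, and other launchers use different mechanisms.

```shell
# Disable implicit multithreading before launching MPI processes.
export OMP_NUM_THREADS=1

# Hypothetical sanity check: fail loudly if the setting is missing or wrong.
require_single_thread() {
    if [ "${OMP_NUM_THREADS:-unset}" != "1" ]; then
        echo "OMP_NUM_THREADS is ${OMP_NUM_THREADS:-unset}, expected 1" >&2
        return 1
    fi
}

# Usage (Open MPI; the process count is a placeholder):
#   require_single_thread && \
#       mpirun -x OMP_NUM_THREADS -np 4 pw.x -i inputfile > outputfile
```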

3.4.0.3 Trouble with compilers and MPI libraries

Many users of QUANTUM ESPRESSO, in particular those working on PC clusters, have to rely on themselves (or on less-than-adequate system managers) for the correct configuration of software for parallel execution. Mysterious and irreproducible crashes in parallel execution are sometimes due to bugs in QUANTUM ESPRESSO, but more often than not are a consequence of buggy compilers or of buggy or miscompiled MPI libraries.

