Installing Quantum ESPRESSO on Ubuntu

I had been using VASP, but it costs money, so I'm looking into the free Quantum ESPRESSO instead.

The ultimate integrated suite of Open-Source computer codes for electronic-structure calculations and materials modeling at the nanoscale.

Contents

1 Building with the GNU compilers
2 Building with the Intel compilers
3 CUDA
4 MPI

Building with the GNU compilers

Downloading Quantum ESPRESSO

Download it from the Quantum ESPRESSO homepage.

download - Quantum Espresso

https://www.quantum-espresso.org/download-page/

To download the files you must be registered. If you are a registered user you can go directly to “download”, otherwise please “register”. In order to download our free software and documentation, we kindly ask you to identify yourself through a nickname that you can obtain through a simple, non-intrusive, and strictly anonymous registration procedure. Once registered, you will not be asked any data for a second time.

Once downloaded, extract the archive:

tar -xzvf qe-7.1-ReleasePack.tar.gz

Building Quantum ESPRESSO

Build it with CMake, following the CMake build system page on the QE GitLab wiki:

CMake build system · Wiki · QEF - Quantum ESPRESSO Foundation / q-e · GitLab

https://gitlab.com/QEF/q-e/-/wikis/Developers/CMake-build-system

cd qe-7.1
mkdir build; cd build
cmake -DQE_ENABLE_MPI=OFF -DCMAKE_C_COMPILER=gcc -DCMAKE_Fortran_COMPILER=gfortran -DQE_FFTW_VENDOR=Internal ../
make
mv bin/* ~/.local/bin/

Without -DQE_FFTW_VENDOR=Internal, CMake refused to configure. MPI isn't needed for now, so it is turned off.
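The same configure, build and install sequence is repeated throughout this article with different compilers and FFTW vendors, so it can be wrapped in a small shell function. This is a minimal sketch, not part of QE: the name build_qe is hypothetical, it assumes you run it from the top of the QE source tree (e.g. qe-7.1/), and it only passes the four CMake options used here. It also creates ~/.local/bin first, because mv into a missing directory fails.

```shell
# Hypothetical helper: configure, build and install QE from a fresh build/ dir.
# Usage: build_qe <C compiler> <Fortran compiler> <FFTW vendor> [MPI: ON|OFF]
# Run from the top of the QE source tree (e.g. qe-7.1/).
build_qe() {
  local cc="$1" fc="$2" vendor="$3" mpi="${4:-OFF}"
  (
    set -e                        # abort at the first failing step
    rm -rf build
    mkdir build
    cd build
    cmake -DQE_ENABLE_MPI="$mpi" \
          -DCMAKE_C_COMPILER="$cc" \
          -DCMAKE_Fortran_COMPILER="$fc" \
          -DQE_FFTW_VENDOR="$vendor" ../
    make -j"$(nproc)"             # parallel build; plain `make` also works
    mkdir -p "$HOME/.local/bin"   # mv into a missing directory would fail
    mv bin/* "$HOME/.local/bin/"
  )
}
```

The build above corresponds to `build_qe gcc gfortran Internal`. On Ubuntu, the default ~/.profile adds ~/.local/bin to PATH at login once that directory exists.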

Running and results

Let's try the Fe calculation from the tutorial below.

8. Partial density of states for each orbital and spin (qe-tutorial 3.1.0 documentation)

http://www.cmpt.phys.tohoku.ac.jp/~koretsune/SATL_qe_tutorial/projection.html#fe

pw.x < Fe.scf.in > gnu.out

The calculation seems to have finished without problems.

General routines
calbec       :      0.29s CPU      0.30s WALL (   16876 calls)
fft          :      0.14s CPU      0.14s WALL (     338 calls)
ffts         :      0.00s CPU      0.00s WALL (      46 calls)
fftw         :     16.34s CPU     16.58s WALL (  243244 calls)
interpolate  :      0.01s CPU      0.01s WALL (      24 calls)

PWSCF        :     27.30s CPU     28.09s WALL
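Since the comparisons in this article all come down to the final PWSCF line, a small helper can pull the total wall-clock time out of a pw.x output file. A minimal sketch; the function name is hypothetical, and it assumes the run took under a minute (longer timings are printed in forms like "1m 5.30s", which this does not parse).

```shell
# Print the total wall time (seconds) from the PWSCF timing line of a
# pw.x output, e.g. "PWSCF : 27.30s CPU 28.09s WALL" -> 28.09.
# Assumption: the run took under one minute.
pwscf_wall() {
  awk '$1 == "PWSCF" && $2 == ":" {
         for (i = 3; i <= NF; i++)
           if ($i == "WALL") { v = $(i - 1); sub(/s$/, "", v); print v; exit }
       }' "$1"
}
```

For the run above, `pwscf_wall gnu.out` would print 28.09.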

FFTW3

When I first left out QE_FFTW_VENDOR, CMake stopped with this error:

Failed to find an external FFTW library. Alternatively, '-DQE_FFTW_VENDOR=Internal' may be used to enable reference FFTW at a performance loss compared to optimized libraries.

The message says the internal FFTW (QE_FFTW_VENDOR=Internal) is slower than optimized libraries, so let's install FFTW3.

wget https://www.fftw.org/fftw-3.3.10.tar.gz
tar -xzvf fftw-3.3.10.tar.gz
cd fftw-3.3.10
mkdir build; cd build
cmake -DENABLE_THREADS=ON -DENABLE_OPENMP=ON ../
make
sudo make install

I'm not sure these CMake options are the right ones. When I built FFTW without any options, the Quantum ESPRESSO build later failed with errors such as undefined reference to 'fftw_init_threads'; adding them fixed it.
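When a link error like undefined reference to 'fftw_init_threads' appears, it can help to check which installed FFTW library actually defines the symbol. A minimal sketch, assuming the default install prefix /usr/local/lib and GNU binutils' nm; the function name is hypothetical, and the expectation that FFTW's CMake build puts the threads interface in a separate libfftw3_threads library is an assumption, not something checked here.

```shell
# Print every FFTW shared library in a directory that defines a given symbol.
# Assumptions: default install prefix /usr/local/lib, GNU binutils nm.
# Usage: fftw_libs_defining <symbol> [library directory]
fftw_libs_defining() {
  local sym="$1" dir="${2:-/usr/local/lib}" lib status=1
  for lib in "$dir"/libfftw3*.so; do
    [ -e "$lib" ] || continue   # the glob matched nothing
    # --defined-only skips libraries that merely reference the symbol
    if nm -D --defined-only "$lib" 2>/dev/null | grep -qw -- "$sym"; then
      echo "$lib"
      status=0
    fi
  done
  return "$status"
}
```

If `fftw_libs_defining fftw_init_threads` prints nothing, the threads library was most likely not built or not installed.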

Go back to the qe-7.1 directory and build Quantum ESPRESSO again.

rm -rf build
mkdir build; cd build
cmake -DQE_ENABLE_MPI=OFF -DCMAKE_C_COMPILER=gcc -DCMAKE_Fortran_COMPILER=gfortran -DQE_FFTW_VENDOR=FFTW3 ../
make
mv bin/* ~/.local/bin/

Running and results

pw.x < Fe.scf.in > gnu_fftw3.out

General routines
calbec       :      0.29s CPU      0.30s WALL (   16876 calls)
fft          :      0.22s CPU      0.23s WALL (     338 calls)
ffts         :      0.05s CPU      0.05s WALL (      46 calls)
fftw         :     21.54s CPU     21.78s WALL (  243244 calls)
interpolate  :      0.06s CPU      0.06s WALL (      24 calls)

PWSCF        :     32.80s CPU     33.49s WALL

Somehow it got slower... The fftw routine itself is what slowed down (21.78s WALL with FFTW3 versus 16.58s with Internal).
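One possible reason, not tested in this article, is that FFTW's CMake build leaves its SIMD kernels off unless asked: its CMakeLists provides options such as ENABLE_SSE2, ENABLE_AVX and ENABLE_AVX2 that default to OFF. The sketch below picks those options from the CPU flags in /proc/cpuinfo; the function name is hypothetical. Ubuntu's packaged FFTW (sudo apt install libfftw3-dev) is another thing worth comparing.

```shell
# Emit FFTW CMake SIMD options matching the CPU flags in /proc/cpuinfo.
# Only double-precision-relevant options are considered (SSE2, AVX, AVX2).
fftw_simd_flags() {
  local cpuinfo="${1:-/proc/cpuinfo}" flags out="" f
  flags=$(grep -m1 '^flags' "$cpuinfo" || true)   # first CPU's flag list
  for f in sse2 avx avx2; do
    case " $flags " in
      *" $f "*) out="$out -DENABLE_$(echo "$f" | tr '[:lower:]' '[:upper:]')=ON" ;;
    esac
  done
  echo "${out# }"   # drop the leading space
}
```

Usage: `cmake -DENABLE_THREADS=ON -DENABLE_OPENMP=ON $(fftw_simd_flags) ../`, with the substitution left unquoted on purpose so each option becomes a separate argument.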

Building with the Intel compilers

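The Intel compilers can now be installed for free, so let's see whether they make things faster. My CPU is a Core i7, so it should benefit fully from the Intel compilers. For installing them, see the article below.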

Does LAMMPS get faster with the Intel compilers?

Building FFTW3

rm -rf build
mkdir build; cd build 
cmake -DCMAKE_C_COMPILER=icc -DCMAKE_CXX_COMPILER=icpc -DCMAKE_Fortran_COMPILER=ifort -DENABLE_THREADS=ON -DENABLE_OPENMP=ON ../
make 
sudo make install

On the Quantum ESPRESSO side I also tried QE_FFTW_VENDOR=Intel_DFTI, QE_FFTW_VENDOR=Intel_FFTW3 and others, but none of them worked.

Building Quantum ESPRESSO

rm -rf build
mkdir build; cd build
cmake -DQE_ENABLE_MPI=OFF -DCMAKE_C_COMPILER=icc -DCMAKE_Fortran_COMPILER=ifort -DQE_FFTW_VENDOR=FFTW3 ../
make
mv bin/* ~/.local/bin/

Running and results

pw.x < Fe.scf.in > intel.out

General routines
calbec       :      0.30s CPU      0.30s WALL (   16876 calls)
fft          :      0.22s CPU      0.23s WALL (     338 calls)
ffts         :      0.04s CPU      0.05s WALL (      46 calls)
fftw         :     19.84s CPU     20.02s WALL (  243244 calls)
interpolate  :      0.05s CPU      0.06s WALL (      24 calls)


PWSCF        :     30.03s CPU     30.63s WALL

Isn't Intel slow?? (30.63s WALL, versus 28.09s for GNU with Internal FFTW.)

Maybe it gets faster for larger systems?

CUDA

I'd like to try CUDA.

Installing the NVIDIA HPC SDK

First, install the NVIDIA HPC SDK, following the article below.

Installing the NVIDIA HPC SDK | HPC WORLD

https://hpcworld.jp/nvidia-hpc-sdk-install/

wget https://developer.download.nvidia.com/hpc-sdk/22.5/nvhpc_2022_225_Linux_x86_64_cuda_11.7.tar.gz
tar xpzf nvhpc_2022_225_Linux_x86_64_cuda_11.7.tar.gz
nvhpc_2022_225_Linux_x86_64_cuda_11.7/install
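After installation, nvc and nvfortran have to be on PATH before CMake can find them. The lines below follow the bash environment setup from NVIDIA's HPC SDK installation guide, assuming the installer's default prefix /opt/nvidia/hpc_sdk and version 22.5.

```shell
# Environment for NVIDIA HPC SDK 22.5 (bash), per NVIDIA's install guide.
# Assumes the default install prefix /opt/nvidia/hpc_sdk.
NVARCH=$(uname -s)_$(uname -m); export NVARCH
NVCOMPILERS=/opt/nvidia/hpc_sdk; export NVCOMPILERS
MANPATH=$MANPATH:$NVCOMPILERS/$NVARCH/22.5/compilers/man; export MANPATH
PATH=$NVCOMPILERS/$NVARCH/22.5/compilers/bin:$PATH; export PATH
```

These lines can go in ~/.bashrc; `which nvfortran` should then resolve to the SDK.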

Building Quantum ESPRESSO

cmake -DCMAKE_C_COMPILER=nvc -DCMAKE_Fortran_COMPILER=nvfortran -DQE_FFTW_VENDOR=Internal -DQE_ENABLE_CUDA=ON ../

Running and results

pw.x < Fe.scf.in > cuda.out
Segmentation fault (core dumped)

and it won't even run, orz.

I spent a whole day wrestling with cuFFT and various other settings but couldn't get it to work, so I've abandoned the CUDA plan.

MPI

Let's aim for a speedup with MPI.

cmake -DCMAKE_C_COMPILER=mpicc -DCMAKE_Fortran_COMPILER=mpif90 -DQE_FFTW_VENDOR=FFTW3 ../
make

Does FFTW3 also need to be built with MPI support?

Running and results

mpirun -np 4 pw.x < Fe.scf.in > mpi.out
Authorization required, but no authorization protocol specified
Authorization required, but no authorization protocol specified
Note: The following floating-point exceptions are signalling: IEEE_DENORMAL
Note: The following floating-point exceptions are signalling: IEEE_DENORMAL
Note: The following floating-point exceptions are signalling: IEEE_DENORMAL

Why do these messages show up at runtime?
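Neither message was investigated in this article. The "Authorization required, but no authorization protocol specified" lines are commonly attributed to a library (often hwloc, used by Open MPI) trying to contact the X server named in $DISPLAY. The "Note: ... IEEE_DENORMAL" lines are printed by the gfortran runtime when a program stops with floating-point exception flags set; gfortran's -ffpe-summary option controls them. A minimal sketch of one commonly suggested workaround for the first message is to run mpirun with DISPLAY unset; the function name is hypothetical.

```shell
# Run pw.x under MPI with DISPLAY removed from its environment.
# Assumption: the "Authorization required" lines come from an X11 probe.
# Usage: run_pw_mpi <processes> <input file> <output file>
run_pw_mpi() {
  local np="$1" input="$2" output="$3"
  (
    unset DISPLAY   # only affects this subshell, not the calling shell
    mpirun -np "$np" pw.x < "$input" > "$output"
  )
}
```

Usage: `run_pw_mpi 4 Fe.scf.in mpi.out`.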

General routines
calbec       :      0.47s CPU      0.48s WALL (   16918 calls)
fft          :      0.17s CPU      0.17s WALL (     338 calls)
ffts         :      0.03s CPU      0.03s WALL (      46 calls)
fftw         :     14.02s CPU     14.28s WALL (  243116 calls)
interpolate  :      0.04s CPU      0.04s WALL (      24 calls)

Parallel routines

PWSCF        :     20.37s CPU     20.85s WALL

It got a little faster.

QE_FFTW_VENDOR=Internal

What happens with QE_FFTW_VENDOR=Internal?

cmake -DCMAKE_C_COMPILER=mpicc -DCMAKE_Fortran_COMPILER=mpif90 -DQE_FFTW_VENDOR=Internal ../
make

Running and results

mpirun -np 4 pw.x < Fe.scf.in > mpi_internal.out
Authorization required, but no authorization protocol specified
Authorization required, but no authorization protocol specified
Note: The following floating-point exceptions are signalling: IEEE_DENORMAL
Note: The following floating-point exceptions are signalling: IEEE_DENORMAL

The same messages still appear.

General routines
calbec       :      0.42s CPU      0.44s WALL (   16918 calls)
fft          :      0.12s CPU      0.12s WALL (     338 calls)
ffts         :      0.00s CPU      0.00s WALL (      46 calls)
fftw         :     10.44s CPU     10.77s WALL (  243116 calls)
interpolate  :      0.01s CPU      0.01s WALL (      24 calls)

Parallel routines

PWSCF        :     16.58s CPU     17.20s WALL

Internal FFTW is the fastest...

So is GNU compilers + Internal FFTW the fastest combination?

GNU compilers

Build with the GNU compilers. (For reference, I switch the Intel compilers on and off with Environment Modules.)

cmake -DCMAKE_C_COMPILER=mpicc -DCMAKE_Fortran_COMPILER=mpif90 -DQE_FFTW_VENDOR=Internal ../
make
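Because the compilers are switched with Environment Modules, it is worth checking which Fortran compiler the mpif90 wrapper currently calls before configuring. A minimal sketch: Open MPI's wrappers print the underlying command with --showme, while MPICH and Intel MPI use -show; the function name is hypothetical.

```shell
# Print the compiler that the mpif90 wrapper invokes (e.g. gfortran, ifort).
# Tries Open MPI's --showme first, then the MPICH/Intel MPI -show flag.
mpi_fortran_backend() {
  local cmd
  cmd=$(mpif90 --showme 2>/dev/null) ||
    cmd=$(mpif90 -show 2>/dev/null) || return 1
  echo "${cmd%% *}"   # first word of the underlying command line
}
```

If `mpi_fortran_backend` prints gfortran, the build uses the GNU compiler regardless of which modules are loaded.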

Running and results

mpirun -np 4 pw.x < Fe.scf.in > mpi_gnu_internal.out

General routines
calbec       :      0.25s CPU      0.27s WALL (   16918 calls)
fft          :      0.09s CPU      0.09s WALL (     338 calls)
ffts         :      0.00s CPU      0.00s WALL (      46 calls)
fftw         :      9.78s CPU     10.16s WALL (  243116 calls)
interpolate  :      0.01s CPU      0.01s WALL (      24 calls)

Parallel routines

PWSCF        :     16.79s CPU     17.57s WALL

No real difference (17.57s versus 17.20s WALL).
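To put the MPI gain in numbers, the serial GNU + Internal FFTW run (28.09s WALL) can be compared with the 4-process GNU + Internal FFTW run (17.57s WALL). The small helper below (hypothetical name) computes the speedup T1/Tp and the parallel efficiency, i.e. speedup divided by the number of processes.

```shell
# speedup <serial wall time> <parallel wall time> <number of processes>
speedup() {
  awk -v t1="$1" -v tp="$2" -v np="$3" 'BEGIN {
    s = t1 / tp
    printf "speedup %.2f, efficiency %.0f%%\n", s, 100 * s / np
  }'
}

speedup 28.09 17.57 4   # prints: speedup 1.60, efficiency 40%
```

For a small cell like this one, pw.x's k-point pools (for example `mpirun -np 4 pw.x -nk 2 -in Fe.scf.in`) may scale differently from the default distribution; this was not tried here.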
