Posted March 12, 2024 (author: )

VASP runtime efficiency

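VASP spends most of its runtime on diagonalization. The runtime scales as NBANDS × NPLNW^2, where NBANDS is the number of bands and NPLNW is the number of plane waves; it also scales as NELECT^3, the cube of the number of electrons. Because NPLNW ∝ ENCUT^(3/2), the runtime scales as ENCUT^3.

IALGO selects the diagonalization algorithm. For small systems use IALGO=38 (Davidson algorithm); for large systems use IALGO=48 (RMM-DIIS). The same choice can be made through ALGO: ALGO=Very_Fast selects RMM-DIIS, and ALGO=Fast starts with Davidson steps and then switches to RMM-DIIS. RMM-DIIS parallelizes somewhat more efficiently than the Davidson algorithm.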
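The scaling relations above can be turned into a rough cost estimate before changing ENCUT or NBANDS. The following is a minimal sketch that only encodes runtime ∝ NBANDS × NPLNW^2 and NPLNW ∝ ENCUT^(3/2); the function names are hypothetical, and real timings also depend on k-points, FFT grids, and the machine.

```python
def plane_wave_ratio(encut_old: float, encut_new: float) -> float:
    """Ratio of plane-wave counts, assuming NPLNW ∝ ENCUT^(3/2)."""
    return (encut_new / encut_old) ** 1.5


def runtime_ratio(encut_old: float, encut_new: float,
                  nbands_old: float = 1.0, nbands_new: float = 1.0) -> float:
    """Estimated runtime ratio, assuming runtime ∝ NBANDS * NPLNW^2."""
    pw = plane_wave_ratio(encut_old, encut_new)
    return (nbands_new / nbands_old) * pw ** 2


if __name__ == "__main__":
    # Raising ENCUT from 400 eV to 520 eV at fixed NBANDS (illustrative values).
    print(f"estimated slowdown: {runtime_ratio(400, 520):.2f}x")
```

Under these assumptions a 30% increase in ENCUT costs roughly a factor of 1.3^3 in runtime, which is why ENCUT is one of the first settings to revisit for large systems.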

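NPAR: if IALGO=38, set NPAR to 1. With IALGO=48 the setting matters less; 2, 4, or the number of nodes are all reasonable choices, and larger values use more memory. Parallel efficiency always falls short of linear scaling, and it drops further as cores are added. So for a fixed number of cores (say 20) and a fixed set of jobs (say 2), running the jobs side by side (10 cores each) finishes sooner than running them one after the other (20 cores each).

LREAL controls whether the non-local part of the pseudopotential is evaluated in reciprocal (k) space or in real space. In k-space the cost is proportional to the number of plane waves (∝ ENCUT^(3/2) × a1 × a2 × a3, where a1, a2, a3 are the cell dimensions). The real-space cost depends on the system size. Below about 25 atoms, k-space is fine; for large systems use LREAL = Auto or LREAL = .TRUE.

KPAR sets the degree of parallelism over k-points; set it to the number of compute nodes or to the number of k-points. Each k-point is handled by N/KPAR cores, where N is the total number of cores. The parameter matters most at high core counts (>100).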
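The claim that two jobs on 10 cores each beat the same two jobs run back to back on 20 cores can be illustrated with a toy model. The sketch below assumes Amdahl's law with a hypothetical parallel fraction of 0.95; neither the model nor the numbers come from the text above, and real VASP scaling should be measured with short test runs.

```python
def amdahl_time(serial_time: float, cores: int, parallel_fraction: float) -> float:
    """Wall time on `cores` cores under Amdahl's law (an assumed model)."""
    return serial_time * ((1.0 - parallel_fraction) + parallel_fraction / cores)


def compare_schedules(serial_time: float, total_cores: int, n_jobs: int,
                      parallel_fraction: float) -> tuple[float, float]:
    """Return (sequential, concurrent) wall times for n identical jobs."""
    if total_cores % n_jobs != 0:
        raise ValueError("total_cores must be divisible by n_jobs")
    # One after the other: every job gets all cores.
    sequential = n_jobs * amdahl_time(serial_time, total_cores, parallel_fraction)
    # Side by side: every job gets an equal share of the cores.
    concurrent = amdahl_time(serial_time, total_cores // n_jobs, parallel_fraction)
    return sequential, concurrent


if __name__ == "__main__":
    seq, con = compare_schedules(100.0, 20, 2, 0.95)
    print(f"sequential: {seq:.1f} h, concurrent: {con:.1f} h")
```

In this model the concurrent schedule wins whenever the parallel fraction is below 1, because the serial part of each job is paid once in parallel rather than twice in a row.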

Parameter settings for large-system VASP calculations

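1. Settings that trade convergence accuracy for speed

   k-point density: use fewer k-points.
   Iteration algorithm (ALGO): for large systems, typically ALGO=Very_Fast (IALGO=48).
   Gaussian smearing (increase SIGMA): the default SIGMA is 0.2.
   Self-consistency delay (NELMDL): the number of non-self-consistent electronic steps taken during initialization at the start of the run.
   Cutoff energy (ENCUT): sets the kinetic-energy cutoff of the plane-wave basis.
   PREC: sets the overall precision and determines the defaults of ENCUT and ROPT. The default is Medium; from VASP 4.5 on, Normal and Accurate can also be set.

2. Two parameters improve parallel efficiency: NPAR and KPAR.

3. Large systems are usually not convergence-tested; suitable parameters are chosen from tests on small systems and from values used in published work.

4. Staged optimization: optimize first at low precision (for example, a larger ionic step and looser convergence criteria), then raise the precision for a refined optimization once the first stage has converged. This is likely to be faster overall.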
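The staged optimization in item 4 can be sketched as two INCAR fragments, one loose and one tight. PREC, EDIFF, EDIFFG, and POTIM are real VASP tags, but every value below is an illustrative assumption rather than a recommendation, and `render_incar` is a hypothetical helper.

```python
# Stage 1: loose settings for a quick pre-relaxation (illustrative values only).
COARSE = {
    "PREC": "Normal",
    "EDIFF": "1E-4",    # looser electronic convergence
    "EDIFFG": "-0.05",  # looser force criterion
    "POTIM": "0.5",     # larger ionic step
}

# Stage 2: tighter settings, started from the structure produced by stage 1.
FINE = {
    "PREC": "Accurate",
    "EDIFF": "1E-6",
    "EDIFFG": "-0.01",
    "POTIM": "0.3",
}


def render_incar(tags: dict[str, str]) -> str:
    """Format a tag dictionary as INCAR lines of the form 'TAG = value'."""
    return "\n".join(f"{tag} = {value}" for tag, value in tags.items())


if __name__ == "__main__":
    print("# stage 1\n" + render_incar(COARSE))
    print("# stage 2\n" + render_incar(FINE))
```

The second stage would normally start from the relaxed geometry of the first, for example by copying CONTCAR to POSCAR before rerunning.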

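Parallel computation in VASP

VASP parallelizes in two ways: over plane-wave coefficients and over bands. The RMM-DIIS iterative matrix diagonalization (IALGO=48) supports both, whereas the band-by-band conjugate gradient method (IALGO=8) supports only parallelization over plane-wave coefficients. Band parallelization is switched on with the NPAR tag. With NPAR=1, only the plane-wave coefficients are parallelized and all nodes work on the same band. This is described as suitable for clusters with few nodes, although the manual text quoted below calls it usually very slow; that text also gives the default as NCORE=1, that is, NPAR equal to the total number of cores, rather than NPAR=1. If NPAR equals the total number of nodes, each node computes one band by itself; the manual notes that this suits clusters with low communication bandwidth and that it increases memory requirements. For values of NPAR between 1 and the total number of nodes, the number of nodes working on one band is (total number of nodes)/NPAR. Another commonly set parallel parameter is LPLANE, usually as LPLANE=.TRUE. This reduces the communication bandwidth needed during fast Fourier transforms, but it may worsen load balancing. The load-balancing penalty only matters on very large clusters, such as the one at the Shanghai Supercomputer Center.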

Relevant content from the VASP manual:

Parallelisation: NPAR, NCORE, LPLANE, and the KPAR-tag

VASP currently offers parallelisation (and data distribution) over bands, parallelisation (and data distribution) over plane wave coefficients (see also Section 4), and, as of VASP.5.3.2, parallelisation over k-points (no data distribution). To obtain high efficiency on massively parallel systems or modern multi-core machines, it is strongly recommended to use all at the same time. Most algorithms work with any data distribution (except for the single band conjugate gradient, which is considered to be obsolete).

NCORE is available from VASP.5.2.13 on, and is more handy than the previous parameter NPAR. The user should specify either NCORE or NPAR, where NPAR takes a higher preference. The relation between both parameters is

   NCORE = total number of cores / NPAR

NCORE determines how many cores work on one orbital. The value is also printed at the beginning of the OUTCAR file. The current default is NCORE=1, implying that one orbital is treated by one core. NPAR is then set to the total number of cores. If NCORE equals the total number of cores, NPAR is set to 1. This implies distribution over plane wave coefficients only: all cores will work on every individual band, by distributing the plane wave coefficients over all cores. This is usually very slow and should be avoided.

NCORE=1 is the optimal setting for platforms with a small communication bandwidth and is a good choice for runs on only a few cores, as well as for machines with a single core per node and a Gigabit network. However, this mode substantially increases the memory requirements, because the non-local projector functions must be stored entirely on each core. In addition, substantial all-to-all communications are required to orthogonalize the bands. On massively parallel systems and modern multi-core machines we strongly urge to set

   NPAR ≈ sqrt(number of cores)

or

   NPAR = number of cores per compute node

In selected cases, we found that this improves the performance by a factor of up to four compared to the default, and it also significantly improves the stability of the code due to reduced memory requirements.
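The relation NCORE = (total number of cores)/NPAR and the two suggested NPAR values can be computed with a few lines of Python. This is a sketch with hypothetical function names; snapping the square-root suggestion to the nearest divisor of the core count is an assumption made here so that NCORE stays an integer, not something the manual text states.

```python
import math


def npar_from_ncore(total_cores: int, ncore: int) -> int:
    """NPAR implied by NCORE via NCORE = total_cores / NPAR."""
    if total_cores % ncore != 0:
        raise ValueError("ncore must divide total_cores")
    return total_cores // ncore


def suggest_npar(total_cores: int, cores_per_node: int) -> tuple[int, int]:
    """Return the two candidate NPAR values: ~sqrt(cores) and cores per node.

    Assumption: the sqrt candidate is rounded to the divisor of total_cores
    that lies closest to sqrt(total_cores).
    """
    target = math.sqrt(total_cores)
    divisors = [d for d in range(1, total_cores + 1) if total_cores % d == 0]
    near_sqrt = min(divisors, key=lambda d: abs(d - target))
    return near_sqrt, cores_per_node


if __name__ == "__main__":
    print(npar_from_ncore(64, 1))   # default NCORE=1 on 64 cores
    print(suggest_npar(64, 16))     # candidates for 4 nodes x 16 cores
```

Either candidate is only a starting point; short timing runs with a few neighbouring values remain the reliable way to pick the setting.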

The second switch that influences the data distribution is LPLANE. If LPLANE is set to .TRUE. in the INCAR file, the data distribution in real space is done plane wise. Any combination of NPAR and LPLANE can be used. Generally, LPLANE=.TRUE. reduces the communication bandwidth during the FFTs, but at the same time it unfortunately worsens the load balancing on massively parallel machines. LPLANE=.TRUE. should only be used if NGZ is at least 3*(number of nodes)/NPAR, and optimal load balancing is achieved if NGZ=n*NPAR, where n is an arbitrary integer. If LPLANE=.TRUE. and if the real space projector functions (LREAL=.TRUE. or ON or AUTO) are used, it might be necessary to check the lines following

   real space projector functions
   total allocation :
   max/ min on nodes :

The max/ min values should not differ too much, otherwise the load balancing might worsen as well.
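The two LPLANE conditions quoted above (NGZ at least 3*(number of nodes)/NPAR, and NGZ a multiple of NPAR for optimal load balancing) are easy to check before submitting a job. The helper below is a hypothetical sketch that encodes only those two conditions.

```python
def lplane_check(ngz: int, n_nodes: int, npar: int) -> tuple[bool, bool]:
    """Check the two LPLANE=.TRUE. conditions from the manual text.

    Returns (usable, balanced):
      usable   -> NGZ >= 3 * n_nodes / NPAR
      balanced -> NGZ is an integer multiple of NPAR
    """
    usable = ngz >= 3 * n_nodes / npar
    balanced = ngz % npar == 0
    return usable, balanced


if __name__ == "__main__":
    # Illustrative numbers: NGZ=48 on 64 nodes with NPAR=8.
    print(lplane_check(48, 64, 8))
```

NGZ itself is reported by VASP for the chosen cell and cutoff, so the check can be run after a short test calculation.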

The optimum settings for NPAR and LPLANE depend very much on the type of machine you are using. Results for some selected machines can be found in Sec. 3.10. Recommended setups:

LINUX cluster linked by Infiniband, modern multicore machines:

On a LINUX cluster with multicore machines linked by a fast network we recommend to set

   LPLANE = .TRUE.
   NCORE = number of cores per node (e.g. 4 or 8)
   LSCALU = .FALSE.
   NSIM = 4

If very many nodes are used, it might be necessary to set LPLANE = .FALSE., but usually this offers very little advantage. For long runs (e.g. molecular dynamics), we recommend to optimize NPAR by trying short runs for different settings.

LINUX cluster linked by 1 Gbit Ethernet, and LINUX clusters with single cores:

On a LINUX cluster linked by a relatively slow network, LPLANE must be set to .TRUE., and the NPAR flag should be equal to the number of cores:

   LPLANE = .TRUE.
   NCORE = 1
   LSCALU = .FALSE.
   NSIM = 4

Mind that you need at least a 100 Mbit full duplex network, with a fast switch offering at least 2 Gbit switch capacity, to find useful speedups. Multi-core machines should always be linked by Infiniband, since Gbit is too slow for multi-core machines.

Massively parallel machines (Cray, Blue Gene):

On many massively parallel machines one is forced to use a huge number of nodes. In this case load balancing problems and problems with the communication bandwidth are likely to be experienced. In addition the local memory is fairly small on some massively parallel machines; too small to keep the real space projectors in the cache with any setting. Therefore, we recommend to set NPAR on these machines to sqrt(number of nodes) (explicit timing can be helpful to find the optimum value). The use of LPLANE=.TRUE. is only recommended if the number of nodes is significantly smaller than NGX, NGY and NGZ.

In summary, the following setting is recommended

   LPLANE = .FALSE.
   NPAR = sqrt(number of nodes)
   NSIM = 1

KPAR is the number of k-points that are to be treated in parallel (available as of VASP.5.3.2). The set of k-points is distributed over KPAR groups of compute cores, in a round-robin fashion. This means that a group of N = (total number of cores)/KPAR compute cores together work on an individual k-point (choose KPAR such that it is an integer divisor of the total number of cores). Within this group of N cores that share the work on an individual k-point, the usual parallelism over bands and/or plane wave coefficients applies (see above).

Note: the data is not additionally distributed over k-points.
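The round-robin distribution over KPAR groups can be pictured with a short sketch. The function name is hypothetical, and assigning k-point i to group i mod KPAR is an assumed reading of "round-robin"; the actual ordering inside VASP may differ.

```python
def distribute_kpoints(n_kpoints: int, total_cores: int,
                       kpar: int) -> tuple[int, list[list[int]]]:
    """Return (cores per group, k-point indices handled by each group)."""
    if total_cores % kpar != 0:
        raise ValueError("KPAR must be an integer divisor of total_cores")
    cores_per_group = total_cores // kpar  # N = total cores / KPAR
    groups: list[list[int]] = [[] for _ in range(kpar)]
    for k in range(n_kpoints):
        groups[k % kpar].append(k)  # assumed round-robin assignment
    return cores_per_group, groups


if __name__ == "__main__":
    cores, groups = distribute_kpoints(n_kpoints=5, total_cores=32, kpar=4)
    print(cores, groups)
```

When the number of k-points is not a multiple of KPAR, some groups get one more k-point than others, so a KPAR that divides both the core count and the k-point count keeps the groups evenly loaded.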
