Synlett 2023; 34(10): 1135-1146
DOI: 10.1055/s-0042-1753141
cluster
Dispersion Effects

Efficient Computation of the Interaction Energies of Very Large Non-covalently Bound Complexes

,
,
,
 


Abstract

We present a new benchmark set consisting of 16 large non-covalently bound systems (LNCI16) ranging from 380 up to 1988 atoms and featuring diverse interaction motives. Gas-phase interaction energies are calculated with various composite DFT, semi-empirical quantum mechanical (SQM), and force field (FF) methods and are evaluated using accurate DFT reference values. Of the employed QM methods, PBEh-3c proves to be the most robust for large systems with a relative mean absolute deviation (relMAD) of 8.5% with respect to the reference interaction energies. r2SCAN-3c yields an even smaller relMAD, at least for the subset of complexes for which the calculation could be converged, but is less robust for systems with smaller HOMO–LUMO gaps. The inclusion of Fock-exchange is therefore important for the description of very large non-covalent interaction (NCI) complexes in the gas phase. GFN2-xTB was found to be the best performer of the SQM methods with an excellent result of only 11.1% deviation. From the assessed force fields, GFN-FF and GAFF achieve the best accuracy. Considering their low computational costs, both can be recommended for routine calculations of very large NCI complexes, with GFN-FF being clearly superior in terms of general applicability. Hence, GFN-FF may be routinely applied in supramolecular synthesis planning.

1 Introduction

2 The LNCI16 Benchmark Set

3 Computational Details

4 Generation of Reference Values

5 Results and Discussion

6 Conclusions


# 1

Introduction

Supramolecular chemistry finds application in many areas of chemistry,[1] such as in drug delivery,[2] [3] the design of artificial molecular motors,[4] and in catalysis.[5,6] The structures and functionalities of these compounds are mainly governed by non-covalent interactions (NCIs). Of the various interaction types observed in supramolecular complexes, such as hydrogen and halogen bonding, π–π-stacking, and ion-dipolar interactions, London dispersion (LD) forces contribute a large amount of the interaction energy in many cases.[7] This comparably weak interaction type is omnipresent in chemistry and biology. Since it depends on the contact surface of the respective complex, LD interactions can equate to sizeable energy contributions for large systems and must not be neglected.[8] [9]

Computational chemistry has become a powerful tool for modeling structures and predicting the binding motifs of NCI complexes.[10] Herein, a major challenge for theoretical methods is the accurate description of LD effects.[11] Computationally efficient approaches, such as the D3[12] [13] and D4[14] dispersion schemes, the exchange-hole dipole moment (XDM)[15,16] approach or the non-local VV10 correction,[17] can be used for the description of this long-range electronic correlation effect. Combined with accurate density functional approximations (DFAs), the challenging task of reliably predicting the interaction energies of large supramolecular complexes consisting of up to 2000 atoms becomes feasible.[18] [19] For the system size given in the LNCI16 set, dispersion contributions beyond the pairwise attributions considered become increasingly important[20] and can be efficiently modeled by the three-body Axilrod–Teller–Muto (ATM)[21] [22] term in D3 or D4, and the many-body dispersion (MBD)[23] [24] correction scheme.[25] For large π systems with small HOMO–LUMO gaps, as is more often the case with the binding of adsorbates on metal surfaces, higher-order dispersion terms can become more important.[26]

Another crucial point for supramolecular complexes is the basis set superposition error (BSSE),[27] which leads to an overestimation of interaction energies if DFAs or wave function theory (WFT) methods are used in combination with smaller basis sets. In most cases, the use of basis sets with at least triple ζ quality is necessary to obtain a diminishing BSSE.[25] This becomes computationally unfeasible for systems with more than 500 atoms. Therefore, several empirical approaches have been developed to reduce the BSSE, such as the geometrical counterpoise correction (gCP)[28] and the related beyond pairwise approach of the DFT-C method.[29]

A technical problem for the DFT calculations of large molecules in the gas phase, i.e., without electrostatic screening via a proper solvent model, is that the HOMO–LUMO gap often diminishes with increasing system sizes, even for mostly chemically saturated proteins.[30] In combination with the notorious underestimation of HOMO–LUMO gaps by common (meta-)GGA DFAs, the gaps may even approach zero, leading to very unstable self-consistent field (SCF) iterations and unreliable results. While hybrid DFAs suffer less from this problem due to mixing in of exact exchange into the functional, their serious drawback is the order of magnitude higher computational cost.

Due to their robustness and low computational costs, efficient semi-empirical quantum mechanical (SQM) methods are powerful tools for the structural and energetic screening of large molecular systems.[31] [32] For example, in recent studies,[33] [34] SQM methods and computationally much cheaper but also much more simplified atomistic force field (FF) methods have been used for modeling even larger supramolecular structures such as those frequently encountered in structural biology.[35]

As the much lower computation time of SQM and even more FF methods comes at the cost of a higher degree of empiricism, it is crucial to carefully benchmark SQM and FF methods against accurate reference values.[36] For this purpose, several important NCI benchmark sets have been composed, such as L7,[18] S30L,[19] and the ‘extra-large’ EXL8,[37] covering systems with up to 1027 atoms. A subset of the latter was used in a recent DFT benchmark study.[38] However, except for EXL8, which has limited statistical validity due to the small number of systems, there are, to our knowledge, no benchmark sets with NCI complexes with significantly more than 250 atoms, although these systems are just as relevant and occur in many areas of chemistry and structural biology.[8] [9] Therefore, we propose the new LNCI16 benchmark set with systems ranging from 380 up to 1988 atoms. By covering diverse interactions such as hydrogen bonding, halogen bonding, π–π-stacking, and with a main focus on London dispersion bound complexes, this set aims to represent the great diversity in supramolecular chemistry. By comparison to accurate DFT reference values, various efficient composite DFT methods as well as low-cost SQM and FF methods are evaluated.

In this work, the interaction energy (E int) is calculated via the supermolecular approach:

E int = E(AB) – E(A) – E(B) (1),

where E is the gas-phase electronic energy of complex AB, host A, and guest B, respectively. This approach is generally applicable for any system size and is only limited by the computational costs of the employed method for the calculation of the complex. Since we aim at a theoretical benchmark, i.e., not comparing with experimentally measured association energies, all energies are calculated in the geometry of the complex and thus neglect the fragment (monomer) relaxation energy. In the gas phase, the NCIs calculated by various methods can unambiguously be compared with each other, which would not be possible in solvated-state calculations for which no common model exists. The calculation of gas-phase energies for charged systems turned out to be problematic in many cases due to the missing screening effects of a solvent, thus leading to almost vanishing HOMO–LUMO gaps (<1 eV). Therefore, many systems, especially those with higher charges as often present in proteins, had to be excluded from the benchmark set.

After a description of the test set in the next section, we summarize the computational details followed by a statistical evaluation of the employed DFT, SQM, and FF methods. Finally, we discuss computational timings and give method recommendations for computation of the interaction energies of large NCI complexes.


# 2

The LNCI16 Benchmark Set

Figure [1] shows the 16 optimized complex structures contained within the benchmark set, details of which are briefly described below.

Zoom Image
Figure 1 Structures and names of the complexes comprising the LNCI16 benchmark set (systems 3–7 from EXL8). Carbon atoms in the host molecules are colored light gray while they are depicted in light blue in the guest molecules.

The first three systems all include boron atoms, with systems 1 and 2 consisting of the same porous organic cage host. The apolar guest benzene is mostly bound via dispersion interactions (BpocBenz). However, the host is also able to form hydrogen bonds with the polar guest methanol, as in BpocMeOH.[39] System 3 is a boron–nitrogen nanotube with an α-ALA guest molecule that is mostly bound by LD interactions.[40] Gramicidin A (4) is an ionophoric antibiotic, which forms helical dimers that are connected by hydrogen bonds.[41] H-bonds are also the main binding motif in system 5 consisting of a linear oligocarbamate guest molecule and a helical host.[42] DNA is arguably the most notable supramolecular complex and is predominantly bound by H-bonds. Hence, a neutralized cutout of this important system is also included in our benchmark set. Next is a cutout of a protein–ligand complex of the Src homology 3 (SH3) protein (7), which is able to recognize specific proline sequences.[43] Systems 37 are part of the EXL8 benchmark set reported by Ni et.al.[37] and were taken in their original geometries.

With the following two protein-ligand cutouts, charged systems were also considered in the test set. The first is the tyrosine-protein kinase 2 (TYK2) with an ejm46 ligand in the binding pocket, which is of interest for the treatment of inflammatory diseases.[44] In the second system, Rivaroxaban, an anticoagulant drug, is bound to the activated serine protease factor X (FXa).[44] Another field in which non-covalent interactions play a major role is the adsorption processes of molecules on surfaces.[45] System 10 shows such adsorption which is dominated by π–π-interactions between a graphene sheet and two dipolar donor–acceptor dye molecules (2xHB238).[46] These dyes belong to a class of polymethine dyes (also called merocyanines). System 11 consists of an interaction between a graphene sheet and a fullerene-based macrocycle co-adsorbate, which was synthesized by the Höger group.[47]

Furthermore, two systems consisting of a halogen-bonded capsule were chosen to further assess this challenging interaction type.[48] [49] In DithBrCap, the interaction energy between two dithiane guest molecules and the capsule is investigated, while for BrCap the interaction through halogen bonding between the two parts of the capsule is computed.[50] System 14 is a model for a molecular muscle and represents an important class of molecular machines. The muscle is given in its ‘contracted’ form and is bound via hydrogen bonds and LD interactions between its two identical parts.[34] Another important class of supramolecular complexes is rotaxanes in which the host and guest molecules are mechanically interlocked. System 15 is a phenylacetylene-based complex belonging to this class with the major type of interaction being LD.[51] The final and largest system of the benchmark set is a snapshot taken from a molecular dynamics simulation of long nylon chains that are intertwined forming a nanoparticle via hydrogen bonding.[52] The behavior of such plastic nanoparticles is of interest in environmental chemistry.[53] Furthermore, the absorption of small molecules in clothing fabrics such as nylon is another field of interest.[54] [55]

An overview of the complexes with their charges and calculated reference interaction energies (see Section 4) is given in Table [1].[56]

Table 1 Overview of the Investigated Complexes, Their Charges and Calculated ωB97X-3c[56] Reference Interaction Energies (kcal mol–1)a

Complex

Charge

Ref.

E int (kcal mol–1)

BpocBenz (1)

 0

39

–  6.81

BpocMeOH (2)

 0

39

–  6.19

BNTube (3)

 0

39, 40

– 14.32

GramA (4)

 0

39, 41

– 36.30

DHComplex (5)

 0

39, 42

– 57.57

DNA (6)

 0

39

–363.30

SH3 (7)

 0

39, 43

– 25.65

TYK2 (8)

+1

44

– 49.03

FXa (9)

–2

44

–105.27

2xHB238 (10)

 0

46

– 74.92

FullGraph (11)

 0

47

– 74.13

DithBrCap (12)

 0

50

– 45.63

BrCap (13)

 0

50

– 21.12

MolMus (14)

 0

34

– 62.58

Rotax (15)

 0

51

– 55.89

Nylon (16)

 0

52, 53

–566.23

a The respective reference from which the structure was taken is given. Additional details on the geometries can be found in the Supporting Information.

The inclusion of boron atoms in systems 13 may be problematic for many force field methods as they are rarely parameterized for each atom type. Nevertheless, these interesting systems were included because they make the benchmark set more diverse. Force fields may also be evaluated using the subset without boron atoms (systems 416), which still includes a reasonable size of 13 systems compared to the L7 or EXL8 benchmark sets.


# 3

Computational Details

All methods evaluated in this work are given in Table [2]. The PM6-D3H4X[57] [58] [59] and PM7[60] calculations were carried out with MOPAC2016.[61] GFN2-xTB,[32] [62] GFN1-xTB,[63] GFN0-xTB[64] and GFN-FF[65] calculations were performed using xTB.[66] The xTB-IFF[67] energies were computed using the xTB-IFF program.[68] D3(BJ)[12] [13] [15] dispersion contributions for the B97M functional with and without the inclusion of the ATM term were conducted with the s-dftd3[69] standalone program. D4[14] [70] dispersion energies, which include the ATM term by default, were calculated using the dft-d4[71] 3.4.0 standalone program. D4-MBD dispersion energies were computed with dft-d4 2.4.0.

The DFTB engine of the Amsterdam Modeling Suite (AMS)[72] was used to perform the DFTB calculations with the Quasinano2015 parametrization.[73] [74] Additionally, the engine was employed for the calculation of GFN1-xTB charges, which were then fed into the AMS ForceField engine[75] for the UFF[76] calculations. The Open Babel program package[77] [78] was used to perform the MMFF94[79] [80] and Ghemical[81] calculations using the respective default charges. However, the default charge model of the Ghemical force field predicted wrong charges for the charged systems 8 and 9 (+2 instead of –2 for FXa and +3 instead of +1 in the case of TYK2). Open Babel was also employed for the GAFF[82] calculations, which were conducted with Gasteiger charges[83] as well as GFN2-xTB charges.

Table 2 Tested Composite QM, SQM, and FF Methods with the Respective Applied Dispersion Correctionsa

Method

Dispersion

Ref.

Composite QM

B97M-V-C

VV10

29, 91

PBEh-3c

D3(BJ)-ATM

88

r2SCAN-3c

D4

85

B97-3c

D3(BJ)-ATM

87

HF-3c

D3(BJ)

86

SQM

DFTB(Quasinano)

D3(BJ)-ATM

73, 74

GFN2-xTB

D4

92

GFN1-xTB

D3(BJ)

63

PM7

D2

60

PM6-D3H4X

D3

57

GFN0-xTB

D4

94

FF

GFN-FF

D4

65

UFF

LJ

76

xTB-IFF

D4

67

GAFF

LJ

82

MMFF94

Buf-14-7

79, 80

Ghemical

LJ

81

a Additional computational details are given in the Supporting Information.

TURBOMOLE (V. 7.5.1)[84] was used for the r2SCAN-3c,[85] HF-3c,[86] B97-3c,[87] and PBEh-3c[88] calculations. The ωB97X-3c[56] reference values were calculated with the ORCA (V. 5.0.2)[89] [90] quantum chemistry package. Due to convergence problems in Q-Chem using the B97M-V[17,91] DFA in combination with the def2-SVPD[92,93] basis set, single-point calculations for this method were performed with TURBOMOLE, while DFT-C[29] correction terms were computed with the Q-Chem program package.[94] The non-local VV10 correction was computed non-self-consistently.


# 4

Generation of Reference Values

The generation of reliable reference values is crucial for every theoretical benchmark study. For the system size covered by the LNCI16 set, this is an especially challenging task. The use of the common ‘gold standard’ (CCSD(T)), even with local approximations, which have been successfully used for NCI complexes of up to about 1000 atoms,[39] [95] is not feasible for the systems comprised in the LNCI16 set because of the enormous computational cost. In this work, a newly developed, efficient, range-separated DFA composite method termed ωB97X-3c is applied to generate reference interaction energies. It employs a deeply contracted valence double-ζ atomic orbital (AO) basis set (vDZVP), which is specially optimized for molecules in combination with large core ECPs and a refitted D4 dispersion correction. Due to its molecular (DFT) optimization and the specially adapted D4 parameterization, this composite method is essentially BSSE free, despite its small basis set. Consequently, interaction energies from ωB97X-3c for existing NCI benchmark sets show very small deviations from the basis-set converged results of the parent method ωB97X-D4[96]/def2-QZVPP[92] with revised D4 parametrization[56] (Table [3]). A detailed description and extensive evaluation of this DFA will be published in the near future.[56] Note that ωB97X-3c, with its specific D4 parametrization[56] in combination with the uniquely developed basis set ensuring a small BSSE, is among the most accurate DFAs ever tested for NCIs, and yet is computationally feasible for NCI complexes with a few thousand atoms. The respective mean absolute deviations (MADs) of the popular B3LYP-D4[97] [98] method are given for comparison and are in most cases significantly larger. Based on this excellent performance for diverse NCI benchmark sets, ωB97X-3c is a suitable reference method, while still being affordable for the computation of systems of up to a few thousand atoms. Due to its range-separated hybrid DFA character, this method does not suffer from the aforementioned gap problem, and we observed no SCF convergence problems, even for small gap test systems with up to 2795 atoms. Since this presented benchmark study mainly focuses on SQM and FF methods, the estimated errors of ωB97X-3c are expected to be much smaller compared to the typical errors of the SQM and FF methods, thus enabling a meaningful evaluation of these low-cost methods.

Table 3 Mean Absolute Deviations (MAD) (kcal mol–1) from Accurate Reference Values (Mostly of CCSD(T)/CBS Quality)a

ωB97X-3c

ωB97X-D4/QZ

B3LYP-D4/QZ

S30L

1.7

1.9

5.3

IONPI19

1.0

1.0

1.2

L7

1.6

0.7

2.4

S66

0.3

0.1

0.3

R160x6

0.3

0.2

0.2

HB300SPX

0.3

0.2

0.5

a MAD values for ωB97X-3c, ωB97X-D4/QZ (with revised D4 parametrization[41]), and the popular B3LYP-D4/QZ method for NCI benchmark sets S30L,[19] IONPI19,[99] L7,[18] [100] S66,[101] R160x6,[102] and HB300SPX.[103] ωB97X-3c and ωB97X-D4 values are taken from Ref. 56; B3LYP-D4 data was taken from Ref. 85 (see the Supporting Information for more detailed information); QZ corresponds to def2-QZVPP.


# 5

Results and Discussion

Considering the broad range of interaction energies covered in the LNCI16 set, the usual statistical error measures, such as mean deviation (MD) and mean absolute deviation (MAD), may be strongly biased by the large interaction energy cases. A downscaling of these very large interaction energies would have a certain arbitrariness, hence we decided to base our statistical evaluation on the relative deviations from the reference method. This was also suggested by Piecuch to enable a meaningful evaluation of the performance of DFT methods for NCIs,[104] considering a relative deviation below 5% as ‘chemical accuracy’ in this context. Additionally, we provide MDs and MADs for all the evaluated methods in the Supporting Information. An overestimation of the interaction energy (‘overbinding’) by a method results in a negative relative deviation, whereas an underestimation (‘underbinding’) is defined as a positive relative deviation. Although systems with a ωB97X-3c HOMO–LUMO gap below 3.5 eV were removed, convergence problems with the tested (meta-)GGA DFT methods r2SCAN-3c, B97-3c, and B97M emerged for systems 911, such that interaction energies could not be calculated with these methods for the corresponding complexes.

First, the contribution of higher-order dispersion terms than the pairwise attributions in the DFT-D framework is discussed for the B3LYP functional. This functional was chosen for the discussion as its D4 dispersion contribution is usually in good agreement with WFT dispersion energy estimates.[104] Figure [2] shows the relative contributions to the D4 energy of the three-body ATM term and of the many-body approach (MBD), including higher-order dispersion terms.[14] Interestingly, the ATM contribution to the interaction energies of systems BpocBenz, BpocMeOH, and BrCap is close to zero, which can be attributed to the small contact surface consisting mainly of pairwise NCI contacts between host and guest in the complex. However, we generally observe a significant contribution of the ATM term of 8.0% on average, which indicates the importance of incorporating many-body dispersion effects by including the three-body ATM term. This is consistent with observations made in previous studies on smaller NCI complexes.[106] Consequently, we expect overbinding of methods which neglect the three-body dispersion. However, the inclusion of higher-order dispersion terms in the many-body D4-MBD approach is on average only 0.5% different to the D4-ATM values with a maximum of 1.4% for BrCap. This demonstrates that the D4 correction sufficiently includes many-body dispersion terms by the inclusion of the ATM term, even for large systems found in the LNCI16 set.

Zoom Image
Figure 2 Relative contributions of the ATM/MBD-terms to the respective overall D4-ATM/MBD dispersion energies. The D4 correction with only pairwise terms is given as a reference (total dispersion energies are calculated as a sum of the pairwise and higher-order contributions). The lines connecting the data points are given for better visibility.

For the system sizes covered in the LNCI16 set, DFT calculations are only feasible with rather small basis sets which, however, are subject to a significant BSSE, especially for NCI complexes. In this respect, the ‘3c’[85] or ‘-C’[29] composite DFT methods are particularly suited as they combine relatively small basis sets with an approximate BSSE correction or by absorbing it in the basis set and a D4 refit (see Section 4). The importance of such correction schemes can be exemplified for the B97M-V functional using the def2-SVPD basis set, for which a relMAD of 58.9% without any BSSE correction is obtained for the LNCI16 set. The relMAD can be clearly reduced to 25.1% upon applying the DFT-C correction (see the Supporting Information for more details). Hence, this correction scheme seems to work effectively, even for the very large NCI complexes evaluated in the present study, which can also be seen in the overall good accuracy of B97M-D3-ATM-C (relMAD: 8.8%).

Figure [3] shows the relative deviations from the reference values for all the tested composite QM methods. Since the SCF iterations of the evaluated (meta-)GGA functionals did not converge for FXa, 2xHB238, and FullGraph, only the remaining 13 systems are included in the respective relMADs of these methods. In contrast, PBEh-3c and HF-3c could be converged for all systems of the LNCI16 set, stressing the importance of Fock exchange for the robust treatment of large NCI complexes in the gas phase.

Zoom Image
Figure 3 Relative deviations (given in %) to ωB97X-3c of the tested composite QM methods. The lines connecting the data points are given for better visibility. * Only systems 18 and 1216, for which the SCF converged, were taken into consideration for determination of the relMADs.

The HF-3c method, however, systematically overestimates the interaction energies of the investigated complexes, which is in line with the reported behavior of HF-3c for the S12L supramolecular NCI benchmark set.[86] With an overestimation of more than 60% for BrCap, HF-3c is not even able to describe this system qualitatively correctly. Therefore, this method can only be recommended to a very limited extent for the calculation of the interaction energies of very large NCI complexes.

B97-3c and B97M-D3-ATM-C yield interaction energies of comparable accuracy but also show a tendency to overestimate the interaction energies, presumably due to some residual BSSE. The smallest relative MAD of 6.6% is obtained by r2SCAN-3c. Thus, the method comes close to the chemical accuracy for NCIs of ca. 5% for the LNCI16 set, which further confirms its generally good performance for non-covalently bound systems.[85] [99] In terms of accuracy, PBEh-3c performs second best among all the tested methods, and more importantly, it converges for all systems of the LNCI16 set. For the subset where the (meta-)GGA DFAs also converged, PBEh-3c achieved a relMAD of 8.9%. Moreover, this composite hybrid DFT method also yields the smallest relative MD of –1.8% (also the smallest relMD for the mentioned subset), therefore systematic errors can be largely excluded. Overall, PBEh-3c provides the best compromise between accuracy and robustness among all the composite QM methods tested and once again underlines the effectiveness of the ‘3c’ approach.

By employing much cruder approximations such as, among others, the use of very small basis sets, SQM methods are even able to compute very large systems with more than a thousand atoms, often with acceptable accuracy.[32] [107] The relative deviations with respect to the LNCI16 reference values for the best-performing SQM methods tested in this work are shown in Figure [4]. The observed tendency of PM6-D3H4X to overestimate the interaction energies (relMD: 16.8%) is in agreement with previous studies on smaller NCI complexes.[86] To a similar extent, GFN1-xTB also shows this tendency (relMD: 16.6%). Closely followed by PM6-D3H4X, DFTB(Quasinano) is the least accurate among the evaluated SQM methods. DFTB(Quasinano) underestimates the interaction energies of most of the hydrogen-bonded systems in the test set (BpocMeOH, GramA, DNA, and Nylon). With a relative MAD of only 11.1%, which is outstanding for an SQM method, GFN2-xTB provides by far the best accuracy within this class of methods. It does not show systematic under- or overbinding. Although this was to some extent expected, since GFN2-xTB was specifically parameterized to accurately describe non-covalent interactions; this also holds true for the diverse and very large NCI complexes of the LNCI16 set. In contrast, GFN1-xTB, although also parameterized with a focus on non-covalent interactions, is clearly outperformed by its successor, suggesting that the parameterization of GFN2-xTB in combination with its modified Hamiltonian also works better for very large NCI complexes, consistent with the performance for the S30L set.[62] A similar overbinding tendency of GFN1-xTB was observed for the ACONFL[108] set, which consists of conformers with long alkane chains, whereas this was not observed for GFN2-xTB. This can be explained by the larger basis set for hydrogen in GFN1-xTB, which leads to an underestimation of the repulsive NCI contacts. In addition, the higher multipole terms in the GFN2-xTB Hamiltonian improve the description of hydrogen bonding.[62]

Zoom Image
Figure 4 Relative deviations (given in %) from the ωB97X-3c reference values presented for the evaluated SQM methods that performed best (the PM7 and GFN0-xTB results can be found in the Supporting Information). The lines connecting the data points are given for better visibility.

Without self-consistent charge iterations, the GFN0-xTB method saves computation time compared to GFN1-/GFN2-xTB (cf. Figure [6]),[64] but at the price of significantly larger errors (relMAD: 26.3%). Contrary to the reasonable accuracy of PM6-D3H4X, PM7 drastically overestimates most interaction energies (relMAD: 70.1%, relMD: –68%). Besides the already known poor performance of PM7 for dispersion bound complexes,[109] it also showed bad results for other non-covalent interaction types (e.g., H bonds) in this study. One reason for a systematic overbinding of the method might be the use of the D2[110] dispersion correction, which generally performs worse than the more sophisticated D3 or D4 methods.[111]

Figure [5] shows the four best-performing force fields assessed for this study. Detailed results for MMFF94 and Ghemical can be found in the Supporting Information and their overall performance is discussed below. GFN1-xTB charges were used for the UFF calculations, while GFN2-xTB charges were employed for the GAFF calculations, which significantly improved the results of both force fields (see the Supporting Information for more details). Notably, the UFF force field yields very inaccurate interaction energies, predicting BpocMeOH, GramA, DNA, and Nylon to be unbound complexes (relative deviations larger than +100%). All these complexes have in common that hydrogen bonding plays a significant role, which indicates that UFF may not be able to describe this type of interaction qualitatively correctly. The xTB-IFF force field generally predicts quite accurate interaction energies, except for TYK2 and FXa, which are the only two charged complexes in the benchmark set. Problems of the xTB-IFF force field in describing charged systems were already reported in the original publication.[67] Excluding these two systems from the statistical evaluation, results in a small relMAD of only 23.0%. In fact, xTB-IFF is by far the most accurate force field for neutral systems of the LNCI16 set among all the tested FF methods. Considering the complete benchmark set, GFN-FF yields the smallest relMAD of 34.7% with a tendency to overbind (relMD: –21.5%). The GAFF force field is only slightly less accurate (relMAD: 37.0%) compared to GFN-FF and shows only a moderate underbinding with a relMD of 11.0%. GAFF does not include parameters for boron and hence, hydrogen parameters are used instead.

Zoom Image
Figure 5 Relative deviations (given in %) from the ωB97X-3c reference values presented for the evaluated FF methods that performed best (MMFF94 and Ghemical results can be found in the Supporting Information). The lines connecting the data points are given for better visibility. a Using GFN2-xTB charges. b Using GFN1-xTB charges.

Surprisingly, this rather crude workaround yields quite accurate results for systems 1 and 2. However, the large deviation for system 3 may be attributed to the missing genuine parametrization for boron. The halogen capsule (BrCap) is predicted to be unbound, verifying the known inaccurate description of halogen bonding with GAFF.[112] The MMFF94 force field is not parameterized for boron and since it was not possible to use standard parameters in Open Babel for these cases, we could not obtain results for the boron-containing complexes 13. Hence, these systems are not considered in the respective relMAD. The force field systematically underbinds every system of the LNCI16 set resulting in a relMAD (and relMD) of 93.0%, and is therefore not recommended for the modeling of large NCI complexes. Due to the incorrectly assigned global charges for systems 8 and 9 by the Ghemical FF, both systems were identified as outliers and are thus excluded from the relMAD. The force field underbinds almost every system (relMD: 112.9%) and shows an extremely poor performance that is comparable to that of UFF (relMAD: 115.8%), also incorrectly predicting systems like the DNA double-helix to be unbound.

Finally, the computational costs of the assessed methods were compared. The wall times were determined by summing up the total wall times needed for the Nylon complex, host, and guest calculations and multiplying them by the number of cores that were used. Please note that the actual wall times were smaller because the calculations were run in parallel on multiple processes. However, since the number of CPU processes used in each case was different for the tested methods, we have normalized the wall times to one process for a comparison that is as unbiased as possible. The wall times obtained in this way for the Nylon interaction energy as well as the respective relMADs for the whole benchmark set of all the methods used are shown in Figure [6]. The methods are divided into three groups, namely composite QM, SQM, and FF.

Zoom Image
Figure 6 Computational wall times (Intel® Xeon® E5-2660 v4 @ 2.00 GHz CPU) for the calculation of the interaction energies of the largest system in the LNCI16 set (Nylon), together with the respective relMADs for the complete benchmark set: (a) only converged systems were taken into account for the relMADs, (b) using GFN1-xTB charges, (c) using GFN2-xTB charges, (d) systems including boron were excluded from the statistical analysis, and (e) charged systems could not be calculated and were not considered in the respective relMAD. (Total wall times needed for the Nylon complex, host, and guest calculations were summed and multiplied by the number of cores used.)

The composite QM methods require weeks of computation time and are probably too expensive to be used routinely for systems of this size. A special case is the HF-3c method, which has a theoretical wall time of roughly one day for the Nylon interaction energy (on one CPU core), but yields even less accurate results than the best-performing SQM methods, which are still more than 90 times faster. Therefore, we do not recommend the use of HF-3c for the description of very large NCI complexes. The best-performing composite QM method is PBEh-3c, providing a good compromise of computation time, accuracy, and robustness. Although the r2-SCAN-3c method required a slightly smaller wall time (10% less than PBEh-3c) and also yielded a smaller relMAD compared to PBEh-3c, it is considerably less robust than the latter, as already discussed above. Hence, we consider PBEh-3c as the best performer for the LNCI16 set among all the composite QM methods tested in this work.

The discussed SQM methods typically required a computation time of several minutes and are therefore well suited for the presented system size, at least if not too many systems need to be computed (too slow for MD simulations or large-scale screening applications for the system sizes). Although the GFN2-xTB method requires the second largest wall time among all the tested SQM methods, it can still be considered as the best performer within this class of methods due to its exceptional accuracy. As intended, the GFN0-xTB method requires less computation time than the other GFN methods, but the significantly lower accuracy makes the application of GFN0-xTB unattractive for very large NCI systems. The PM7 method required roughly the same computation time as PM6-D3H4X, but considering the very large errors for the LNCI16 set, we cannot recommend the PM7 method for the treatment of very large NCI complexes.

Lastly, the performance of the tested force fields has been evaluated. These methods are designed for rapid calculations of systems significantly exceeding the sizes presented in this study. Hence, all the evaluated force fields only require a few seconds to compute the interaction energy of the Nylon complex. The GFN-FF force field obtained the best accuracy, but is a factor of 30 slower than the GAFF force field with only slightly worse accuracy. In addition, it is, to the best of our knowledge, the only FF besides UFF that is parameterized for the entire periodic table up to Rn. The computation time needed for the GFN2-xTB charges used for the GAFF calculations is not considered in the timings. Similarly, to obtain interaction energies with xTB-IFF, GFN1-xTB or GFN2-xTB, calculations of the monomers have to be performed first, which we have excluded from the discussion. However, since FF methods are usually used for the screening of many different NCI binding motifs of a given molecule, these costs can be considered negligible.


# 6

Conclusions

For the theoretical description of huge supramolecular systems, the development of efficient yet accurate computational methods to describe NCIs, most prominently ­London dispersion, represents a big challenge. For the evaluation of these methods, we compiled a new benchmark set called LNCI16, consisting of very large supramolecular complexes. We used the newly introduced ωB97X-3c efficient composite DFT method to calculate accurate reference energies for complexes with up to 2000 atoms in less than a month of computation time. We chose to calculate the interaction energies in the gas phase to allow a meaningful comparison of all methods, since there is no single solvation model that is implemented for all the methods tested.

By assessing various composite QM, SQM, and force field methods against high-level DFT data we were able to show that the majority of the investigated methods can describe these interactions with sufficient accuracy. Although the r2SCAN-3c method achieves the best results, it shows SCF convergence problems for large systems with small HOMO–LUMO gaps. For these cases, we recommend the more robust and nearly as accurate PBEh-3c low-cost DFT method. This method is suitable for calculating limited accurate single-point energies, e.g., in the last step of a screening workflow. Of the SQM methods, GFN2-xTB in particular has an excellent accuracy-to-cost ratio with a relMAD of 11.1% and a computation time of only 18 minutes for roughly 2000 atoms on a typical desktop computer. Therefore, GFN2-xTB can be used in screening applications with a limited number of structures, e.g., in a refinement step. Among the FF methods, GFN-FF yields the highest accuracy, whereas GAFF has the best accuracy-to-cost ratio. In applications of screening large structural databases or MD simulations, both are the method of choice. For uncharged systems, xTB-IFF is an excellent choice. Moreover, all the recommended methods can be calculated with freely available software (see the Supporting Information) and may be helpful in supramolecular synthesis planning. This is especially true for GFN-FF, as this force field is both robust and generally applicable, and provides interaction energies of large NCI complexes with reasonable accuracy and low computational resources.

Some popular methods show large deviations from the reference energies. The commonly used UFF force field predicts an unbound DNA helix, for example, and is therefore not recommended for computing large NCI complexes. The PM7 method is also unsuitable for this purpose as it drastically overbinds almost all complexes in the benchmark set.

The LNCI16 benchmark set may also serve as a fit or validation set for the parameterization of new SQM, FF, or machine-learning methods as well as (many-body) dispersion corrections.


#
#

Conflict of Interest

The authors declare no conflict of interest.

Acknowledgment

The authors thank M. Müller, T. Gasevic, and S. Ehlert for technical help regarding the calculations. We also thank Prof. Dr. S. Höger, Dr. O. Hollóczki, and Dr. M. de Wergifosse for providing structures.

Supporting Information


Corresponding Author

Andreas Hansen
Mulliken Center for Theoretical Chemistry, Institute for Physical and Theoretical Chemistry, University of Bonn
Beringstr. 4, 53115 Bonn
Germany   

Publication History

Received: 21 July 2022

Accepted after revision: 04 October 2022

Article published online:
30 November 2022

© 2022. Thieme. All rights reserved

Georg Thieme Verlag KG
Rüdigerstraße 14, 70469 Stuttgart, Germany


Zoom Image
Figure 1 Structures and names of the complexes comprising the LNCI16 benchmark set (systems 3–7 from EXL8). Carbon atoms in the host molecules are colored light gray while they are depicted in light blue in the guest molecules.
Zoom Image
Figure 2 Relative contributions of the ATM/MBD-terms to the respective overall D4-ATM/MBD dispersion energies. The D4 correction with only pairwise terms is given as a reference (total dispersion energies are calculated as a sum of the pairwise and higher-order contributions). The lines connecting the data points are given for better visibility.
Zoom Image
Figure 3 Relative deviations (given in %) to ωB97X-3c of the tested composite QM methods. The lines connecting the data points are given for better visibility. * Only systems 18 and 1216, for which the SCF converged, were taken into consideration for determination of the relMADs.
Zoom Image
Figure 4 Relative deviations (given in %) from the ωB97X-3c reference values presented for the evaluated SQM methods that performed best (the PM7 and GFN0-xTB results can be found in the Supporting Information). The lines connecting the data points are given for better visibility.
Zoom Image
Figure 5 Relative deviations (given in %) from the ωB97X-3c reference values presented for the evaluated FF methods that performed best (MMFF94 and Ghemical results can be found in the Supporting Information). The lines connecting the data points are given for better visibility. a Using GFN2-xTB charges. b Using GFN1-xTB charges.
Zoom Image
Figure 6 Computational wall times (Intel® Xeon® E5-2660 v4 @ 2.00 GHz CPU) for the calculation of the interaction energies of the largest system in the LNCI16 set (Nylon), together with the respective relMADs for the complete benchmark set: (a) only converged systems were taken into account for the relMADs, (b) using GFN1-xTB charges, (c) using GFN2-xTB charges, (d) systems including boron were excluded from the statistical analysis, and (e) charged systems could not be calculated and were not considered in the respective relMAD. (Total wall times needed for the Nylon complex, host, and guest calculations were summed and multiplied by the number of cores used.)