DOI: 10.1055/a-1937-9113
A Novel Application of a Generation Model in Foreseeing ‘Future’ Reactions
This project was supported by the National Natural Science Foundation of China (No. 81903438) and the Natural Science Foundation of Zhejiang Province (LD22H300004).
Abstract
Deep learning is widely used in chemistry and can rival human chemists in certain scenarios. Inspired by molecule generation in new drug discovery, we present a deep-learning-based approach to reaction generation with the Trans-VAE model. To examine how exploratory and innovative the model is in reaction generation, we constructed the dataset by time splitting. We used the Michael addition reaction as a generation vehicle, took the reactions reported before a certain date as the training set, and explored whether the model could generate reactions that were reported after that date. We took 2010 and 2015 as time points for splitting the reported Michael addition reactions; among the generated reactions, 911 and 487 reactions were applied in experiments after the respective split time points, accounting for 12.75% and 16.29% of all reactions reported after each time point. The generated results were in line with expectations, and a large number of new, chemically feasible Michael addition reactions were generated, which further demonstrated the ability of the Trans-VAE model to learn reaction rules. Our research provides a reference for the future discovery of novel reactions by using deep learning.
Key words
deep learning - artificial intelligence - reaction generation - Michael reaction - synthesis design
Organic synthesis is one of the most challenging processes in drug discovery, and the exploration of new organic reactions has always been a major stumbling block in the development of synthetic organic chemistry.[1,2] New reactions enrich synthetic routes in the fields of chemistry and materials. Conventionally, most new reactions have been discovered by the application of chemical intuition by scientists, which is a complex task requiring a certain degree of luck. For instance, the products of Diels–Alder reactions were known to chemists as early as 1906, but it was not until 1950 that the reaction was applied in total-synthesis experiments.[3,4] The long and intricate process of discovering new reactions hinders progress in drug discovery.[5,6]
When artificial intelligence (AI) was first applied to the field of chemistry, Maryasin and co-workers[7] discussed whether it might one day replace chemists and they examined the generality of AI. Over the past few years, AI technology has provided a number of important applications in various aspects of chemistry and has brought some disruptive effects.[8–16] Reaching or even surpassing human-level capability by combining chemical reactions with AI remains a new challenge with a broad range of feasible applications. The exploration of the application of AI to chemical reactions has primarily involved reaction prediction,[17,18] retrosynthetic analysis,[19,20] optimization of reaction conditions,[21] and reaction classification.[22]
In principle, reaction prediction can be realized by extracting the rules for various chemical reactions, and then directly deriving the relationship between products and reactants. The current mainstream methods usually treat reaction-prediction tasks as similarity transformations of molecular graphs or as text translations, and the corresponding models are graph-convolutional neural networks and sequence-to-sequence models.[23,24] The performance of text-based reaction prediction has been significantly improved by the release of the Transformer model (Google Research, Brain Team), which is entirely based on an attention mechanism. The Molecular Transformer proposed by Schwaller et al.,[25] in which the molecules involved in a reaction are all represented in Simplified Molecular Input Line Entry System (SMILES) notation, is a state-of-the-art SMILES-based sequence-to-sequence model that can reach a 90.4% top-1 accuracy on the USPTO_MIT data set with separated reagents. In addition to innovations in the model structure, many strategies can help AI to comprehend chemical reactions better, including data augmentation[26] and transfer learning,[27] which have performed satisfactorily in low-data regimes. However, the discovery of new reactions by automatically extracting rules from known chemical reactions is an arduous process.
Inspired by molecular generation, which involves the generation of undiscovered active or target molecules by extracting the characteristics from a set of molecules known to have specific biological activities, recent studies have turned their attention to generation models and have proposed de novo reaction generation. In the task of molecular generation, many active molecules aimed at a specific target have been successfully generated, and some molecules have already reached the clinical research stage, which provides an excellent reference for finding new reactions through generation models. Reactions generated by a reliable generation model can not only guide future chemical research, but can also provide a wealth of reaction data to drive deep-learning models. However, a chemical reaction, which implies a chemical transformation from reactants to products, is a more intricate object for a computer than a pure chemical compound that contains only SMILES rules and information on structural properties.
The first attempt at reaction generation was presented by Bort et al.,[28] who constructed bidirectional long short-term memory (LSTM) layers and trained their system on a database named USPTO. All reactions were converted from the original SMILES notation into the corresponding condensed graph of reaction form (SMILES/CGR). By visualizing latent variables through generative topographic mapping, the researchers located the position of the Suzuki reaction and found some reactions with particular structural motifs that were not present in the training data. In a subsequent study by Wang et al.,[29] the data set used for reaction generation was restricted to Heck reactions. Transformer XL, a fully attention-based model that is more suitable for long sequences, was applied in their study. An analysis of the results showed that the generated reactions conformed to the Heck reaction rule, and that the model also had a favorable grasp of deeper chemical knowledge, such as site selectivity. They further selected some reactions for laboratory synthesis to verify the reliability of the generated reactions.
Is there a simple and efficient way to test the reliability of the generation model and the novelty of generated reactions? We have devised a scheme in which the model is trained with the chemical reactions reported in journals before a certain time point to test whether the model can reproduce the corresponding reactions reported after that time point. A schematic representation of this method is shown in Figure [1].
In this study, we used the classical Michael addition as a representative reaction for carbon-chain growth, carbon-ring formation, and heteroatom introduction in organic synthesis as a reaction-generation vehicle. The Trans-VAE[30] model, in which both an encoder and decoder are built with a transformer, was applied to accommodate the long sequence generation of the reaction. We imported the Michael addition reactions reported before a certain date into the model as the training input, and some of the reactions generated by our model were verified by chemists and reported in the literature after that date. The result proved the superiority of the model in certain aspects of reaction generation. More importantly, some of the generated reactions were novel Michael addition reactions that have never appeared in the literature and would be valuable for confirming chemical feasibility. The successful generation of Michael addition reactions, as supported by the literature, not only provides us with a simple and effective way to test the chemical-level generation model, but also sets the stage for generating new types of future reactions in the next phase of our work.


The primary purpose of reaction generation is to generate reactions that can be used in future research. As a secondary purpose, it can also be used to expand the volume of data for reactions having only a small data set, to eliminate the resulting bottleneck in the application of deep-learning technology to chemistry. Generating new reactions that meet the demands of researchers is clearly the more difficult of the two. Taking 2010 as the time point for a split, we used a total of 3,218 reactions to train the model and we then generated 32,979 new Michael addition reactions, 911 of which have been reported in the literature since 2010 and validated experimentally. Similarly, when we divided the data at 2015 and fed the resulting 6,962 training reactions into the model, we generated 81,377 new reactions, 487 of which have been reported in the literature since 2015. As listed in Table [1], the generated reactions reported after 2010 accounted for 12.75% of all reactions reported after that date, whereas when the split was set at 2015, the ratio was 16.29%. We also show the variation in the ratio with the progress of reaction generation in the Supplementary Information (SI; Figure S2); this indicates that the model-generated reactions are reliable and supports the application of the remaining new reactions to chemical research in the future.
We randomly selected some of the model-generated Michael addition reactions that were reported after 2015, as well as some new examples of Michael addition reactions. Figures [2a–c] show model-generated reactions that have been applied in practical studies, whereas Figures [2d–f] are completely new reactions. These examples are consistent with the characteristic rules of the Michael addition reaction. Considering the availability of experimental raw materials, we performed an experimental verification of the entries in Figures [2e] and [2f]; specific experimental details are given in the SI. On the basis of the reactions generated from the dataset with 2015 as the split date, we evaluated the quality of model generation in terms of the distribution and similarity of the generated reactions to the training reactions and their chemical properties.
Table 1
Split time point | 2010 | 2015
Generated reported reactions | 911 | 487
All reported reactions | 7,144 | 2,989
Rate (%) | 12.75 | 16.29
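The rate in Table 1 is the number of generated reactions that also appear among the reactions reported after the split, divided by all reactions reported after the split. The following is a minimal Python sketch of that bookkeeping, assuming that every reaction has already been reduced to a canonical reaction SMILES string so that identical reactions compare equal. The function name `hit_rate` and the toy reaction strings are hypothetical and are not taken from the authors' code; only the counts in `table1` come from the text.

```python
def hit_rate(generated, reported_after_split):
    """Return (number of hits, hit rate in percent).

    Both arguments are iterables of canonical reaction SMILES strings
    (assumption: canonicalization was done beforehand).
    """
    generated = set(generated)
    future = set(reported_after_split)
    hits = generated & future  # generated reactions that were later reported
    rate = 100.0 * len(hits) / len(future) if future else 0.0
    return len(hits), rate


# Toy placeholder strings, not real reactions
generated = ["A.B>>C", "D.E>>F", "G.H>>I"]
future = ["D.E>>F", "X.Y>>Z"]
n_hits, rate = hit_rate(generated, future)

# Counts quoted in Table 1: (generated reported reactions, all reported reactions)
table1 = {2010: (911, 7144), 2015: (487, 2989)}
rates = {year: round(100.0 * hits / total, 2) for year, (hits, total) in table1.items()}
print(rates)
```

Using sets makes the comparison independent of how often the model emits the same reaction, which matches counting each reported reaction at most once.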


Because a complete reaction includes reactants and products and the chemical rule relating them, it is necessary to compare the component relationships between the training set and the generated set. As listed in Table [2], we counted the types of Michael acceptors and donors and the products in the training set and generated set. To represent the distribution between the generated set and the training set visually, we used the t-distributed stochastic neighbor embedding (t-SNE)[34] method to visualize the molecular Morgan fingerprints[35] in a low-dimensional space, and we further verified the validity of the generated molecules.
Figures [3]A–C show the t-SNE plots of the Michael donors, acceptors, and products in the generated set with the Morgan fingerprints of the corresponding reactants in the training set, respectively. It can be seen from the plots that the training-set molecules overlap well with the corresponding generated set, showing that both the reactant and product molecules generated by the model varied around the training set with a certain novelty and also fitted the distribution of the training set.


On shifting our attention to the overall reaction level, the process of combining the corresponding reactants and product molecules into a reaction means that the model must learn a Michael addition reaction rule. Although the Michael addition reaction is one of the most widely used catalytic C–C bond-forming tools in organic synthesis, its rule is complicated for the Trans-VAE model. To further demonstrate that the reactions generated by the model are Michael addition reactions, we used the thematic map package tmap [36] to visualize the reaction fingerprint (rxnfp) of the reactions. The rxnfp is derived from the reaction representation learned by the bidirectional encoder representations from transformers (BERT) reaction-classifier model. As shown in Figure [4], tmap connects reactions in the generated reactions (10,000 reactions randomly selected from the generated set) with those in the training dataset based on their rxnfp similarity, with each reaction represented as a point in a tree diagram. In addition, USPTO-50K, which contains ten major classes of chemical reactions curated by Liu et al.,[37] was downloaded and used to form the backbone of the chemical space. Furthermore, we used the uniform manifold approximation and projection for dimension reduction (UMAP)[38] to reduce the dimensionality of the rxnfp to validate the distribution of the training set and the generated set (Figure [3]D). The model grasped and reproduced the reaction rules in the training set relatively satisfactorily.
The Michael addition reaction is used in the preparation of complex compounds and has an important practical value. To explore in detail whether our model fully understands the Michael addition reaction, we performed an in-depth analysis of the generated Michael addition reaction set. First, we divided the Michael addition reaction into intermolecular and intramolecular reactions. If a molecule contains both donor and acceptor functional groups, intramolecular reactions can occur to form carbon rings or heterocycles. As listed in Table [3], there were 6,707 intermolecular reactions and 255 intramolecular reactions in the training dataset. Intermolecular reactions accounted for 99.6% of the generated reactions, which was consistent with the distribution of intermolecular reactions in the training set. Figure S3 in the SI shows several representative examples of intermolecular reactions and intramolecular reactions from the training and generated datasets. Because the Michael addition reaction is reversible, the thermodynamically most stable product usually predominates. Five- and six-membered rings are usually more stable due to their lower ring strains. Our model accurately captured this feature, and the intramolecular Michael addition reaction is mainly used for the synthesis of more-stable five- or six-membered rings.
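One simple way to separate intermolecular from intramolecular reactions in a SMILES file is to count the reactant components of each reaction. The sketch below is only an illustration of that idea and is not the authors' procedure: it assumes reaction SMILES of the form `reactants>reagents>products`, and it assumes that the reactant field lists only the Michael donor and acceptor (no catalysts or solvents), so that a single component means donor and acceptor sit in the same molecule. The function name and the example reactions are hypothetical.

```python
from collections import Counter


def reaction_kind(rxn_smiles):
    """Classify a reaction SMILES by its number of reactant components."""
    parts = rxn_smiles.split(">")
    if len(parts) != 3:
        raise ValueError(f"not a reaction SMILES: {rxn_smiles!r}")
    # Components on the reactant side are separated by '.'
    reactants = [s for s in parts[0].split(".") if s]
    if not reactants:
        raise ValueError("no reactants found")
    return "intramolecular" if len(reactants) == 1 else "intermolecular"


# Illustrative examples written for this sketch (not taken from the dataset)
examples = [
    # diethyl malonate + methyl vinyl ketone
    "CCOC(=O)CC(=O)OCC.CC(=O)C=C>>CCOC(=O)C(CCC(C)=O)C(=O)OCC",
    # nitromethane + chalcone
    "C[N+](=O)[O-].O=C(C=Cc1ccccc1)c1ccccc1>>O=C(CC(C[N+](=O)[O-])c1ccccc1)c1ccccc1",
    # tethered malonate and enone closing a five-membered ring
    "CC(=O)C=CCCCC(C(=O)OC)C(=O)OC>>CC(=O)CC1CCCC1(C(=O)OC)C(=O)OC",
]
counts = Counter(reaction_kind(r) for r in examples)
print(counts)
```

A component count cannot tell a true intramolecular cyclization from a record in which one reactant was simply omitted, so a substructure check on the reacting atoms would be needed for a rigorous classification.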


Besides alkene Michael acceptors, electron-deficient alkynes conjugated with electron-withdrawing groups can also be used as Michael acceptors, although they are less reactive than their alkene counterparts. Table [4] shows the distribution of the types of Michael acceptor, divided into alkene acceptors and alkyne acceptors. Several Michael addition reactions of alkynes selected from the training set and the generated set are shown in Figure S4 of the SI.
In the Michael addition reaction, a wide range of donor compounds are available. As shown in Figure S5 of the SI, molecules with activated C–H bonds attached to electron-withdrawing groups typically produce stable carbanions, and all of these molecules can be used as donors for Michael addition. When the Michael donor is a simple unsymmetrical carbonyl compound, the acceptor reacts mainly with the α-carbon atom having the more substituents, depending on the stability of the intermediate enol. In general, the greater the number of electron-donating substituents on the double bond, the more stable is the enol and the more the Michael addition reaction is promoted. Our model perceived this rule, and most of the generated reactions followed it satisfactorily, as depicted in Figure S6 of the SI.
For stable carbanions conjugated to multiple heteroatoms, reactions with an acceptor typically yield 1,4-addition products. Most heteroatom-containing stable groups are good leaving groups and can be considered as conjugated auxiliary groups. In Figure S7 of the SI, we list some instances in which it is obvious that a carbon atom between two carbonyl groups is more acidic than the carbon atoms on the other sides of the carbonyl groups and so is more likely to be deprotonated by a base to form a carbanion. The reactions generated by our model also fitted this signature.
The molecular structure of a Michael acceptor includes an electron-withdrawing group and an unsaturated system. Almost all alkenes substituted with an electron-withdrawing group can be used as Michael acceptors, as shown in Figure S8 of the SI. However, if the acceptor molecule contains two or more electron-withdrawing groups, the regioselectivity of the reaction is usually controlled by the more-active group. Figure S8 of the SI shows some generated reaction examples that conform to this rule. In Figure S8(A2) of the SI, the Michael acceptor has two electron-withdrawing groups: nitro and cyano. Because the nitro group is the more reactive, the reaction tends to yield the product shown in the scheme. The model is aware of this principle during training, and reflects it in the generated reaction.
It is worth mentioning that, in addition to carbon nucleophiles, some heteroatom groups can also be used as donors for the Michael addition reaction, due to their nucleophilic properties. For example, alkylamines or arylamines are widely used as Michael donors. The reaction has promising chemical selectivity and generally does not generate imine byproducts. We added reaction data for the heteroatom Michael addition reaction to the data set with 2015 as the split date. After retraining the model on these data, we examined whether the model still grasped the Michael addition reaction when heteroatoms were present. In this study, we mainly considered the heteroatoms N, S, and O. Table [5] lists the classification and proportions of heteroatom nucleophiles. It can be seen that the generated carbon, nitrogen, oxygen, and sulfur nucleophiles make up most of the generated reactants, in a manner that is similar to the distribution of these four reactant types in the training dataset. Examples of Michael additions involving heteroatoms in the training and generated sets are presented in Figure S9 of the SI. These results are encouraging, as they indicate that our Trans-VAE model is sufficiently expressive to produce correct reactions.
In summary, we have applied the Trans-VAE model to the task of reaction generation. To explore whether the model is capable of generating novel unreported Michael addition reactions, we simulated this scenario by dividing a dataset of existing reactions at a selected time point. Thanks to its transformer-based encoder and decoder architecture, the model captured both SMILES rules and features of the Michael addition reaction from a large sequence of reactions. We used 2010 as our first time point, and we trained the model on reactions reported before this date. The model then generated a set of reactions that were compared with those reported after 2010, and was found to reproduce 12.75% of all reactions reported after 2010, providing initial evidence of the reliability of our Trans-VAE model in reaction generation. To confirm the effectiveness (rather than randomness) of our model, we conducted another experiment with 2015 as the split-time point, and the hit rate was 16.29% in this case. We then further inspected whether the model mastered the rules of Michael addition reactions by analyzing the generated Michael addition reactions in terms of their chemical characteristics. Our final analysis showed that the model captures reaction characteristics in a manner consistent with the now-discovered chemical laws of Michael addition reactions, indicating the reliability of applying deep-learning models to reaction generation, and laying the foundation for our subsequent explorations of the vast chemical space by using deep-learning models and for the generation of completely new types of chemical reaction.
Methods: Dataset
The reaction-generation model was trained on SMILES files containing only Michael addition reactions extracted from the Reaxys database. During data preprocessing, reactions with an invalid SMILES string or with identical reactants and products were removed, and the remaining reactions were canonicalized with RDKit[39] so that the same compound was always represented by the same SMILES string. Non-compliant reactions were then filtered out with RDKit on the basis of a Michael addition reaction template. Because the same reaction may be reported in the literature at different times, we kept the date of the first report and deleted the later duplicates, obtaining a dataset of 12,322 Michael addition reactions. Taking 2015 as the split point, the reactions reported before 2015 were divided into training and validation sets (9:1), whereas those reported after this time served as a reference for assessing whether the model could generate ‘future’ reactions.
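The deduplication and time split described above can be sketched in a few lines of Python. This is a hedged illustration under stated assumptions rather than the authors' pipeline: each record is assumed to be a pair of an already canonicalized reaction SMILES and its publication year, reactions published in the split year itself are assigned to the 'future' set (the text does not say on which side they fall), and the 9:1 division is made by a seeded random shuffle. The names `first_reports` and `time_split` and the toy records are hypothetical.

```python
import random


def first_reports(records):
    """Map each reaction to the year of its earliest report."""
    earliest = {}
    for rxn, year in records:
        if rxn not in earliest or year < earliest[rxn]:
            earliest[rxn] = year
    return earliest


def time_split(earliest, split_year, val_fraction=0.1, seed=0):
    """Split into (train, validation, future) lists of reactions."""
    past = sorted(r for r, y in earliest.items() if y < split_year)
    # Assumption: the split year itself counts as 'future'
    future = sorted(r for r, y in earliest.items() if y >= split_year)
    rng = random.Random(seed)
    rng.shuffle(past)
    n_val = round(len(past) * val_fraction)
    return past[n_val:], past[:n_val], future


# Toy records: 20 placeholder reactions plus two repeated reports
records = [(f"R{i}>>P{i}", 2000 + i) for i in range(20)]
records += [("R3>>P3", 2018), ("R18>>P18", 2012)]

earliest = first_reports(records)
train, val, future = time_split(earliest, split_year=2015)
print(len(train), len(val), len(future))
```

Keeping only the earliest report before splitting matters: a reaction first published before the split and republished afterwards would otherwise appear in both the training data and the 'future' reference set.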
Methods: Model
The SMILES representation of a reaction is two to three times longer than that of a single molecule, so the model must perform well on long sequences. We therefore applied the Trans-VAE model proposed by Dollar et al.,[30] which uses transformers for both the encoder and the decoder. The encoder maps the discrete SMILES string to a continuous, fixed-dimensional vector in a dense latent space, and the decoder attempts to convert that vector back into the input with the smallest possible error. Because noise is added to the encoded SMILES, each input corresponds to a probability distribution in the latent space rather than to a single point, and the decoder learns more-robust representations from the latent points. Training minimizes the reconstruction loss between the original and the reconstructed SMILES while keeping the probability distribution of the generated data similar to that of the training data.
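For orientation, the training objective described in words above has the form of the standard variational autoencoder loss. The expression below is the textbook formulation, given here as an assumption; the text does not state the exact loss or weighting used for Trans-VAE. The symbols are introduced only for this sketch: x is the input SMILES sequence, z the latent vector, q_φ the encoder, p_θ the decoder, p(z) a standard normal prior, d the latent dimension, and β an assumed weight on the regularization term.

```latex
\mathcal{L}(\theta,\phi;x)
  = -\,\mathbb{E}_{q_\phi(z\mid x)}\bigl[\log p_\theta(x\mid z)\bigr]
    + \beta\, D_{\mathrm{KL}}\bigl(q_\phi(z\mid x)\,\|\,p(z)\bigr),
\qquad
z = \mu_\phi(x) + \sigma_\phi(x)\odot\varepsilon,\quad
\varepsilon\sim\mathcal{N}(0, I)

% For a diagonal Gaussian encoder and a standard normal prior:
D_{\mathrm{KL}}\bigl(q_\phi(z\mid x)\,\|\,p(z)\bigr)
  = -\tfrac{1}{2}\sum_{j=1}^{d}\bigl(1 + \log\sigma_j^{2} - \mu_j^{2} - \sigma_j^{2}\bigr)
```

The first term is the reconstruction loss between the original and decoded SMILES; the second term pulls the encoded distributions toward the prior, which is what allows new reactions to be sampled from the latent space.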
Conflict of Interest
The authors declare no conflict of interest.
Supporting Information
- Supporting information for this article is available online at https://doi-org.accesdistant.sorbonne-universite.fr/10.1055/a-1937-9113.
References and Notes
- 1 Davies IW. Nature 2019; 570: 175
- 2 Blakemore DC, Castro L, Churcher I, Rees DC, Thomas AW, Wilson DM, Wood A. Nat. Chem. 2018; 10: 383
- 3 Diels O, Alder K. Justus Liebigs Ann. Chem. 1928; 460: 98
- 4 Herges R. Tetrahedron Comput. Methodol. 1988; 1: 15
- 5 Ley SV, Fitzpatrick DE, Ingham RJ, Myers RM. Angew. Chem. Int. Ed. 2015; 54: 3449
- 6 Boström J, Brown DG, Young RJ, Keserü GM. Nat. Rev. Drug. Discovery 2018; 17: 709
- 7 Maryasin B, Marquetand P, Maulide N. Angew. Chem. Int. Ed. 2018; 57: 6978
- 8 Matlock MK, Hoffman M, Dang NL, Folmsbee DL, Langkamp LA, Hutchison GR, Kumar N, Sarullo K, Swamidass SJ. J. Phys. Chem. A 2021; 125: 8978
- 9 Wu Z, Ramsundar B, Feinberg EN, Gomes J, Geniesse C, Pappu AS, Leswing K, Pande V. Chem. Sci. 2018; 9: 513
- 10 Debus B, Parastar H, Harrington P, Kirsanov D. Trends Anal. Chem. 2021; 145: 116459
- 11 Graziano G. Nat. Rev. Chem. 2020; 4: 564
- 12 Satoh H, Funatsu K. J. Chem. Inf. Comput. Sci. 1995; 35: 34
- 13 Zhang J, Norinder U, Svensson F. J. Chem. Inf. Model. 2021; 61: 2648
- 14 Ting K.-LH, Lee RCT, Milne GWA, Shapiro M, Guarino AM. Science 1973; 180: 417
- 15 Gong Y, Xue D, Chuai G, Yu J, Liu Q. Chem. Sci. 2021; 12: 14459
- 16 He H, Yan S, Lyu D, Xu MX, Ye RQ, Zheng P, Lu XY, Wang L, Ren B. Anal. Chem. 2021; 93: 3653
- 17 Fooshee D, Mood A, Gutman E, Tavakoli M, Urban G, Liu F, Huynh N, Van Vranken D, Baldi P. Mol. Syst. Des. Eng. 2018; 3: 442
- 18 Baylon JL, Cilfone NA, Gulcher JR, Chittenden TW. J. Chem. Inf. Model. 2019; 59: 673
- 19 Segler MHS, Preuss M, Waller MP. Nature 2018; 555: 604
- 20 Dong J, Zhao M, Liu Y, Su Y, Zeng X. Briefings Bioinf. 2021; 23: bbab391
- 21 Kim HW, Lee SW, Na GS, Han SJ, Kim SK, Shin JH, Chang H, Kim YT. React. Chem. Eng. 2021; 6: 235
- 22 Xu X, Gu H, Wang Y, Wang J, Qin P. Front. Genet. 2019; online
- 23 Jin W, Coley CW, Barzilay R, Jaakkola T. arXiv 2017; 1709.04555
- 24 Coley CW, Jin W, Rogers L, Jamison TF, Jaakkola TS, Green WH, Barzilay R, Jensen KF. Chem. Sci. 2019; 10: 370
- 25 Schwaller P, Laino T, Gaudin T, Bolgar P, Hunter CA, Bekas C, Lee AA. ACS Cent. Sci. 2019; 5: 1572
- 26 Cortes-Ciriano I, Bender A. J. Chem. Inf. Model. 2015; 55: 2682
- 27 Shi T, Huang S, Chen L, Heng Y, Kuang ZY, Xu L, Mei H. Chemom. Intell. Lab. Syst. 2020; 205: 104122
- 28 Bort W, Baskin II, Gimadiev T, Mukanov A, Nugmanov R, Sidorov P, Marcou G, Horvath D, Klimchuk O, Madzhidov T, Varnek A. Sci. Rep. 2021; 11: 3178
- 29 Wang X, Yao C, Zhang Y, Yu J, Qiao H, Zhang C, Wu Y, Bai R, Duan H. ChemRxiv 2021; preprint, DOI
- 30 Dollar O, Joshi N, Beck DAC, Pfaendtner J. Chem. Sci. 2021; 12: 8362
- 31 Payra S, Saha A, Banerjee S. RSC Adv. 2016; 6: 95951
- 32 Gorde AB, Ramapanicker R. Eur. J. Org. Chem. 2019; 4745
- 33 Wang A, Lv K, Tao Z, Gu J, Liu MJ, Wan BJ, Franzblau SG, Ma C, Ma X, Han B, Wang A, Xu S, Lu Y. Eur. J. Med. Chem. 2019; 181: 111595
- 34 van der Maaten L, Hinton G. J. Mach. Learn. Res. 2008; 9: 2579
- 35 Rogers D, Hahn M. J. Chem. Inf. Model. 2010; 50: 742
- 36 Tennekes M. J. Stat. Software 2018; 84 (06) 1
- 37 Liu B, Ramsundar B, Kawthekar P, Shi J, Gomes J, Nguyen QL, Ho S, Sloane J, Wender P, Pande V. ACS Cent. Sci. 2017; 3: 1103
- 38 McInnes L, Healy J, Melville J. arXiv 2018; 1802.03426
- 39 Landrum, G. RDKit: Open-source cheminformatics (accessed Sept. 28, 2022): http://www.rdkit.org
Publication History
Received: 14 May 2022
Accepted after revision: 06 September 2022
Accepted Manuscript online: 06 September 2022
Article published online: 07 October 2022
© 2022. Thieme. All rights reserved
Georg Thieme Verlag KG
Rüdigerstraße 14, 70469 Stuttgart, Germany