Predicting the stereoselectivity of chemical reactions by composite machine learning method

Authors:

Jihoon Chung¹, Justin Li², Amirul Islam Saimon³, Pengyu Hong⁴, & Zhenyu Kong³

Affiliations:

¹ Department of Industrial Engineering, Pusan National University, Busan, Korea.

² Management, Entrepreneurship, and Technology, University of California, Berkeley, CA, USA.

³ Grado Department of Industrial and Systems Engineering, Virginia Tech, Blacksburg, VA, USA.

⁴ Department of Computer Science, Brandeis University, Waltham, MA, USA.

Description:

Stereoselective reactions have played a vital role in the emergence of life, evolution, human biology, and medicine. For a long time, however, most industrial and academic efforts in asymmetric synthesis followed a trial-and-error approach, and most previous studies focused qualitatively on the influence of steric and electronic effects on stereoselective reactions. As a result, quantitatively understanding the stereoselectivity of a given chemical reaction remains extremely difficult. As a proof of principle, this paper develops a novel composite machine learning method for quantitatively predicting enantioselectivity, the degree to which one enantiomer is preferentially produced in a reaction. Specifically, it uses machine learning methods that are widely used in data analytics, including Random Forest, Support Vector Regression, and LASSO. Bayesian optimization and permutation importance tests are also applied to support accurate prediction and an in-depth understanding of the reactions. Finally, the proposed composite method approximates the key features of the available reactions using Gaussian mixture models, which are then used to identify suitable machine learning methods for new reactions. Case studies on real stereoselective reactions show that the proposed method is effective and provides a solid foundation for further application to other chemical reactions.
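The composite idea described above can be illustrated with a minimal Python sketch using scikit-learn. It is not the authors' implementation (their scripts are in the "Machine Learning code scripts" archive). It assumes one Gaussian mixture model per known reaction, model selection by cross-validated R², and routing of a new reaction by GMM log-likelihood. The function names (`make_candidates`, `fit_composite`, `predict_new`), the hyperparameters, and the synthetic data are hypothetical, and Bayesian optimization and permutation importance are omitted for brevity.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVR


def make_candidates():
    """Fresh, unfitted copies of the three regressors named in the description."""
    # Hyperparameters are illustrative placeholders, not values from the paper.
    return {
        "random_forest": RandomForestRegressor(n_estimators=50, random_state=0),
        "svr": SVR(kernel="rbf", C=1.0),
        "lasso": Lasso(alpha=0.1),
    }


def fit_composite(reactions, n_cv=3, seed=0):
    """reactions: dict mapping a reaction name to (X, y) arrays.

    For each known reaction, keep the regressor with the best cross-validated
    R^2 together with a Gaussian mixture model of that reaction's features.
    """
    fitted = {}
    for name, (X, y) in reactions.items():
        scores = {
            label: cross_val_score(model, X, y, cv=n_cv, scoring="r2").mean()
            for label, model in make_candidates().items()
        }
        best_label = max(scores, key=scores.get)
        best_model = make_candidates()[best_label].fit(X, y)
        gmm = GaussianMixture(n_components=2, random_state=seed).fit(X)
        fitted[name] = (gmm, best_label, best_model)
    return fitted


def predict_new(fitted, X_new):
    """Route a new reaction to the known reaction whose GMM explains it best."""
    # GaussianMixture.score returns the mean per-sample log-likelihood.
    closest = max(fitted, key=lambda name: fitted[name][0].score(X_new))
    _, label, model = fitted[closest]
    return closest, label, model.predict(X_new)


# Synthetic stand-in data; real features would come from the CPA reaction files.
rng = np.random.default_rng(0)
X_a = rng.normal(0.0, 1.0, size=(60, 4))
X_b = rng.normal(5.0, 1.0, size=(60, 4))
y_a = X_a @ np.array([1.0, -2.0, 0.5, 0.0]) + rng.normal(0, 0.1, 60)
y_b = np.sin(X_b[:, 0]) + X_b[:, 1] ** 2 + rng.normal(0, 0.1, 60)
reactions = {"reaction_a": (X_a, y_a), "reaction_b": (X_b, y_b)}

fitted = fit_composite(reactions)
X_new = rng.normal(5.0, 1.0, size=(10, 4))  # resembles reaction_b's features
closest, label, y_pred = predict_new(fitted, X_new)
```

Here `closest` names the known reaction whose feature distribution best matches the new one, and `label` is the regressor that was selected for it. Routing by likelihood is only one plausible reading of how the mixture models "provide suitable machine learning methods for new reactions".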

Publications:

  • Jihoon Chung, Justin Li, Amirul Islam Saimon, Pengyu Hong*, & Zhenyu Kong*; Predicting the stereoselectivity of chemical reactions by composite machine learning method; Scientific Reports, 2024

Tags:

  • Catalysts
  • Machine learning

Files:

  • Reaction information (zip, 175.57 KB): This file provides details on different Chiral Phosphoric Acid (CPA) reactions.
  • Others (zip, 7.57 MB): This file contains different combinations of the dependent and independent variables (more specifically, CPA reaction features and response variables) related to the analysis in the published paper (DOI: https://doi.org/10.1038/s41598-024-62158-0).
  • Machine Learning code scripts (zip, 402.62 KB): This file contains machine learning (ML) code scripts. Each script name in the zipped folder includes method names, table numbers, and figure numbers that correspond to those in the paper (DOI: https://doi.org/10.1038/s41598-024-62158-0).