Toward Automatic Inference of Glycan Linkages Using MSn and Machine Learning - Proof of Concept Using Sialic Acid Linkages

Authors:

Xinyi Ni, Nathan B. Murray, Stephanie Archer-Hartmann, Lauren E. Pepi, Richard F. Helm, Parastoo Azadi, and Pengyu Hong

Affiliation:

  1. Brandeis University
  2. University of Georgia, Athens
  3. Virginia Tech

Description:

Glycosidic linkages in oligosaccharides play essential roles in determining their chemical properties and biological activities. MSn has been widely used to infer glycosidic linkages, but it requires a substantial amount of starting material, which limits its application. In addition, there is a lack of rigorous research on which MSn protocols are appropriate for characterizing glycosidic linkages. In this work, to deliver high-quality experimental data and analysis results, we propose a machine-learning-based framework to establish appropriate MSn protocols and build effective data analysis methods. We demonstrate proof of principle by applying our approach to elucidate sialic acid linkages (α2’-3’ and α2’-6’) in a set of sialyllactose standards and NIST sialic acid-containing N-glycans, and we identify several protocol configurations that produce high-quality experimental data. Our companion data analysis method achieves nearly 100% accuracy in classifying α2’-3’ vs. α2’-6’ using MS5, MS4, MS3, or even MS2 spectra alone. The ability to determine glycosidic linkages using MS2 or MS3 is significant: it requires substantially less sample, enabling linkage analysis of quantity-limited natural glycans and synthesized materials, and it shortens overall experimental time. MS2 is also more amenable than MS3/4/5 to automation when coupled to direct infusion or LC-MS. Additionally, our method can predict the ratio of α2’-3’ to α2’-6’ in a mixture with 8.6% RMSE (root mean square error) across datasets using MS5 spectra. We anticipate that our framework will be generally applicable to the analysis of other glycosidic linkages.
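To make the 8.6% RMSE figure concrete: RMSE is the square root of the mean squared difference between the predicted and the known α2’-3’ proportion across a set of test mixtures. The minimal Python sketch below assumes the proportions are expressed as percentages, so the result reads in percentage points. The exact way the paper pools errors across datasets is not stated here, and the mixture values are hypothetical, chosen only to illustrate the arithmetic.

```python
import math
from typing import Sequence


def rmse(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Root mean square error between two equal-length sequences."""
    if len(predicted) != len(actual) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Square each prediction error, average, then take the square root.
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical alpha2'-3' percentages for four mixtures (assumed values,
# not data from this dataset): known composition vs. model prediction.
known = [0.0, 25.0, 50.0, 100.0]
predicted = [5.0, 20.0, 60.0, 95.0]
print(f"RMSE = {rmse(predicted, known):.2f} percentage points")
# prints "RMSE = 6.61 percentage points"
```

Because errors are squared before averaging, a single badly mispredicted mixture (here the 10-point miss at 50%) raises the RMSE more than several small misses would.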

Publications:

  • Xinyi Ni, Nathan B. Murray, Stephanie Archer-Hartmann, Lauren E. Pepi, Richard Helm, Parastoo Azadi, Pengyu Hong; Toward Automatic Inference of Glycan Linkages Using MSn and Machine Learning - Proof of Concept Using Sialic Acid Linkages; Journal of the American Society for Mass Spectrometry, 2023

Tags:

  • Carbohydrates
  • Chemical biology
  • Diagnostic imaging
  • Machine learning
  • Quality management

Files:

  • Preprocessed MSn Spectra (zip, 42.21 MB): This zip file contains the preprocessed MSn spectra in CSV format. See the paper for details of preprocessing. A "readme.docx" file is included to explain how the data files are organized.
  • NIST raw MSn (zip, 414.7 MB): The zip file contains the raw MSn data files in the NIST dataset. A readme file is included to explain how the data files are organized.
  • UGA-S raw MSn (zip, 74.32 MB): The zip file contains the raw MSn data files in the UGA-S dataset. A readme file is included to explain how the data files are organized.
  • UGA-R raw MSn (zip, 226.96 MB): The zip file contains the raw MSn data files in the UGA-R dataset. A readme file is included to explain how the data files are organized.