Toward Automatic Inference of Glycan Linkages Using MSn and Machine Learning - Proof of Concept Using Sialic Acid Linkages
Xinyi Ni, Nathan B. Murray, Stephanie Archer-Hartmann, Lauren E. Pepi, Richard F. Helm, Parastoo Azadi, and Pengyu Hong
Glycosidic linkages in oligosaccharides play essential roles in determining their chemical properties and biological activities. MSn has been widely used to infer glycosidic linkages, but it requires a substantial amount of starting material, which limits its application. In addition, there is a lack of rigorous research on which MSn protocols are appropriate for characterizing glycosidic linkages. In this work, to deliver high-quality experimental data and analysis results, we propose a machine-learning-based framework to establish appropriate MSn protocols and build effective data analysis methods. We demonstrate proof of principle by applying our approach to elucidate sialic acid linkages (α2’-3’ and α2’-6’) in a set of sialyllactose standards and NIST sialic-acid-containing N-glycans, and to identify several protocol configurations that produce high-quality experimental data. Our companion data analysis method achieves nearly 100% accuracy in classifying α2’-3’ vs α2’-6’ using MS5, MS4, MS3, or even MS2 spectra alone. The ability to determine glycosidic linkages using MS2 or MS3 is significant because it requires substantially less sample, which enables linkage analysis for quantity-limited natural glycans and synthesized materials and shortens overall experimental time. MS2 is also more amenable than MS3/4/5 to automation when coupled to direct infusion or LC-MS. Additionally, our method can predict the ratio of α2’-3’ and α2’-6’ in a mixture with 8.6% RMSE (root mean square error) across datasets using MS5 spectra. We anticipate that our framework will be generally applicable to the analysis of other glycosidic linkages.
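For readers unfamiliar with the metric, the RMSE quoted for mixture-ratio prediction can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function name `rmse` and the example fractions are hypothetical, and the sketch assumes the ratio is expressed as the fraction of α2’-3’ in each mixture, so that an RMSE of 0.086 would correspond to 8.6 percentage points.

```python
import math


def rmse(predicted, actual):
    """Root mean square error between two equal-length sequences."""
    if len(predicted) != len(actual) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical values for illustration only (not data from the paper):
# the known fraction of one linkage in each prepared mixture, and the
# fraction a model might predict from the corresponding spectra.
true_fraction = [0.0, 0.25, 0.5, 0.75, 1.0]
predicted_fraction = [0.05, 0.20, 0.55, 0.70, 0.95]

if __name__ == "__main__":
    error = rmse(predicted_fraction, true_fraction)
    print(f"RMSE: {error * 100:.1f} percentage points")
```

Because the errors are squared before averaging, a few badly mispredicted mixtures raise RMSE more than many small deviations would.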
File Name | File Description | File Type | File Size |
---|---|---|---|
Preprocessed MSn Spectra | This zip file contains the preprocessed MSn spectra in CSV format. See the paper for the details of preprocessing. A "readme.docx" file is included to explain how the data files are organized. | zip | 42.21 MB |
NIST raw MSn | This zip file contains the raw MSn data files in the NIST dataset. A readme file is included to explain how the data files are organized. | zip | 414.7 MB |
UGA-S raw MSn | This zip file contains the raw MSn data files in the UGA-S dataset. A readme file is included to explain how the data files are organized. | zip | 74.32 MB |
UGA-R raw MSn | This zip file contains the raw MSn data files in the UGA-R dataset. A readme file is included to explain how the data files are organized. | zip | 226.96 MB |
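Since the preprocessed spectra are distributed as CSV files inside a zip archive, a first look at the contents can be done without unpacking the archive. The sketch below uses only the Python standard library. The archive layout, member names, column names, and UTF-8 encoding are all assumptions made for illustration; the included readme describes the actual organization.

```python
import csv
import io
import zipfile


def list_csv_members(archive):
    """Return the sorted names of all CSV files in an open ZipFile."""
    return sorted(
        name for name in archive.namelist() if name.lower().endswith(".csv")
    )


def iter_csv_rows(archive, member):
    """Yield each row of one CSV member as a list of strings.

    UTF-8 encoding is an assumption; adjust it if the files differ.
    """
    with archive.open(member) as raw:
        text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
        yield from csv.reader(text)


if __name__ == "__main__":
    # Build a tiny in-memory archive so the example runs on its own.
    # The member name and columns are hypothetical, not the real layout.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as demo:
        demo.writestr("example/spectrum_001.csv", "mz,intensity\n100.0,5\n")

    with zipfile.ZipFile(buffer) as archive:
        for member in list_csv_members(archive):
            for row in iter_csv_rows(archive, member):
                print(member, row)
```

To inspect a downloaded archive, open it with `zipfile.ZipFile(path_to_zip)` in place of the in-memory buffer.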