Imbalanced spectral data analysis using data augmentation based on the generative adversarial network

Authors:

Jihoon Chung1, Junru Zhang2, Amirul Islam Saimon2, Yang Liu2, Blake N. Johnson2* & Zhenyu Kong2*

Affiliation:

1Department of Industrial Engineering, Pusan National University, Busan, South Korea

2Grado Department of Industrial and Systems Engineering, Virginia Tech, Blacksburg, VA, USA

Description:

Spectroscopic techniques generate one-dimensional spectra with distinct peaks and specific widths in the frequency domain. These features act as unique identifiers of material characteristics. Deep neural networks (DNNs) have recently been considered a powerful tool for automatically categorizing experimental spectral data by supervised classification to evaluate material characteristics. However, most existing work assumes that the training data are balanced across classes, whereas spectral data from actual experiments are usually imbalanced. Imbalanced training data degrade supervised classification performance, which hinders understanding of phase behavior, specifically the sol-gel transition (gelation) of soft materials and glycomaterials. To address this issue, this paper applies a novel data augmentation method based on a generative adversarial network (GAN), proposed by the authors in their prior work. The method consists of three DNNs: a generator, a discriminator, and a classifier. It generates samples that are not only authentic but also emphasize the differences between material characteristics, providing balanced training data and improving the classification results. To demonstrate the effectiveness of the method, real imbalanced spectral data from Pluronic F-127 hydrogel and Alpha-Cyclodextrin hydrogel are used to classify the phases of the samples. On average, the approach improves the classifier's F-score, precision, and recall by 8.8%, 6.4%, and 6.2%, respectively, over existing data augmentation methods. Based on these validated results, we expect the method to find broader application in addressing imbalanced measurement data across diverse domains in materials science and chemical engineering.
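The description reports F-score, precision, and recall rather than accuracy alone. The sketch below illustrates why, using only the class counts given for the PF-127 dataset (181 solution, 107 gel). The function name `precision_recall_f1` and the "always predict solution" baseline are assumptions made for illustration; they are not taken from the paper, and the paper's exact averaging scheme is not reproduced here.

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Per-class precision, recall, and F1 for one chosen positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Class counts from the PF-127 dataset description.
y_true = ["solution"] * 181 + ["gel"] * 107

# Hypothetical baseline (an assumption): always predict the majority class.
y_pred = ["solution"] * len(y_true)

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
gel_scores = precision_recall_f1(y_true, y_pred, positive="gel")
solution_scores = precision_recall_f1(y_true, y_pred, positive="solution")

print(f"accuracy = {accuracy:.3f}")
print("gel      (precision, recall, F1):", gel_scores)
print("solution (precision, recall, F1):", solution_scores)
```

In this sketch the baseline still scores about 63% accuracy while its gel recall and F1 are zero, which is the kind of failure on the minority class that accuracy alone would hide.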

Publications:

  • Jihoon Chung, Junru Zhang, Amirul Islam Saimon, Yang Liu, Blake N. Johnson & Zhenyu Kong; Imbalanced spectral data analysis using data augmentation based on the generative adversarial network; Scientific Reports, 2024

Tags:

    Automated inspection
    Automated sensing
    High-throughput characterization
    Hydrogels
    Machine learning

Files:

File Name: Pluronic F-127 hydrogel dataset

File Description: Pluronic F-127 (PF-127), a nonionic amphiphilic surfactant, undergoes a reversible thermogelling process in aqueous solution, resembling the behavior of other Pluronic compounds, and has been studied and used in a wide range of applications. PF-127 hydrogel libraries serve as the case study in the published paper "Imbalanced spectral data analysis using data augmentation based on the generative adversarial network". Ninety-six PF-127/deionized water mixtures with different mass ratios are formulated in 96-well plates. The PF-127 concentration varies from 0.3125 to 30 wt% in increments of 0.3125 wt%. The phase angle-frequency spectrum of each sample is collected by a sensor-based high-throughput method, and each spectrum is labeled as solution or gel to study the composition-property relationships of PF-127 hydrogels. Three repeated experiments provide 288 spectra: 181 of solution (Fig. 1a in the paper) and 107 of gel (Fig. 1b in the paper). The frequency range for each experiment and concentration is determined by the spectrum width, and different sensors are employed in the repeated experiments, so the frequency ranges differ between spectra. To use the spectra from all three experiments together, the x-axis of each spectrum is converted into the sequence index of the sensor measurements (from 1 to 800, the length of each spectrum). The detailed data collection procedure and the frequency range of each experiment are described in the "Data collection of Pluronic F-127 hydrogel libraries" section of the paper.

File Type: zip

File Size: 2.03 MB
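The figures quoted in the file description can be cross-checked with a short Python sketch. All variable names are hypothetical, and nothing here reads the actual zip archive, whose internal layout is not described on this page. The "deficit" is only the number of gel spectra that would equalize the two class counts; whether the paper balances the classes in exactly this way is an assumption.

```python
N_WELLS = 96          # mixtures per 96-well plate
STEP_WT_PCT = 0.3125  # concentration increment in wt%
N_EXPERIMENTS = 3     # repeated experiments
SPECTRUM_LENGTH = 800 # sensor measurements per spectrum

# Concentration grid: 0.3125, 0.6250, ..., 30.0 wt%.
concentrations = [STEP_WT_PCT * (i + 1) for i in range(N_WELLS)]

# Total spectra across the repeated experiments.
total_spectra = N_WELLS * N_EXPERIMENTS

# Class counts reported in the file description.
class_counts = {"solution": 181, "gel": 107}
imbalance_ratio = class_counts["solution"] / class_counts["gel"]

# Gel spectra needed to match the solution count (illustrative assumption).
deficit = class_counts["solution"] - class_counts["gel"]

# Shared x-axis: measurement index 1..800 instead of a physical frequency.
index_axis = list(range(1, SPECTRUM_LENGTH + 1))

print(f"concentration range: {concentrations[0]} to {concentrations[-1]} wt%")
print(f"total spectra: {total_spectra}")
print(f"solution:gel ratio: {imbalance_ratio:.2f}")
print(f"gel spectra needed to balance: {deficit}")
```

Replacing the frequency axis with a measurement index lets spectra recorded by different sensors share one fixed-length input, at the cost of discarding the absolute frequency values.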