Generative Adversarial Networks (GANs) keep improving year over year. In October 2021, NVIDIA presented a new model, StyleGAN3, that outperforms StyleGAN2 with its hierarchical refinement. The new model resolves the “sticking issues” of StyleGAN2, in which fine details appear glued to image coordinates rather than to the surfaces of the depicted objects, and learns to mimic camera motion. Moreover, StyleGAN3 promises to improve models for video and animation generation.
That’s impressive progress compared to 2014, when GANs entered the picture with low-resolution images. We are also witnessing applications beyond simple image generation. They include, but are not limited to, medical product development, training data generation, scientific simulation, improvements to the augmented reality (AR) experience, and speech enhancement and generation. Let’s delve into the most impressive applications so far!
Lots of articles illustrate what GANs can do when it comes to image generation and editing. You’ve probably read about functions like face aging, text-to-image translation, frontal face view or pose generation, and so on. As a starter, we recommend 18 Impressive Applications of Generative Adversarial Networks (GANs) and Best Resources for Getting Started With GANs by Jason Brownlee. In this article, we’d like to go deeper into the latest real-life applications of GANs.
Labeling medical datasets is expensive and time-consuming, but GANs have something to offer. Since GANs are widely used as a data augmentation technique, we'd like to dwell on the latest updates in healthcare. Computer vision professionals often struggle with class imbalance in training datasets, which leads to biased models. GAN-based data augmentation helps fight overfitting and the inability to generalize to novel examples. This is how GANs increase performance on underrepresented classes in chest X-ray classification, as per the research of Sundaram et al. in 2021. They showed that GAN-based data augmentation is more effective than standard data augmentation. Meanwhile, the researchers point out that GAN data augmentation was most effective when applied to small, significantly imbalanced datasets and had a limited impact on large ones.
Researchers from the University of New Hampshire in the US also demonstrated that GAN-based data augmentation is beneficial for neuroimaging. Functional near-infrared spectroscopy (fNIRS) is a neuroimaging technique for mapping the functioning human cortex. fNIRS is used in brain-computer interfaces, so a large amount of new data for training deep learning classifiers is crucial. A Conditional Generative Adversarial Network (CGAN), combined with a CNN classifier, reached 96.67% task classification accuracy, as per the research of Sajila D. Wickramaratne and Shaad Mahmud in 2021.
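To make the class-balancing idea behind GAN-based augmentation more concrete, here is a minimal sketch in plain Python. It is not the pipeline from either study: `placeholder_generator` is a hypothetical stand-in for a trained conditional generator, and the class names and the tiny 4-value "images" are invented for illustration. The sketch only shows the bookkeeping: count how many samples each class is missing and fill the gap with generated ones.

```python
import random
from collections import Counter


def synthetic_counts(labels):
    """Return how many synthetic samples each class needs to match the largest class."""
    counts = Counter(labels)
    target = max(counts.values())
    return {cls: target - n for cls, n in counts.items()}


def placeholder_generator(cls, rng):
    """Hypothetical stand-in for a trained conditional GAN generator.

    A real generator would return an image conditioned on `cls`;
    here we return four random values so the sketch stays runnable.
    """
    return [rng.random() for _ in range(4)]


def augment(samples, labels, generator, seed=0):
    """Top up every minority class with generated samples until classes are balanced."""
    rng = random.Random(seed)
    new_samples, new_labels = list(samples), list(labels)
    for cls, missing in synthetic_counts(labels).items():
        for _ in range(missing):
            new_samples.append(generator(cls, rng))
            new_labels.append(cls)
    return new_samples, new_labels


# Invented toy dataset: 6 samples of one class, 2 of another.
labels = ["normal"] * 6 + ["pneumonia"] * 2
samples = [[0.0] * 4 for _ in labels]

aug_samples, aug_labels = augment(samples, labels, placeholder_generator)
print(Counter(aug_labels))
```

In practice, the synthetic samples would be added to the training split only, so that validation and test metrics are still measured on real data.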
Researchers from the University of California, Berkeley and Glidewell Dental Labs presented one of the first real applications of GANs for medical product development. With the help of generative models, dental crowns can be designed to reach the same morphological quality as those designed by dental experts, even though it takes a dental professional years of training to learn to design crowns. This paves the way for the mass customization of products in the healthcare industry. At the same time, GANs are a good fit for super-resolution in medical imaging, such as low-dose computed tomography (CT) and low-field magnetic resonance imaging (MRI). A GAN-based method proposed in 2020, Medical Images SR using Generative Adversarial Networks (MedSRGAN), aims to increase radiologists' efficiency. By raising the quality of such scans, it may help avoid the harmful effects that a higher-dose procedure can bring.
Automatic speech recognition (ASR) is one of our areas of expertise. Speech enhancement GANs (SEGAN) take noisy inputs and refine them to produce a cleaner, higher-quality output. This function is crucial for people with speech impairments, for example, so GANs could enhance their quality of life. Recently, Huy Phan et al. proposed using “multiple generators that are chained to perform a multi-stage enhancement.” According to the researchers, the new models, ISEGAN and DSEGAN, perform better than SEGAN.
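The idea of chaining generators for multi-stage enhancement can be sketched in a few lines of Python. This is a toy illustration, not the authors' implementation: `make_stage` is a hypothetical stand-in that smooths a signal instead of running a trained network, and `roughness` is an ad hoc measure invented for this example. As we understand the paper, ISEGAN reuses one generator at every stage, while DSEGAN uses independent generators; both variants are shown.

```python
def make_stage(strength):
    """Hypothetical stand-in for one trained enhancement generator.

    It pulls each sample toward a 3-point moving average; a real
    stage would be a neural network applied to a noisy waveform.
    """
    def stage(signal):
        out = []
        for i, x in enumerate(signal):
            window = signal[max(0, i - 1): i + 2]
            smooth = sum(window) / len(window)
            out.append((1 - strength) * x + strength * smooth)
        return out
    return stage


def enhance(noisy, stages):
    """Run the signal through each stage in turn and keep every intermediate output."""
    outputs = []
    current = noisy
    for stage in stages:
        current = stage(current)
        outputs.append(current)
    return outputs


def roughness(signal):
    """Ad hoc noise measure: total absolute change between neighboring samples."""
    return sum(abs(b - a) for a, b in zip(signal, signal[1:]))


noisy = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]

shared = make_stage(0.5)
iterated = enhance(noisy, [shared, shared])                        # one generator, reused
independent = enhance(noisy, [make_stage(0.5), make_stage(0.8)])   # separate generators

print(roughness(noisy), [round(roughness(s), 3) for s in iterated])
```

Keeping every intermediate output makes it possible to compare each stage against the clean target during training, rather than only the final one.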
GANs also come in handy for augmented reality (AR) scenes thanks to their creative generation capabilities. For example, recent use cases include completing environment maps with lighting, reflections, and shadows. ARShadowGAN, presented in 2020 by Daquan Liu et al., generates shadows for virtual objects in single-light scenes. This technology links the real-world environment to the virtual object’s shadow without 3D geometric details or any explicit estimation of illumination.
When it comes to advertising, the phrase “time is money” means a lot. One of our use cases involved the automated generation of advertising images at scale. For example, it costs a designer time and money to resize images for marketing campaigns across platforms, from social media to Amazon. However, Super-Resolution Using a Generative Adversarial Network (SRGAN), a model for single-image super-resolution, can deal with it. As a result, with SRGAN it’s possible to resize images at high quality without a designer’s involvement.
“What could be better than data for a data scientist? More data!” This joke became a real application thanks to GANs. As any neural network-based model is hungry for training data, generative models that can create labeled training data on demand could become a game-changer. For instance, Zhenghao Fei et al. (University of California, Davis) demonstrated how a semantically constrained GAN (CycleGAN plus a task-constrained network) can eliminate the labor-, cost-, and time-consuming process of data labeling, which makes fruit detection more data-efficient and generalizable. Simply put, a semantically constrained GAN can generate realistic day and night images of grapevines from 3D-rendered images while retaining grape position and size. Labeled data generation could also be beneficial in the NLP domain, supporting research on low-resource languages. For instance, Sangramsing Kayte used GANs for text-to-speech translation of low-resource languages of the Indian subcontinent.
Recent research shows that ML models can leak sensitive information from their training samples. For example, the paper This Person (Probably) Exists. Identity Membership Attacks Against GAN Generated Faces (2021) illustrates that many face images produced by GANs strongly resemble real faces from the training data. Researchers propose differential privacy as a technique that could help networks learn the data distribution while protecting the privacy of the training data.
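One common way to bring differential privacy into neural network training is DP-SGD: clip each example's gradient to a maximum norm, then add Gaussian noise before averaging. The sketch below illustrates only that single step in plain Python and is not taken from the paper mentioned above. The gradient values, `max_norm`, and `noise_multiplier` are invented for illustration, and a real setup would also need privacy accounting to report an actual privacy budget.

```python
import math
import random


def clip(grad, max_norm):
    """Scale a gradient down so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def private_mean_gradient(per_example_grads, max_norm, noise_multiplier, rng):
    """Clip each example's gradient, sum, add Gaussian noise, then average.

    Clipping bounds how much any single training example can influence
    the update; the noise hides what remains of that influence.
    """
    clipped = [clip(g, max_norm) for g in per_example_grads]
    dim = len(clipped[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    noisy = [s + rng.gauss(0.0, noise_multiplier * max_norm) for s in summed]
    return [v / len(clipped) for v in noisy]


# Invented per-example gradients for a model with two parameters.
rng = random.Random(0)
grads = [[3.0, 4.0], [0.3, 0.4], [-6.0, 8.0]]

print(private_mean_gradient(grads, max_norm=1.0, noise_multiplier=1.1, rng=rng))
```

A larger `noise_multiplier` gives stronger privacy at the cost of noisier updates, which is the trade-off between protecting the training data and the quality of what the network learns.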
GANs have made impressive progress since 2014, when they were first introduced by Ian Goodfellow. Although the technology is still in its infancy, we are already witnessing how GANs improve the design of medical products, automate image editing for advertising, and merge with AR technology. At the same time, the privacy of generated data remains a pressing topic, and differential privacy is one of the techniques to consider.