Kathleen M Lewis (1,2), Srivatsan Varadharajan (1), Ira Kemelmacher-Shlizerman (1,3)
(1) Google Research, (2) MIT CSAIL, (3) University of Washington
Given an image of a target person and an image of another person wearing a garment, we automatically generate the target person in the given garment. At the core of our method is a pose-conditioned StyleGAN2 latent space interpolation, which seamlessly combines the areas of interest from each image, i.e., body shape, hair, and skin color are derived from the target person, while the garment with its folds, material properties, and shape comes from the garment image. By automatically optimizing for interpolation coefficients in the latent space and per layer, we can perform a seamless, yet true to source, merging of the garment and target person. Our algorithm allows for garments to deform according to the given body shape, while preserving pattern and material details. Experiments demonstrate state-of-the-art photo-realistic results at high resolution (512x512).
We train a pose-conditioned StyleGAN2 network that outputs RGB images and segmentations.
After training our modified StyleGAN2 network, we run an optimization method to learn interpolation coefficients for each style block. These interpolation coefficients are used to combine the style codes of two different images and semantically transfer a region of interest from one image to the other. The method can be applied to generated StyleGAN2 images, or to real images by first projecting them into the latent space.
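As a rough illustration of the combination step only, the sketch below blends per-block style codes with coefficients in [0, 1]. It assumes the coefficients have already been learned and that each style block exposes a flat style vector. The function name `interpolate_styles`, the block widths, and the coefficient values are hypothetical and are not taken from the VOGUE code.

```python
import numpy as np

def interpolate_styles(person_styles, garment_styles, coeffs):
    """Blend style codes block by block: (1 - q) * person + q * garment."""
    blended = []
    for s_p, s_g, q in zip(person_styles, garment_styles, coeffs):
        q = np.clip(q, 0.0, 1.0)  # keep every coefficient in [0, 1]
        blended.append((1.0 - q) * s_p + q * s_g)
    return blended

rng = np.random.default_rng(0)
block_widths = [8, 8, 4]  # hypothetical channel counts per style block

# Stand-ins for the style codes of the target person and the garment image.
person = [rng.normal(size=w) for w in block_widths]
garment = [rng.normal(size=w) for w in block_widths]

# Hypothetical coefficients: block 0 keeps the person, block 1 mixes both
# equally, block 2 takes everything from the garment image.
coeffs = [np.zeros(8), np.full(8, 0.5), np.ones(4)]

mixed = interpolate_styles(person, garment, coeffs)
```

A coefficient of 0 leaves the target person's style untouched and a coefficient of 1 copies the garment image's style, so the learned values decide, block by block, how much of each image reaches the output.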
VOGUE can transfer garments between different poses and body shapes. It preserves garment details (shape, pattern, color, texture) and person identity (hair, skin color, pose).
Virtual try-on between two real images is possible by first projecting the two images into the StyleGAN Z+ latent space. Improving projection is an active area of research.
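To make the idea of projection concrete, here is a toy sketch that recovers a latent code by gradient descent on a reconstruction loss. It does not use StyleGAN: the generator is a hypothetical linear stand-in, the loss is plain squared error, and Z+ is assumed to mean one latent vector per generator layer. All names and sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
num_layers, latent_dim, num_pixels = 4, 6, 32

# Toy stand-in generator: one linear map per layer (NOT a real StyleGAN).
layer_maps = rng.normal(size=(num_layers, num_pixels, latent_dim))

def generate(z_plus):
    """Map a (num_layers, latent_dim) latent to a flat toy 'image'."""
    return np.einsum("lpd,ld->p", layer_maps, z_plus)

def project(target, steps=500):
    """Minimise ||generate(z_plus) - target||^2 by gradient descent."""
    z_plus = np.zeros((num_layers, latent_dim))
    # Step size 1/L, where L = 2 * sigma_max^2 of the stacked linear map,
    # so the loss cannot increase from one step to the next.
    stacked = np.concatenate(list(layer_maps), axis=1)
    step = 1.0 / (2.0 * np.linalg.norm(stacked, 2) ** 2)
    losses = []
    for _ in range(steps):
        residual = generate(z_plus) - target
        losses.append(float(residual @ residual))
        grad = 2.0 * np.einsum("lpd,p->ld", layer_maps, residual)
        z_plus -= step * grad
    return z_plus, losses

# A target that the toy generator can reproduce exactly.
target = generate(rng.normal(size=(num_layers, latent_dim)))
z_hat, losses = project(target)
```

Projecting a real photograph is much harder than this: the real generator is non-linear and a photograph may not lie in its range, so the recovered latent is generally only an approximation.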
Wang, Bochao, et al. "Toward characteristic-preserving image-based virtual try-on network." Proceedings of the European Conference on Computer Vision (ECCV). 2018.
Men, Yifang, et al. "Controllable person image synthesis with attribute-decomposed gan." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020.
@article{lewis2020vogue,
  author = {Lewis, Kathleen M and Varadharajan, Srivatsan and Kemelmacher-Shlizerman, Ira},
  title = {VOGUE: Try-On by StyleGAN Interpolation Optimization},
  journal = {arXiv preprint arXiv:2101.02285},
  year = {2021}
}
Katie Lewis kmlewis@mit.edu