Image and Video Synthesis: Stable Diffusion, VQGAN, V-UNET, Etc.

Deep generative models have matured to the point where they are transforming the way we create visual content. We explore powerful generative approaches such as Invertible Neural Networks (INNs), autoregressive Transformers, and Diffusion Models, and we investigate their specific limitations in order to develop novel strategies that unlock the full potential of these architectures. Among other results, this work has led to latent approaches such as VQGAN and Stable Diffusion, and to the disentanglement of shape and appearance in V-UNET (the Variational U-Net). Our long-standing goal is to develop algorithms that make images accessible on a semantic level, simplifying our interaction with computers and democratizing access to this enabling technology.

Talk given in August 2021

Selected Publications

2022

Blattmann, Andreas; Rombach, Robin; Oktay, Kaan; Ommer, Björn

Retrieval-Augmented Diffusion Models

Neural Information Processing Systems (NeurIPS), 2022.

Rombach, Robin; Blattmann, Andreas; Ommer, Björn

Text-Guided Synthesis of Artistic Images with Retrieval-Augmented Diffusion Models

Proceedings of the European Conference on Computer Vision (ECCV) Workshop on VISART, 2022.

Rombach, Robin; Blattmann, Andreas; Lorenz, Dominik; Esser, Patrick; Ommer, Björn

High-Resolution Image Synthesis with Latent Diffusion Models

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022.

2021

Esser, Patrick; Rombach, Robin; Blattmann, Andreas; Ommer, Björn

ImageBART: Bidirectional Context with Multinomial Diffusion for Autoregressive Image Synthesis

Neural Information Processing Systems (NeurIPS), 2021.

Blattmann, Andreas; Milbich, Timo; Dorkenwald, Michael; Ommer, Björn

iPOKE: Poking a Still Image for Controlled Stochastic Video Synthesis

Proceedings of the International Conference on Computer Vision (ICCV), 2021.

Dorkenwald, Michael; Milbich, Timo; Blattmann, Andreas; Rombach, Robin; Derpanis, Konstantinos G.; Ommer, Björn

Stochastic Image-to-Video Synthesis using cINNs

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021.

Blattmann, Andreas; Milbich, Timo; Dorkenwald, Michael; Ommer, Björn

Behavior-Driven Synthesis of Human Dynamics

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021.

Esser, Patrick; Rombach, Robin; Ommer, Björn

Taming Transformers for High-Resolution Image Synthesis

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021.

Rombach, Robin; Esser, Patrick; Ommer, Björn

Geometry-Free View Synthesis: Transformers and no 3D Priors

Proceedings of the International Conference on Computer Vision (ICCV), 2021.

Blattmann, Andreas; Milbich, Timo; Dorkenwald, Michael; Ommer, Björn

Understanding Object Dynamics for Interactive Image-to-Video Synthesis

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021.

Brattoli, Biagio; Büchler, Uta; Dorkenwald, Michael; Reiser, Philipp; Filli, Linard; Helmchen, Fritjof; Wahl, Anna-Sophia; Ommer, Björn

Unsupervised behaviour analysis and magnification (uBAM) using deep learning

Nature Machine Intelligence, 2021.

Jahn, Manuel; Rombach, Robin; Ommer, Björn

High-Resolution Complex Scene Synthesis with Transformers

CVPR 2021, AI for Content Creation Workshop, 2021.

Afifi, Mahmoud; Derpanis, Konstantinos G.; Ommer, Björn; Brown, Michael S.

Learning Multi-Scale Photo Exposure Correction

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021.

Kotovenko, Dmytro; Wright, Matthias; Heimbrecht, Arthur; Ommer, Björn

Rethinking Style Transfer: From Pixels to Parameterized Brushstrokes

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021.

2020

Dorkenwald, Michael; Büchler, Uta; Ommer, Björn

Unsupervised Magnification of Posture Deviations Across Subjects

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020.

Esser, Patrick; Rombach, Robin; Ommer, Björn

A Note on Data Biases in Generative Models

NeurIPS 2020 Workshop on Machine Learning for Creativity and Design, 2020.

Braun, Sandro; Esser, Patrick; Ommer, Björn

Unsupervised Part Discovery by Unsupervised Disentanglement

Proceedings of the German Conference on Pattern Recognition (GCPR) (Oral), Tübingen, 2020.

Esser, Patrick; Rombach, Robin; Ommer, Björn

A Disentangling Invertible Interpretation Network for Explaining Latent Representations

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020.

Rombach, Robin; Esser, Patrick; Ommer, Björn

Network Fusion for Content Creation with Conditional INNs

CVPR 2020, AI for Content Creation Workshop, 2020.

Rombach, Robin; Esser, Patrick; Ommer, Björn

Making Sense of CNNs: Interpreting Deep Representations & Their Invariances with INNs

Proceedings of the European Conference on Computer Vision (ECCV), 2020.

Rombach, Robin; Esser, Patrick; Ommer, Björn

Network-to-Network Translation with Conditional Invertible Neural Networks

Neural Information Processing Systems (NeurIPS) (Oral), 2020.

2019

Lorenz, Dominik; Bereska, Leonard; Milbich, Timo; Ommer, Björn

Unsupervised Part-Based Disentangling of Object Shape and Appearance

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (Oral + Best paper finalist: top 45 / 5160 submissions), 2019.

Esser, Patrick; Haux, Johannes; Ommer, Björn

Unsupervised Robust Disentangling of Latent Characteristics for Image Synthesis

Proceedings of the International Conference on Computer Vision (ICCV), 2019.

Kotovenko, Dmytro; Sanakoyeu, Artsiom; Lang, Sabine; Ommer, Björn

Content and Style Disentanglement for Artistic Style Transfer

Proceedings of the International Conference on Computer Vision (ICCV), 2019.

Kotovenko, Dmytro; Sanakoyeu, Artsiom; Lang, Sabine; Ma, P.; Ommer, Björn

Using a Transformation Content Block For Image Style Transfer

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.

2018

Esser, Patrick; Sutter, Ekaterina; Ommer, Björn

A Variational U-Net for Conditional Appearance and Shape Generation

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (short Oral), 2018.

Sanakoyeu, Artsiom; Kotovenko, Dmytro; Lang, Sabine; Ommer, Björn

A Style-Aware Content Loss for Real-time HD Style Transfer

Proceedings of the European Conference on Computer Vision (ECCV) (Oral), 2018.

Blum, O.; Brattoli, Biagio; Ommer, Björn

X-GAN: Improving Generative Adversarial Networks with ConveX Combinations

German Conference on Pattern Recognition (GCPR) (Oral), Stuttgart, Germany, 2018.