
Keynote Lectures

Why Neural Rendering is Amazing
Matthias Niessner, Technical University of Munich, Germany

Deepfake Detection: State-of-the-art and Future Directions
Luisa Verdoliva, University of Naples Federico II, Italy

3D Scene Understanding with Scene Graphs and Self-Supervision
Federico Tombari, Google and Technical University of Munich (TUM), Germany

How Should We Create Algorithms to Do More with Less?
Sotirios A. Tsaftaris, University of Edinburgh, United Kingdom

 

Why Neural Rendering is Amazing

Matthias Niessner
Technical University of Munich
Germany
 

Brief Bio
Dr. Matthias Nießner is a Professor at the Technical University of Munich, where he leads the Visual Computing Lab. Before that, he was a Visiting Assistant Professor at Stanford University. Prof. Nießner’s research lies at the intersection of computer vision, graphics, and machine learning, where he is particularly interested in cutting-edge techniques for 3D reconstruction, semantic 3D scene understanding, video editing, and AI-driven video synthesis. In total, he has published over 70 academic publications, including 22 papers at the prestigious ACM Transactions on Graphics (SIGGRAPH / SIGGRAPH Asia) journal and 18 works at the leading vision conferences (CVPR, ECCV, ICCV); several of these works won best paper awards, including at SIGCHI’14, HPG’15, SPG’18, and the SIGGRAPH’16 Emerging Technologies Award for the best Live Demo. Prof. Nießner’s work enjoys wide media coverage, with many articles featured in mainstream media including the New York Times, Wall Street Journal, Spiegel, MIT Technology Review, and many more, and his work has led to several TV appearances, such as on Jimmy Kimmel Live, where Prof. Nießner demonstrated the popular Face2Face technique; Prof. Nießner’s academic YouTube channel currently has over 5 million views. For his work, Prof. Nießner has received several awards: he is a TUM-IAS Rudolph Moessbauer Fellow (2017 – ongoing), he won the Google Faculty Award for Machine Perception (2017), the Nvidia Professor Partnership Award (2018), as well as the prestigious ERC Starting Grant 2018, which comes with 1,500,000 Euro in research funding; in 2019, he received the Eurographics Young Researcher Award honoring the best upcoming graphics researcher in Europe. In addition to his academic impact, Prof. Nießner is a co-founder and director of Synthesia Inc., a startup backed by Mark Cuban, whose aim is to empower storytellers with cutting-edge AI-driven video synthesis.


Abstract
In this talk, I will present my research vision of how to create a photo-realistic digital replica of the real world, and how to make holograms become a reality. Eventually, I would like to see photos and videos evolve into interactive, holographic content indistinguishable from the real world. Imagine taking such 3D photos to share with friends, family, or social media; the ability to fully record historical moments for future generations; or to provide content for upcoming augmented and virtual reality applications. AI-based approaches, such as generative neural networks, are becoming more and more popular in this context since they have the potential to transform existing image synthesis pipelines. I will specifically talk about an avenue towards neural rendering where we can retain the full control of a traditional graphics pipeline while at the same time exploiting modern capabilities of deep learning, such as handling the imperfections of content from commodity 3D scans. While the capture and photo-realistic synthesis of imagery open up remarkable possibilities for applications ranging from the entertainment to the communication industries, there are also important ethical considerations that must be kept in mind. Specifically, in the context of fabricated news (e.g., fake news), it is critical to highlight and understand digitally-manipulated content. I believe that media forensics plays an important role in this area, both from an academic standpoint, to better understand image and video manipulation, and even more importantly from a societal standpoint, to raise awareness of the possibilities and, moreover, to highlight potential avenues and solutions regarding trust in digital content.
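
To make the "traditional graphics pipeline plus deep learning" idea concrete, here is a minimal sketch in the spirit of neural-texture-based rendering: a classical rasterizer supplies per-pixel UV coordinates, a learned feature texture is sampled at those coordinates, and a small CNN decodes the features into an image. This is an illustrative toy in PyTorch, not the speaker's implementation; all module and parameter names are assumptions.

```python
# Sketch: rasterizer keeps geometric control, learning handles appearance.
import torch
import torch.nn as nn
import torch.nn.functional as F

class NeuralTextureRenderer(nn.Module):
    def __init__(self, tex_channels=16, tex_size=512):
        super().__init__()
        # Learnable feature texture in place of a classical RGB texture.
        self.neural_texture = nn.Parameter(
            torch.randn(1, tex_channels, tex_size, tex_size) * 0.01)
        # Small CNN that decodes sampled features into an RGB image.
        self.decoder = nn.Sequential(
            nn.Conv2d(tex_channels, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3, 3, padding=1), nn.Sigmoid())

    def forward(self, uv):
        # uv: (B, H, W, 2) per-pixel texture coordinates in [-1, 1],
        # produced by an ordinary rasterizer, so camera and geometry
        # remain fully controllable while appearance is learned.
        feats = F.grid_sample(
            self.neural_texture.expand(uv.shape[0], -1, -1, -1),
            uv, align_corners=False)
        return self.decoder(feats)

# Training would minimize a photometric loss against captured frames:
# loss = F.l1_loss(renderer(uv_from_rasterizer), real_image)
```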



 

 

Deepfake Detection: State-of-the-art and Future Directions

Luisa Verdoliva
University of Naples Federico II
Italy
 

Brief Bio
Dr. Luisa Verdoliva is an Associate Professor at the University Federico II of Naples, Italy, where she leads the Multimedia Forensics Lab. In 2018 she was a visiting professor at Friedrich-Alexander-University (FAU), and in 2019-2020 she was a visiting scientist at Google AI in San Francisco. Her scientific interests are in the field of image and video processing, with main contributions in the area of multimedia forensics. She has published over 120 academic publications, including 45 journal papers. She was the PI for the University Federico II of Naples in the DISPARITY (Digital, Semantic and Physical Analysis of Media Integrity) project funded by DARPA under the MEDIFOR program (2016-2020), and she is the PI for the same university in the DISCOVER (a Data-driven Integrated Approach for Semantic Inconsistencies Verification) project funded by DARPA under the SEMAFOR program (2020-2024). She has actively contributed to the academic community through service as General Co-Chair of the 2019 ACM Workshop on Information Hiding and Multimedia Security, Technical Chair of the 2019 IEEE Workshop on Information Forensics and Security, and Area Chair of the IEEE International Conference on Image Processing since 2017. She is on the Editorial Board of IEEE Transactions on Information Forensics and Security and IEEE Signal Processing Letters. Dr. Verdoliva is Vice-Chair of the IEEE Information Forensics and Security Technical Committee. She is the recipient of a 2018 Google Faculty Award for Machine Perception and a TUM-IAS Hans Fischer Senior Fellowship (2020-2023). She has been elected to the grade of IEEE Fellow, effective January 1, 2021.


Abstract
In recent years there have been astonishing advances in AI-based synthetic media generation. Thanks to deep learning-based approaches, it is now possible to generate data with a high level of realism. While this opens up new opportunities for the entertainment industry, it simultaneously undermines the reliability of multimedia content and supports the spread of false or manipulated information on the Internet. This is especially true for human faces, where it is now easy to create new identities or change specific attributes of a real face in a video, producing so-called deepfakes. In this context, it is important to develop automated tools that detect manipulated media reliably and in a timely manner. This talk will describe the most reliable deep learning-based approaches for detecting deepfakes, with a focus on those that enable domain generalization. Results will be presented on challenging datasets with reference to realistic scenarios, such as the dissemination of manipulated images and videos on social networks. Finally, possible new directions will be outlined.
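
As an illustration of the kind of detector the talk surveys, here is a minimal sketch of a learning-based deepfake detector: a standard ImageNet-pretrained CNN backbone fine-tuned as a real-vs-fake binary classifier on cropped face images. This is a generic baseline under stated assumptions, not any specific system from the talk; real detectors add stronger augmentation and strategies for generalizing to unseen manipulation methods.

```python
# Sketch: binary real-vs-fake classifier over face crops (PyTorch).
import torch
import torch.nn as nn
from torchvision import models

class DeepfakeDetector(nn.Module):
    def __init__(self):
        super().__init__()
        # ImageNet-pretrained backbone, a common starting point.
        self.backbone = models.resnet50(weights="IMAGENET1K_V2")
        self.backbone.fc = nn.Linear(self.backbone.fc.in_features, 1)

    def forward(self, face_batch):
        # face_batch: (B, 3, 224, 224) normalized face crops.
        return self.backbone(face_batch).squeeze(1)  # real/fake logit

model = DeepfakeDetector()
criterion = nn.BCEWithLogitsLoss()
# One illustrative training step on a placeholder batch:
faces = torch.randn(8, 3, 224, 224)          # stand-in face crops
labels = torch.randint(0, 2, (8,)).float()   # 0 = real, 1 = fake
loss = criterion(model(faces), labels)
loss.backward()
```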



 

 

3D Scene Understanding with Scene Graphs and Self-Supervision

Federico Tombari
Google and Technical University of Munich (TUM)
Germany
 

Brief Bio
Federico Tombari is a research scientist and manager at Google and a lecturer (Privatdozent) at the Technical University of Munich (TUM). He has more than 180 peer-reviewed publications in the field of 3D computer vision and machine learning and their applications to robotics, autonomous driving, healthcare, and augmented reality. He received his PhD in 2009 from the University of Bologna, where he was an Assistant Professor from 2013 to 2016. In 2008 and 2009 he was an intern and consultant at Willow Garage, California. Since 2014 he has led a team of PhD students at TUM working on computer vision and deep learning. In 2018-19 he was co-founder and managing director of Pointu3D GmbH, a Munich-based startup on 3D perception for AR and robotics. He is the recipient of two Google Faculty Research Awards (2015 and 2018) and an Amazon Research Award (2017). He has been a research partner of private and academic institutions including Google, Toyota, BMW, Audi, Amazon, Stanford, ETH and JHU. His works have been awarded at conferences and workshops such as 3DIMPVT'11, MICCAI'15, ECCV-R6D'16, AE-CAI'16, and ISMAR'17.


Abstract
3D scene understanding investigates the development of computer vision tools for new applications in the fields of robotics, autonomous driving, augmented reality, design, and architecture (among others). The capability of analysing a scene by extracting its semantic components, such as its parts and objects, together with their pose and attributes, currently relies heavily on deep learning approaches. In this talk, I will illustrate some new directions in 3D scene understanding using deep learning, focusing in particular on indoor scenes. I will first introduce some recently proposed techniques in SLAM and 3D reconstruction for indoor scenes, aimed at improving the geometric reconstruction. Then I will present a recent proposal and dataset aimed at leveraging scene graphs and Graph Neural Networks as tools on which to carry out inference for 3D scene understanding tasks, such as instance detection and scene retrieval. Finally, I will focus on estimating the 6D pose of objects in scenes with clutter and occlusion, and illustrate how self-supervised learning can be leveraged to improve pose estimation accuracy while relaxing the need for annotated data.
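
To give a flavor of inference on scene graphs, here is a minimal sketch of one round of message passing over a 3D scene graph: nodes carry per-object features (e.g. pooled point-cloud descriptors) and directed edges encode pairwise relationships. Plain PyTorch, with all names and dimensions illustrative rather than taken from the datasets or models discussed in the talk.

```python
# Sketch: one GNN message-passing layer over a scene graph (PyTorch).
import torch
import torch.nn as nn

class SceneGraphLayer(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        self.message = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU())
        self.update = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU())

    def forward(self, node_feats, edges):
        # node_feats: (N, dim) object features; edges: (E, 2) index pairs.
        src, dst = edges[:, 0], edges[:, 1]
        msgs = self.message(
            torch.cat([node_feats[src], node_feats[dst]], dim=-1))
        # Aggregate incoming messages per destination node (sum pooling).
        agg = torch.zeros_like(node_feats)
        agg.index_add_(0, dst, msgs)
        return self.update(torch.cat([node_feats, agg], dim=-1))

# Illustrative usage: 5 objects with a few directed relations.
feats = torch.randn(5, 128)
edges = torch.tensor([[0, 1], [1, 2], [3, 4], [4, 0]])
refined = SceneGraphLayer()(feats, edges)  # relation-aware features
```

Stacking several such layers lets each object's representation absorb context from related objects, which is what makes graph-based reasoning useful for tasks like instance detection and scene retrieval.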



 

 

How Should We Create Algorithms to Do More with Less?

Sotirios A. Tsaftaris
University of Edinburgh
United Kingdom
 

Brief Bio
Prof. Sotirios A. Tsaftaris, or Sotos, (http://tsaftaris.com; https://vios.science/; @STsaftaris), obtained his PhD and MSc degrees in Electrical Engineering and Computer Science (EECS) from Northwestern University, USA, in 2006 and 2003 respectively. He obtained his Diploma in Electrical and Computer Engineering from the Aristotle University of Thessaloniki, Greece. Currently, he is Canon Medical/Royal Academy of Engineering Research Chair in Healthcare AI, and Chair in Machine Learning and Computer Vision at the University of Edinburgh (UK). He is also a Turing Fellow with the Alan Turing Institute. Previously he was an Assistant Professor with IMT Institute for Advanced Studies, Lucca, Italy, and Director of the Pattern Recognition and Image Analysis Unit at IMT (2011-2015). Prior to that, he held a joint Research Assistant Professor appointment at Northwestern University with the Departments of Electrical Engineering and Computer Science (EECS) and Radiology, Feinberg School of Medicine (2006-2011). He is an Associate Editor for IEEE Transactions on Medical Imaging and IEEE Journal of Biomedical and Health Informatics. He was Tutorial Chair for ECCV 2020 and Doctoral Symposium Chair for IEEE ICIP 2018 (Athens). He served as Area Chair for CVPR 2021, IEEE ICME 2019, MICCAI 2020/2018, ICCV 2017, MMSP 2016, and VCIP 2015. He has also co-organized workshops for CVPR (2019), ICCV (2017), ECCV (2014), BMVC (2015, 2018), and MICCAI (2016, 2017) and delivered tutorials at ICASSP (2019) and MICCAI (2020). He has received the best paper award (STACOM 2017), twice the Magna Cum Laude Award (ISMRM in 2012 and 2014), and was a finalist for the Early Career Award (Society for Cardiovascular Magnetic Resonance, SCMR, in 2011 and 2019). He has authored more than 140 journal and conference papers, particularly in interdisciplinary fields, and his work is (or has been) supported by the National Institutes of Health (USA), EPSRC & BBSRC (UK), the European Union, and several non-profits and industrial partners. His research interests lie in machine learning, computer vision, image analysis (medical image computing and in particular cardiovascular MR image analysis), and image processing. Dr. Tsaftaris is a Murphy, Onassis, and Marie Curie Fellow. He is also a member of IEEE, ISMRM, SCMR, and IAPR.


Abstract
Healthcare is facing a perfect storm, with AI/ML being offered as a solution to relieve at least one bottleneck: data analysis. Indeed, the detection of disease, segmentation of anatomy, and other classical image analysis tasks have seen incredible improvements due to deep learning. Yet these advances need lots of data: for every new task, new imaging scan, or new hospital, more training data are needed. So how can we train algorithms that can do more with fewer annotations? Or, even more provocatively, how can we train algorithms that can do more with less data? I will focus on the need to learn disentangled representations as a means to derive solutions that address both questions. I will present a framework of disentangled mixed-dimension tensor embeddings (similar to content-style in computer vision) suitable for several analysis tasks, one that can do more (tasks) with less (supervision). Within a multi-task learning setting, this framework can learn embeddings drawing supervision from self-supervised tasks that use reconstruction and temporal dynamics, and from weakly supervised tasks based on non-image-level information (e.g., health records). I will show extensions that aim to be multimodal and can learn longitudinal effects without needing longitudinal observations. While the applications I will focus on use healthcare examples, parallels will be drawn to other contexts. I will conclude by offering several thoughts on opportunities and challenges in learning disentangled representations and discuss potentially fruitful directions.
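
To make the mixed-dimension idea concrete, here is a minimal sketch of content-style disentanglement: one encoder yields a spatial content tensor (useful for segmentation-like tasks), another yields a low-dimensional style vector, and a decoder must reconstruct the input from both, so reconstruction acts as self-supervision. This is an illustrative toy under stated assumptions (single-channel images, arbitrary dimensions), not the framework presented in the talk.

```python
# Sketch: spatial content factor + non-spatial style factor,
# trained with a self-supervised reconstruction loss (PyTorch).
import torch
import torch.nn as nn

class DisentangledAE(nn.Module):
    def __init__(self, content_ch=8, style_dim=16):
        super().__init__()
        self.content_enc = nn.Sequential(          # spatial factor
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, content_ch, 3, padding=1), nn.Softmax(dim=1))
        self.style_enc = nn.Sequential(            # non-spatial factor
            nn.Conv2d(1, 32, 4, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, style_dim))
        self.decoder = nn.Conv2d(content_ch + style_dim, 1, 3, padding=1)

    def forward(self, x):
        content = self.content_enc(x)              # (B, C, H, W)
        style = self.style_enc(x)                  # (B, S)
        # Broadcast the style vector spatially and decode jointly.
        s_map = style[:, :, None, None].expand(-1, -1, *x.shape[2:])
        recon = self.decoder(torch.cat([content, s_map], dim=1))
        return recon, content, style

# Reconstruction supervises itself; task heads (e.g. a segmentor on
# `content`) can be trained jointly with whatever labels exist.
x = torch.randn(4, 1, 64, 64)                      # e.g. MR slices
recon, content, style = DisentangledAE()(x)
loss = nn.functional.l1_loss(recon, x)
```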


