IMPROVE 2024 Abstracts


Area 1 - Fundamentals

Short Papers
Paper Nr: 10
Title:

Multimodal Deepfake Detection for Short Videos

Authors:

Abderrazzaq Moufidi, David Rousseau and Pejman Rasti

Abstract: This study addresses the growing challenge of AI-generated multimedia content that is persuasive but often misleading, and that is difficult for both humans and machine learning models to interpret. Building upon our prior research, we analyze the visual and auditory elements of multimedia to identify multimodal deepfakes, with a specific focus on the lower facial area in video clips. This targeted approach sets our research apart in the complex field of deepfake detection. Our technique is particularly effective for short video clips, lasting from 200 milliseconds to one second, surpassing many current deep learning methods that struggle at these durations. In our previous work, we utilized late fusion for correlating audio and lip movements and developed a novel method for video feature extraction that requires less computational power. This is a practical solution for real-world applications with limited computing resources. By adopting a multi-view strategy, the proposed network can leverage various weaknesses found in deepfake generation, from visual anomalies to motion inconsistencies or issues with jaw positioning, which are common in such content.
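
The late fusion mentioned in the abstract can be illustrated with a minimal sketch. It assumes each modality already has its own model that outputs a per-clip probability of being fake; the function names, the weighted average and the 0.5 threshold are illustrative assumptions, not the authors' actual fusion rule.

```python
import numpy as np


def late_fusion(audio_scores, visual_scores, audio_weight=0.5):
    """Combine per-clip fake probabilities from two independent models.

    Hypothetical helper: a weighted average is one common late-fusion
    rule; the abstract does not say which rule the authors use.
    """
    audio = np.asarray(audio_scores, dtype=float)
    visual = np.asarray(visual_scores, dtype=float)
    if audio.shape != visual.shape:
        raise ValueError("score arrays must have the same shape")
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    return audio_weight * audio + (1.0 - audio_weight) * visual


def classify(fused_scores, threshold=0.5):
    """Label a clip as fake when its fused score reaches the threshold."""
    return np.asarray(fused_scores) >= threshold


# Made-up scores for three short clips (illustration only).
audio_scores = [0.9, 0.2, 0.6]
visual_scores = [0.7, 0.1, 0.3]
fused = late_fusion(audio_scores, visual_scores)
labels = classify(fused)
```

Because fusion happens on the model outputs, each modality can be trained and replaced independently, which is the usual reason for preferring late over early fusion.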

Paper Nr: 22
Title:

Chaotic Convolutional Long Short-Term Memory Network for Respiratory Motion Prediction

Authors:

Narges Ghasemi, Shahabedin Nabavi, Mohsen E. Moghaddam and Yasser Shekofteh

Abstract: One of the challenges of treating lung tumors in radiation therapy is the patient’s respiratory movements during the treatment, which lead to tumor motion. The goal of respiratory motion prediction is to predict the movements of lung tissues and lung tumors during the breathing cycle. Predicting respiratory movements allows radiation to be directed only at the tumor, minimizing exposure to healthy tissue and reducing the risk of side effects. Using 4D CT images, we can find the next position of the lung tumor and make a 4D radiation therapy plan. As obtaining 4D CT scans is harmful to the patient due to radiation, the aim of this study is to construct a 4D CT during a respiratory cycle using only a 3D image. In this paper, a Chaotic Convolutional Long Short-Term Memory network is proposed, which utilizes chaotic features in respiratory signals to predict pulmonary movements more accurately. The innovation of this method lies in its attention to the chaotic features of respiratory signals, which leads to better interpretability of the presented model. The obtained results show that the proposed method has a higher learning speed and better performance than previous models that generate 4D CT scans.

Paper Nr: 24
Title:

On the Exploitation of DCT Statistics for Cropping Detectors

Authors:

Claudio V. Ragaglia, Francesco Guarnera and Sebastiano Battiato

Abstract: The study of frequency components derived from the Discrete Cosine Transform (DCT) has been widely used in image analysis. In recent years it has been observed that significant information about the lifecycle of an image can be extracted from them, but no study has focused on the relationship between them and the source resolution of the image. In this work, we investigated a novel image resolution classifier that employs DCT statistics with the goal of detecting the original resolution of images; in particular, the insight was exploited to address the challenge of identifying cropped images. When a Machine Learning (ML) classifier is trained on entire (uncropped) images, the resulting model can leverage this information to detect cropping. The results demonstrate the classifier’s reliability in distinguishing between cropped and uncropped images, providing a dependable estimation of their original resolution. This advancement has significant implications for image processing applications, including digital security, authenticity verification, and visual quality analysis, by offering a new tool for detecting image manipulations and enhancing qualitative image assessment. This work opens new perspectives in the field, with potential to transform image analysis and usage across multiple domains.
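
As a rough illustration of what "DCT statistics" can mean, the sketch below computes the 2D DCT of non-overlapping 8x8 blocks and summarizes each frequency position by its mean and standard deviation across blocks. The block size, the choice of statistics and the function names are assumptions made for this example; the abstract does not specify which statistics the authors feed to their classifier.

```python
import numpy as np


def dct_matrix(n=8):
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n).reshape(-1, 1)  # frequency index (rows)
    i = np.arange(n).reshape(1, -1)  # sample index (columns)
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)       # DC row uses a different scale factor
    return m


def block_dct_stats(image, block=8):
    """Per-frequency mean and standard deviation over all image blocks.

    Hypothetical feature extractor: returns two (block, block) arrays
    that could be flattened into a feature vector for a classifier.
    """
    img = np.asarray(image, dtype=float)
    h = (img.shape[0] // block) * block  # drop incomplete border blocks
    w = (img.shape[1] // block) * block
    img = img[:h, :w]
    # Rearrange into an array of shape (num_blocks, block, block).
    blocks = (img.reshape(h // block, block, w // block, block)
                 .swapaxes(1, 2)
                 .reshape(-1, block, block))
    d = dct_matrix(block)
    coeffs = d @ blocks @ d.T            # separable 2D DCT of every block
    return coeffs.mean(axis=0), coeffs.std(axis=0)


# A flat gray image has energy only in the DC coefficient.
flat = np.full((16, 16), 10.0)
mean, std = block_dct_stats(flat)
```

The orthonormal basis is built by hand so the example depends only on NumPy; a library DCT routine would serve equally well.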

Paper Nr: 21
Title:

3D Virtual Fitting Network (3D VFN)

Authors:

Danyal Mahmood, Wei W. Leong, Humaira Nisar and Ahmad B. Mazlan

Abstract: With the rise in digital technology and the fast pace of life, as well as the change in lifestyle due to the pandemic, people have started adopting online shopping in the garment industry as well. Hence, research on Virtual Try-On (VTO) technologies to be implemented in virtual fitting rooms (VFRs) has drawn significant attention. The existing VFR technologies rely on deep generative models with an end-to-end pipeline, from feature extraction to garment warping and refinement. While there are currently both 2D and 3D VTO solutions, the 3D ones have enormous commercial potential in the fashion market, as the technology has been proven effective for providing a photo-realistic and detailed try-on result. However, the existing 3D VTO solutions principally rely on annotated human body shapes or avatars, which are unrealistic. By integrating the technologies embedded in both 2D and 3D VTO solutions, this paper proposes the 3D Virtual Fitting Network (3D VFN), a VTO solution that takes only 2D RGB garment and single-person human images as inputs and generates a photo-realistic warped garment output image by utilizing geometric settings in 3D space.

Area 2 - Methods and Techniques

Full Papers
Paper Nr: 19
Title:

Belfort Birth Records Transcription: Preprocessing, and Structured Data Generation

Authors:

Wissam AlKendi, Franck Gechter, Laurent Heyberger and Christophe Guyeux

Abstract: Historical documents are invaluable windows into the past. They play a critical role in shaping our perception of the world and its rich tapestry of stories. This paper presents techniques to facilitate the transcription of the French Belfort Civil Registers of Births, which are valuable historical resources spanning from 1807 to 1919. The methodology focuses on preprocessing steps such as binarization, skew correction, and text line segmentation, tailored to address the challenges posed by these documents including various text styles, marginal annotations, and a hybrid mix of printed and handwritten text. The paper also introduces this archive as a new database by developing a structured strategy for the components of the documents using XML tags, ensuring accurate formatting and alignment of transcriptions with image components at both the paragraph and text line levels for further enhancements to handwritten text recognition models. The results of the preprocessing phase show an accuracy rate of 96%, facilitating the preservation and study of this rich cultural heritage.

Area 3 - Machine Learning

Short Papers
Paper Nr: 16
Title:

Leveraging Temporal Context in Human Pose Estimation: A Survey

Authors:

Dana Skorvankova and Martin Madaras

Abstract: Human pose estimation, the task of localizing skeletal joint positions from visual data, has witnessed significant progress with the advent of machine learning techniques. In this paper, we explore the landscape of deep learning-based methods for human pose estimation and investigate the impact of integrating temporal information into the computational framework. Our comparison covers the evolution from methods based on Convolutional Neural Networks (CNNs) to recurrent architectures and visual transformers. While spatial information alone provides valuable insights, we delve into the benefits of incorporating temporal information, enhancing robustness and adaptability to dynamic human movements. The surveyed methods are adapted to fit the requirements of the human pose estimation task and are evaluated on a real large-scale dataset, focusing on a single-person scenario with 3D point cloud inputs. We present results and insights, showcasing the trade-offs between accuracy, memory requirements, and training time for various approaches. Furthermore, our findings demonstrate that models relying on attention mechanisms can achieve competitive outcomes in human pose estimation with a limited number of trainable parameters. The survey aims to provide a comprehensive overview of machine learning-based human pose estimation techniques, emphasizing the evolution towards temporally-aware models and identifying challenges and opportunities in this rapidly evolving field.

Area 4 - Applications

Full Papers
Paper Nr: 9
Title:

Toward Objective Variety Testing Score Based on Computer Vision and Unsupervised Machine Learning: Application to Apple Shape

Authors:

Mouad Zine-El-Abidine, Helin Dutagaci, Pejman Rasti, Maria J. Aranzana, Christian Dujak and David Rousseau

Abstract: While precision agriculture and plant phenotyping are very actively moving toward numerical protocols for objective and fast automated measurements, plant variety testing is still very largely guided by manual practices based on visual scoring. Indeed, variety testing is regulated by definite protocols based on visual observation of sketches provided in official catalogs. In this article, we investigated the possibility of bypassing the human visual inspection of these sketches and basing the scoring of plant varieties on the computer vision similarity between the official sketches and the plants to be inspected. A generic protocol for such a computer-vision-based approach is proposed and illustrated on apple shape classification. The proposed unsupervised algorithm is demonstrated to be of high value by comparison with classical supervised and self-supervised machine and deep learning, provided that some rescaling of the sketches is performed.

Paper Nr: 12
Title:

Using Deep Learning for the Dynamic Evaluation of Road Marking Features from Laser Imaging

Authors:

Maxime Tual, Valérie Muzet, Philippe Foucher, Christophe Heinkelé and Pierre Charbonnier

Abstract: Road markings are essential guidance elements for both drivers and driver assistance systems: their maintenance requires regularly scheduled performance surveys. In this paper, we introduce a deep-learning-based method to estimate two indicators of the quality of road markings (the percentage of remaining marking and the contrast) directly from their appearance, using reflectance data acquired by a mobile laser imaging system used for inspections. To do this, we enhance the EfficientDet architecture by adding an output sub-network to predict the indicators. It is not possible to physically establish large-scale reference measurements for training and testing our model, but this can be done indirectly by semi-supervised image annotation, a strategy validated by our experiments. Our results show that it is advisable to train the model end-to-end without optimizing its detection performance. They also highlight the very good accuracy of the indicators predicted by the model.

Paper Nr: 20
Title:

Fitting Tree Model with CNN and Geodesics to Track Blood Vessels in 2D Medical Images and Application to Ultrasound Localization Microscopy Data

Authors:

Théo Bertrand and Laurent D. Cohen

Abstract: Segmentation of tubular structures in vascular imaging is a well-studied task, although knowledge of the tree-like structure of the regions to be detected is rarely exploited. Our work focuses on detecting the important landmarks in the vascular network (via a CNN performing both localization and classification of the points of interest) and representing vessels as the edges of a minimal distance tree graph. We leverage geodesic methods relevant to the detection of vessels and their geometry, making use of the space of positions and orientations so that 2D vessels can be accurately represented as trees. We build our model to carry out tracking on Ultrasound Localization Microscopy (ULM) data, proposing a suitable cost function for tracking on this type of data. We also test our framework on synthetic and eye fundus data. Results show that the Orientation Score built from ULM data yields good geodesics for tracking blood vessels, but the scarcity of well-annotated ULM data is an obstacle to the localization of vascular landmarks.

Short Papers
Paper Nr: 7
Title:

Production-Ready End-to-End Visual Quality Inspection for Defect Detection on Surfaces Based on a Multi-Stage AI System

Authors:

Patrick Trampert, Tobias Masiak, Felix Schmidt, Nicolas Thewes, Tim Kruse, Christian Witte and Georg Schneider

Abstract: Quality inspection based on optical systems is often limited by the ability of conventional image processing pipelines. Moreover, setting up such a system in production must be tailored towards specific tasks, which is a very tedious, time-consuming, and expensive undertaking that is rarely transferable to different inspection problems. We present a configurable multi-stage system for Visual Quality Inspection (VQI) based on Artificial Intelligence (AI). In addition, we develop a divide-and-conquer strategy to break down complex tasks into sub-problems that are easy to handle with well-understood AI approaches. For data acquisition, a human-machine interface is implemented via a graphical user interface running on the production side. Besides facilitating AI processing, the developed strategy leads to knowledge digitalisation through sub-problem annotation that can be transferred to future use cases for defect detection on surfaces. We demonstrate the potential of AI-based quality inspection in a production use case, where we were able to reduce the false-error-rate from 16.83% to 2.80%, so that our AI workflow has already replaced the old system in a running production.
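
To put the reported rates in perspective, the short sketch below works out the absolute and relative reduction implied by the two figures in the abstract. The helper name is hypothetical and only the two percentages come from the source.

```python
def relative_reduction(before, after):
    """Fraction by which a rate dropped, relative to its starting value."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before


before_pct = 16.83  # false-error-rate of the old system, from the abstract
after_pct = 2.80    # false-error-rate of the AI workflow, from the abstract

absolute_drop = before_pct - after_pct
relative_drop = relative_reduction(before_pct, after_pct)

print(f"absolute drop: {absolute_drop:.2f} percentage points")
print(f"relative drop: {relative_drop:.1%}")
```

This corresponds to a drop of 14.03 percentage points, or roughly 83% of the original false-error-rate.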

Paper Nr: 15
Title:

HERO-GPT: Zero-Shot Conversational Assistance in Industrial Domains Exploiting Large Language Models

Authors:

Luca Strano, Claudia Bonanno, Francesco Ragusa, Giovanni M. Farinella and Antonino Furnari

Abstract: We introduce HERO-GPT, a Multi-Modal Virtual Assistant built on a Multi-Agent System designed to swiftly adapt to any procedural context, minimizing the need for training on context-specific data. In contrast to traditional approaches to conversational agents, HERO-GPT utilizes a series of dynamically interchangeable documents instead of datasets, hand-written rules, or conversational examples to provide information on the given scenario. This paper presents the system’s capability to adapt to an industrial domain scenario through the integration of a GPT-based Large Language Model and an object detector to support Visual Question Answering. HERO-GPT is capable of offering conversational guidance on various aspects of industrial contexts, including information on Personal Protective Equipment (PPE), machinery, procedures, and best practices. Experiments performed in an industrial laboratory with real users demonstrate HERO-GPT’s effectiveness. Results indicate that users clearly prefer the proposed virtual assistant over traditional supporting materials such as paper-based manuals in the considered scenario. Moreover, the performance of the proposed system is shown to be comparable or superior to that of traditional approaches, while requiring little domain-specific data for the setup of the system.