IMPROVE 2021 Abstracts


Area 1 - Fundamentals

Short Papers
Paper Nr: 13
Title:

Pothole Detection under Diverse Conditions using Object Detection Models

Authors:

Syed I. Hassan, Dympna O’Sullivan and Susan Mckeever

Abstract: One of the most important tasks in road maintenance is the detection of potholes. This process is usually done through manual visual inspection, where certified engineers assess recorded images of pavements acquired using cameras or professional road assessment vehicles. Machine learning techniques are now being applied to this problem, with models trained to automatically identify road conditions. However, approaching this real-world problem with machine learning presents the classic challenge of producing generalisable models: images and videos may be captured under different illumination conditions, with different camera types, camera angles and resolutions. In this paper we present our approach to building a generalized learning model for pothole detection. We use four datasets that cover a range of image and environment conditions. Using the Faster RCNN object detection model, we demonstrate the extent to which pothole detection models can generalise across these conditions. Our work is a contribution to bringing automated road maintenance techniques from the research lab into the real world.
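
As a rough illustration of the kind of object detection model named above (not the authors' setup: the pretrained weights, class count and dummy input are assumptions), the sketch below adapts torchvision's Faster R-CNN to a single pothole class:

```python
# Minimal sketch: adapting torchvision's Faster R-CNN to one "pothole" class.
# Dataset handling and training details are illustrative assumptions.
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

def build_pothole_detector(num_classes: int = 2):  # background + pothole
    # "DEFAULT" weights require torchvision >= 0.13; older versions use pretrained=True.
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
    return model

if __name__ == "__main__":
    model = build_pothole_detector()
    model.eval()
    dummy = [torch.rand(3, 480, 640)]  # stand-in for a road image
    with torch.no_grad():
        detections = model(dummy)
    print(detections[0]["boxes"].shape, detections[0]["scores"].shape)
```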

Paper Nr: 25
Title:

Gaussian Blur through Parallel Computing

Authors:

Nahla M. Ibrahim, Ahmed A. ElFarag and Rania Kadry

Abstract: Two-dimensional (2D) convolution is one of the most computationally complex and memory-intensive algorithms used in image processing. In this paper, we present the 2D convolution algorithm used in the Gaussian blur, a filter widely used for noise reduction that has high computational requirements. Single-threaded solutions cannot keep up with the performance and speed needed for image processing techniques, so parallelizing the image convolution on parallel systems enhances performance and reduces processing time. This paper aims to give an overview of the performance enhancement that parallel systems bring to image convolution using the Gaussian blur algorithm. We compare the speed-up of the algorithm on two parallel systems, a multi-core central processing unit (CPU) and a graphics processing unit (GPU), using Google Colaboratory ("Colab").
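
A minimal CPU sketch of the operation being parallelized, assuming a separable Gaussian kernel; the GPU counterpart (e.g., swapping NumPy for CuPy) and the paper's actual Colab benchmarks are not reproduced here:

```python
# Gaussian blur as a separable 2D convolution on the CPU. Kernel size and
# sigma are illustrative; speed-ups on CPU/GPU depend on hardware.
import numpy as np
from scipy.ndimage import convolve1d

def gaussian_kernel_1d(sigma: float, radius: int) -> np.ndarray:
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return k / k.sum()

def gaussian_blur(image: np.ndarray, sigma: float = 2.0) -> np.ndarray:
    k = gaussian_kernel_1d(sigma, radius=int(3 * sigma))
    # A 2D Gaussian kernel is separable: one vertical and one horizontal 1D pass.
    blurred = convolve1d(image, k, axis=0, mode="reflect")
    return convolve1d(blurred, k, axis=1, mode="reflect")

if __name__ == "__main__":
    img = np.random.rand(512, 512)
    print(gaussian_blur(img, sigma=3.0).shape)
```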

Paper Nr: 21
Title:

Understanding the Impact of Image Quality in Face Processing Algorithms

Authors:

Patricia P. Reina, Armando G. Menéndez, José G. Menéndez, Graça Bressan and Wilson Ruggeiro

Abstract: Face processing algorithms have become increasingly popular due to the wide range of applications in which they can be used. As a consequence, research on the quality of face images is also increasing. Several papers have concluded that image quality does impact the performance of face processing algorithms, with low-quality images having a detrimental effect on performance. However, there is still a need for a comprehensive understanding of the extent of the impact of specific distortions such as noise, blur, JPEG compression, and brightness. We conducted a study evaluating the performance of three face processing algorithms on images under different levels of the aforementioned distortions. The study's results identified noise and blur with Gaussian distributions as the main distortions affecting performance. A detailed description of the adopted methodology, as well as the results obtained from the study, is presented in this paper.
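
For illustration only (the distortion levels and libraries are assumptions, not the study's protocol), the snippet below generates the four distortion types examined, Gaussian noise, Gaussian blur, JPEG compression and brightness change:

```python
# Generating controlled distortion levels for a face image (illustrative).
import io
import numpy as np
from PIL import Image, ImageEnhance, ImageFilter

def add_gaussian_noise(img: Image.Image, sigma: float) -> Image.Image:
    arr = np.asarray(img).astype(np.float32)
    noisy = arr + np.random.normal(0.0, sigma, arr.shape)
    return Image.fromarray(np.clip(noisy, 0, 255).astype(np.uint8))

def jpeg_compress(img: Image.Image, quality: int) -> Image.Image:
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return Image.open(buf).convert("RGB")

def distort(img: Image.Image, noise_sigma=10.0, blur_radius=2.0,
            jpeg_quality=30, brightness=0.6):
    return {
        "noise": add_gaussian_noise(img, noise_sigma),
        "blur": img.filter(ImageFilter.GaussianBlur(blur_radius)),
        "jpeg": jpeg_compress(img, jpeg_quality),
        "brightness": ImageEnhance.Brightness(img).enhance(brightness),
    }

if __name__ == "__main__":
    img = Image.fromarray((np.random.rand(128, 128, 3) * 255).astype(np.uint8))
    for name, distorted in distort(img).items():
        print(name, distorted.size)
```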

Area 2 - Methods and Techniques

Full Papers
Paper Nr: 18
Title:

Limitations of Local-minima Gaze Prediction

Authors:

Peter C. Varley, Stefania Cristina, Kenneth P. Camilleri and Alexandra Bonnici

Abstract: We describe a minimal gaze prediction system which is straightforward to implement, can run on everyday hardware, and does not require high-quality video images. We determine head pose and eye gaze from four facial landmarks (nose tip, nose bridge, and the two eye pupils), each of which can be expressed as a local minimum of a simple pixel-intensity operation. We assess the system's stability to variation in subject anatomy, facial landmark outliers, and small systematic errors in the facial landmarks.
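
A minimal sketch of the local-minimum idea, assuming a grayscale frame and a hand-picked eye region of interest (the ROI coordinates and smoothing are illustrative, not the paper's method):

```python
# Locate a pupil as the darkest point of a smoothed grayscale eye region.
import numpy as np
from scipy.ndimage import gaussian_filter

def darkest_point(gray: np.ndarray, roi: tuple) -> tuple:
    """Return (row, col) of the intensity minimum inside roi = (r0, r1, c0, c1)."""
    r0, r1, c0, c1 = roi
    patch = gaussian_filter(gray[r0:r1, c0:c1].astype(np.float32), sigma=2.0)
    idx = np.unravel_index(np.argmin(patch), patch.shape)
    return r0 + idx[0], c0 + idx[1]

if __name__ == "__main__":
    frame = np.random.randint(0, 256, (480, 640), dtype=np.uint8)
    print(darkest_point(frame, roi=(180, 240, 200, 280)))  # assumed eye ROI
```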

Short Papers
Paper Nr: 16
Title:

Integration of Image Processing Techniques into the Unity Game Engine

Authors:

Leon Masopust, Sebastian Pasewaldt, Jürgen Döllner and Matthias Trapp

Abstract: This paper describes an approach to using the Unity game engine for image processing by integrating a custom GPU-based image processor. For this, it describes different application levels and integration approaches for extending the Unity game engine. It further documents the respective software components and implementation details required, and demonstrates use cases such as scene post-processing and material-map processing.

Area 3 - Imaging

Full Papers
Paper Nr: 28
Title:

Biometric Authentication System based on Hand Geometry and Palmprint Features

Authors:

Laura G. Oldal and András Kovács

Abstract: In today's society, biometric authentication has gained significance, since it uses the physical characteristics of a person for identification. Physical features provide greater security compared to ownership- or knowledge-based factors, and more and more physiological measures are proving to be good characteristics for personal authentication. A multimodal biometric authentication system has the advantage of using multiple physical characteristics for authentication, achieving greater accuracy: if one modality fails to identify a person with high accuracy, other modalities are employed. However, in these kinds of systems, every modality has a different imagery data requirement, which results in multiple captured images for evaluation. The method described in this article uses the same input data for processing multiple physiological features at once. The biometric characteristics used by the system are hand geometry and palmprint features. The imagery data requirement is a high-resolution image of a well-lit hand against a dark background. Capturing the image in good sanitary conditions has become an important requirement in the past few years. An advantage of a high-resolution image, compared to images captured with dedicated hardware devices such as fingerprint or palmprint scanners, is that the image is captured contactlessly. Another benefit of using a high-resolution camera is its lower cost compared to systems using dedicated hardware for image capture.
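
A rough sketch of the first step implied by this setup (hand on a dark background), assuming Otsu thresholding and largest-contour selection, which are our choices rather than the paper's:

```python
# Segment the hand from a dark background; hand geometry and palmprint ROIs
# would be derived from the resulting binary mask.
import cv2
import numpy as np

def segment_hand(image_bgr: np.ndarray) -> np.ndarray:
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    gray = cv2.GaussianBlur(gray, (7, 7), 0)
    _, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    hand = max(contours, key=cv2.contourArea)  # keep the largest blob (the hand)
    clean = np.zeros_like(mask)
    cv2.drawContours(clean, [hand], -1, 255, thickness=cv2.FILLED)
    return clean
```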

Area 4 - Machine Learning

Full Papers
Paper Nr: 5
Title:

BarChartAnalyzer: Digitizing Images of Bar Charts

Authors:

Komal Dadhich, Siri C. Daggubati and Jaya Sreevalsan-Nair

Abstract: Charts or scientific plots are widely used visualizations for efficient knowledge dissemination from datasets. However, these charts are predominantly available in image format, and there are various scenarios where these images are interpreted in the absence of the datasets initially used to generate the charts. This leads to a pertinent need for data extraction from an available chart image. We narrow our scope to bar charts and propose a semi-automated workflow, BarChartAnalyzer, for data extraction from chart images. Our workflow integrates the following tasks in sequence: chart type classification, image annotation, object detection, text detection and recognition, data table extraction, text summarization, and optionally, chart redesign. Our data extraction uses second-order tensor fields obtained from tensor voting, as used in computer vision. Our results show that our workflow can effectively and accurately extract data from images of different resolutions and of different subtypes of bar charts. We also discuss specific test cases where BarChartAnalyzer fails. We conclude that our work is an effective, specialised image processing application for interpreting charts.

Paper Nr: 29
Title:

Benefits of Layered Software Architecture in Machine Learning Applications

Authors:

Ármin Romhányi and Zoltán Vámossy

Abstract: The benefits of layering in software applications are well known not only to authors and industry experts but also to software enthusiasts, because layering provides a testable and more error-proof framing for applications. Despite these benefits, however, the increasingly popular area of machine learning has yet to embrace the advantages of such a design. In the present paper, we investigate whether the characteristic benefits of layered architecture carry over to machine learning by designing and building a system that uses a layered machine learning approach. The implemented system is then compared to other existing implementations in the literature targeting the field of facial recognition. Although we chose this field as our example because its literature is rich in both theoretical foundations and practical implementations, the principles and practices outlined by the present work are also applicable in a more general sense.
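
A toy sketch of the layered idea applied to a recognition task; the layer names and the trivial nearest-centroid model are illustrative assumptions, not the system built in the paper:

```python
# Data-access, model and service layers, each testable in isolation.
from dataclasses import dataclass
from typing import List, Optional, Sequence

@dataclass
class Sample:
    features: Sequence[float]
    label: Optional[int] = None

class DataLayer:
    """Persistence/IO concerns only."""
    def load_training_data(self) -> List[Sample]:
        return [Sample([0.1, 0.2], 0), Sample([0.9, 0.8], 1)]

class ModelLayer:
    """Learning concerns only; unit-testable with synthetic samples."""
    def __init__(self):
        self.centroids = {}
    def fit(self, samples: List[Sample]) -> None:
        for s in samples:
            self.centroids.setdefault(s.label, []).append(s.features)
    def predict(self, features: Sequence[float]) -> int:
        def dist(label):
            return min(sum((a - b) ** 2 for a, b in zip(features, p))
                       for p in self.centroids[label])
        return min(self.centroids, key=dist)

class RecognitionService:
    """Application layer: orchestrates the layers below it."""
    def __init__(self, data: DataLayer, model: ModelLayer):
        self.data, self.model = data, model
    def train(self):
        self.model.fit(self.data.load_training_data())
    def identify(self, features):
        return self.model.predict(features)

if __name__ == "__main__":
    service = RecognitionService(DataLayer(), ModelLayer())
    service.train()
    print(service.identify([0.85, 0.9]))
```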

Short Papers
Paper Nr: 2
Title:

Deep Visio-PhotoPlethysmoGraphy Reconstruction Pipeline for Non-invasive Cuff-less Blood Pressure Estimation

Authors:

Francesca Trenta, Francesco Rundo, Roberto Leotta and Sebastiano Battiato

Abstract: In the medical field, many cardiovascular and correlated diseases can be treated early by monitoring and analyzing the subject's blood pressure (BP). However, measuring blood pressure requires the use of invasive medical and health equipment, including the classical sphygmomanometer or a digital pressure meter. In this paper, we propose an innovative algorithmic pipeline to estimate the systolic and diastolic blood pressure of a subject through the visio-reconstruction of the PhotoPlethysmoGraphic (PPG) signal. By means of an innovative deep-learning-based face-motion magnification method, it is possible to visio-reconstruct specific points of the PPG signal in order to extract features related to the pressure level of the analyzed subject. The proposed approach can be used effectively in healthcare facilities for the fast and non-invasive monitoring of the pressure level of subjects, or in other similar applications. We compared our results against a classic cuff-less blood pressure device, with encouraging results reaching 92% accuracy.

Paper Nr: 3
Title:

Advanced Car Driving Assistant System: A Deep Non-local Pipeline Combined with 1D Dilated CNN for Safety Driving

Authors:

Francesco Rundo, Roberto Leotta, Francesca Trenta, Giovanni Bellitto, Federica P. Salanitri, Vincenzo Piuri, Angelo Genovese, Ruggero D. Labati, Fabio Scotti, Concetto Spampinato and Sebastiano Battiato

Abstract: Visual saliency refers to the part of the visual scene on which the subject's gaze is focused, enabling significant applications in various fields, including automotive. Indeed, the car driver decides to focus on specific objects rather than others through deterministic brain-driven saliency mechanisms inherent in perceptual activity. In the automotive industry, visual saliency estimation is one of the most common technologies in Advanced Driver Assistance Systems (ADAS). In this work, we propose an intelligent system consisting of: (1) an ad-hoc non-local semantic segmentation deep network that processes the frames captured by an automotive-grade camera device placed outside the car; (2) an innovative bio-sensor that samples the car driver's PhotoPlethysmoGraphy (PPG) signal for drowsiness monitoring; and (3) an ad-hoc designed 1D temporal deep convolutional network that classifies the collected PPG time series, providing an assessment of the driver's attention level. A downstream check block verifies whether the car driver's attention level is adequate for the saliency-based scene classification. Our approach is extensively evaluated on the DH1FK dataset, and experimental results show the effectiveness of the proposed pipeline.
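
A sketch, with assumed channel sizes and window length, of a 1D dilated temporal CNN of the kind described for classifying PPG windows into attention levels:

```python
# 1D dilated temporal CNN for PPG time-series classification (illustrative sizes).
import torch
import torch.nn as nn

class Dilated1DCNN(nn.Module):
    def __init__(self, in_channels: int = 1, num_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(in_channels, 16, kernel_size=3, dilation=1, padding=1),
            nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=3, dilation=2, padding=2),
            nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=3, dilation=4, padding=4),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time)
        return self.classifier(self.features(x).squeeze(-1))

if __name__ == "__main__":
    ppg = torch.randn(8, 1, 512)  # 8 dummy PPG windows of 512 samples
    print(Dilated1DCNN()(ppg).shape)  # torch.Size([8, 2])
```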

Paper Nr: 24
Title:

Generating Images from Caption and Vice Versa via CLIP-Guided Generative Latent Space Search

Authors:

Federico A. Galatolo, Mario A. Cimino and Gigliola Vaglini

Abstract: In this research work we present CLIP-GLaSS, a novel zero-shot framework to generate an image (or a caption) corresponding to a given caption (or image). CLIP-GLaSS is based on the CLIP neural network, which, given an image and a descriptive caption, provides similar embeddings. Conversely, CLIP-GLaSS takes a caption (or an image) as input, and generates the image (or the caption) whose CLIP embedding is most similar to that of the input. This optimal image (or caption) is produced via a generative network, after an exploration of its latent space by a genetic algorithm. Promising results are shown, based on experiments with the image generators BigGAN and StyleGAN2 and the text generator GPT2.
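
A generic sketch of the latent-space search idea: a simple genetic search over latent vectors maximising a similarity score. Here clip_similarity() is a stand-in for the CLIP image/text similarity, the generator itself is omitted, and the population sizes and mutation noise are assumptions:

```python
# (mu + lambda)-style genetic search over a latent space.
import numpy as np

def clip_similarity(latent: np.ndarray) -> float:
    # Placeholder fitness: in CLIP-GLaSS this would be the CLIP similarity
    # between generator(latent) and the target caption (or image) embedding.
    target = np.full_like(latent, 0.5)
    return -float(np.linalg.norm(latent - target))

def latent_search(dim=128, pop_size=32, parents=8, generations=50, sigma=0.1):
    rng = np.random.default_rng(0)
    pop = rng.standard_normal((pop_size, dim))
    for _ in range(generations):
        scores = np.array([clip_similarity(z) for z in pop])
        elite = pop[np.argsort(scores)[-parents:]]               # keep best parents
        children = elite[rng.integers(0, parents, pop_size - parents)]
        children = children + sigma * rng.standard_normal(children.shape)  # mutate
        pop = np.vstack([elite, children])
    return pop[np.argmax([clip_similarity(z) for z in pop])]

if __name__ == "__main__":
    best = latent_search()
    print(clip_similarity(best))
```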

Paper Nr: 10
Title:

SVW-UCF Dataset for Video Domain Adaptation

Authors:

Artjoms Gorpincenko and Michal Mackiewicz

Abstract: Unsupervised video domain adaptation (DA) has recently seen a lot of success, achieving near-perfect, if not perfect, results on the majority of benchmark datasets. Therefore, the next natural step for the field is to come up with new, more challenging problems that call for creative solutions. By combining two well-known datasets, SVW and UCF, we propose a large-scale video domain adaptation dataset that is not only larger in terms of samples and average video length, but also presents additional obstacles, such as orientation and intra-class variations, differences in resolution, and greater domain discrepancy, both in terms of content and capturing conditions. We perform an accuracy gap comparison which shows that both SVW→UCF and UCF→SVW are empirically more difficult to solve than existing adaptation paths. Finally, we evaluate two state-of-the-art video DA algorithms on the dataset to present benchmark results and provide a discussion of the properties which create the most confusion for modern video domain adaptation methods.

Paper Nr: 11
Title:

Image-based Plant Disease Diagnosis with Unsupervised Anomaly Detection based on Reconstructability of Colors

Authors:

Ryoya Katafuchi and Terumasa Tokunaga

Abstract: This paper proposes an unsupervised anomaly detection technique for image-based plant disease diagnosis. The construction of large and publicly available datasets containing labeled images of healthy and diseased crop plants has led to growing interest in computer vision techniques for automatic plant disease diagnosis. Although supervised image classifiers based on deep learning can be a powerful tool for plant disease diagnosis, they require a huge amount of labeled data. The data mining technique of anomaly detection includes unsupervised approaches that do not require rare samples for training classifiers. We propose an unsupervised anomaly detection technique for image-based plant disease diagnosis that is based on the reconstructability of colors: a deep encoder-decoder network trained to reconstruct the colors of healthy plant images should fail to reconstruct the colors of symptomatic regions. Our proposed method includes a new image-based framework for plant disease detection that utilizes a conditional adversarial network called pix2pix, and a new anomaly score based on the CIEDE2000 color difference. Experiments with the PlantVillage dataset demonstrate the superiority of our proposed method over an existing anomaly detector at identifying diseased crop images, in terms of accuracy, interpretability and computational efficiency.
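
A sketch of the proposed anomaly score, assuming the colour reconstruction (from pix2pix) is computed elsewhere; skimage's CIEDE2000 implementation and the mean aggregation are used here for illustration:

```python
# CIEDE2000 colour difference between an input image and its colour
# reconstruction; large per-pixel differences flag symptomatic regions.
import numpy as np
from skimage.color import rgb2lab, deltaE_ciede2000

def color_anomaly_map(original_rgb: np.ndarray, reconstructed_rgb: np.ndarray) -> np.ndarray:
    """Per-pixel CIEDE2000 difference between two float RGB images in [0, 1]."""
    return deltaE_ciede2000(rgb2lab(original_rgb), rgb2lab(reconstructed_rgb))

def anomaly_score(original_rgb, reconstructed_rgb) -> float:
    # One simple aggregation: mean colour difference over the image.
    return float(color_anomaly_map(original_rgb, reconstructed_rgb).mean())

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    img = rng.random((64, 64, 3))
    recon = np.clip(img + rng.normal(0, 0.05, img.shape), 0, 1)  # dummy reconstruction
    print(anomaly_score(img, recon))
```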

Area 5 - Multimedia Communications

Short Papers
Paper Nr: 27
Title:

Gesture Recognition for UAV-based Rescue Operation based on Deep Learning

Authors:

Chang Liu and Tamás Szirányi

Abstract: UAVs play an important role in many application fields, especially in rescue. To achieve good communication between the UAV's onboard system and humans, this work presents an approach that uses deep learning algorithms to accurately recognize various body gestures in the wild. The system can not only recognize human rescue gestures but also detect, track and count people. A dataset of ten basic rescue gestures (Kick, Punch, Squat, Stand, Attention, Cancel, Walk, Sit, Direction, and PhoneCall) has been created with a UAV camera. From the perspective of UAV rescue, feedback from the user is very important: the two most important dynamic rescue gestures are the novel Attention and Cancel gestures, which represent the set and reset functions respectively. The system shows a warning help message when the user waves to the UAV, and the user can cancel the communication at any time by showing the drone the body gesture that indicates cancellation. This work lays the groundwork for the rescue routes that the UAV will subsequently design based on user feedback. The system achieves 99.47% accuracy on training data and 99.09% accuracy on testing data using the deep learning method.

Paper Nr: 12
Title:

Defamation 2.0: New Threats in Digital Media Era - An Overview on Forensics Approaches in the Social Network Ecosystem

Authors:

Cristina Nastasi and Sebastiano Battiato

Abstract: Recently, social networks have become the largest and fastest-growing websites on the Internet. These platforms contain the sensitive and personal data of hundreds of millions of people and are also integrated into millions of other websites, so it is increasingly important to focus on security and privacy issues. In this work, we examine the defamation issue in the social network context and apply known methods to recover the data of persons who offend the reputation of others across more than 250 different social media frameworks. The datasets that can be exploited may contain various profile information (user data, photos, etc.) and associated metadata (internal timestamps and unique identifiers). These data are significant in the field of digital forensics, as they can be properly used as evidence in court.

Area 6 - Applications

Full Papers
Paper Nr: 6
Title:

Contactless Optical Respiration Rate Measurement for a Fast Triage of SARS-CoV-2 Patients in Hospitals

Authors:

Carolin Wuerich, Felix Wichum, Christian Wiede and Anton Grabmaier

Abstract: Especially in hospital entrances, it is important to spatially separate potentially SARS-CoV-2-infected patients from other people to avoid further spreading of the disease. Whereas the evaluation of conventional laboratory tests takes too long, the main symptoms, fever and shortness of breath, can indicate the presence of a SARS-CoV-2 infection and can thus be considered for triage. Fever can be measured contactlessly using an infrared sensor, but there are currently no systems for measuring the respiration rate in a similarly fast and contactless way. Therefore, we propose an RGB-camera-based method to remotely determine the respiration rate for triage in hospitals. We detect and track image features on the thorax, band-pass filter the trajectories, and further reduce noise and artefacts by applying a principal component analysis. Finally, the respiration rate is computed using Welch's power spectral density estimate. Our contactless approach is focused on fast measurement and computation. It is especially adapted to the use case of triage in hospitals, comprising a face detection step that is robust against partial occlusion and thus allows patients to wear face masks. Moreover, we show that our method is able to correctly determine the respiration frequency for standing patients despite considerable body sway.
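
A signal-processing sketch of the later pipeline stages, with assumed cut-off frequencies and frame rate: band-pass filtering of tracked thorax trajectories, PCA, and Welch's power spectral density estimate:

```python
# Estimate respiration rate from tracked chest-feature trajectories.
import numpy as np
from scipy.signal import butter, filtfilt, welch
from sklearn.decomposition import PCA

def respiration_rate(trajectories: np.ndarray, fs: float = 30.0) -> float:
    """trajectories: (num_samples, num_features) vertical feature positions."""
    # Band-pass roughly 0.1-0.7 Hz (about 6-42 breaths per minute); assumed limits.
    b, a = butter(2, [0.1, 0.7], btype="band", fs=fs)
    filtered = filtfilt(b, a, trajectories, axis=0)
    signal = PCA(n_components=1).fit_transform(filtered).ravel()
    freqs, psd = welch(signal, fs=fs, nperseg=min(len(signal), 512))
    return float(freqs[np.argmax(psd)] * 60.0)  # breaths per minute

if __name__ == "__main__":
    fs, t = 30.0, np.arange(0, 30, 1 / 30.0)
    breathing = np.sin(2 * np.pi * 0.25 * t)  # synthetic ~15 breaths per minute
    tracks = breathing[:, None] + 0.1 * np.random.randn(len(t), 10)
    print(respiration_rate(tracks, fs))
```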

Paper Nr: 8
Title:

Adversarial Unsupervised Domain Adaptation Guided with Deep Clustering for Face Presentation Attack Detection

Authors:

Yomna S. El-Din, Mohamed N. Moustafa and Hani Mahdi

Abstract: Face Presentation Attack Detection (PAD) has drawn increasing attention as a means of securing the face recognition systems that are widely used in many applications. Conventional face anti-spoofing methods assume that testing is done in the same domain used for training, and so cannot generalize well to unseen attack scenarios; the trained models tend to overfit to the acquisition sensors and attack types available in the training data. In light of this, we propose an end-to-end learning framework based on Domain Adaptation (DA) to improve PAD generalization capability. Labeled source-domain samples are used to train the feature extractor and classifier via a cross-entropy loss, while unsupervised data from the target domain are utilized in an adversarial DA approach that causes the model to learn domain-invariant features. Using DA alone in face PAD fails to adapt well to a target domain acquired under different conditions, with different devices and attack types, than the source domain. Therefore, in order to keep the intrinsic properties of the target domain, deep clustering of the target samples is performed. Training and deep clustering are performed end-to-end, and experiments on several public benchmark datasets validate that our proposed Deep Clustering guided Unsupervised Domain Adaptation (DCDA) can learn more generalized information compared with the state of the art, in terms of classification error on the target domain.
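
One standard building block for the adversarial DA component described here is a gradient reversal layer between the feature extractor and the domain classifier; the sketch below shows that generic construction, not the authors' code:

```python
# Gradient reversal layer (GRL) for adversarial domain adaptation.
import torch

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lambda_):
        ctx.lambda_ = lambda_
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Flip the gradient sign so the feature extractor is pushed towards
        # domain-invariant features while the domain classifier is trained.
        return -ctx.lambda_ * grad_output, None

def grad_reverse(x: torch.Tensor, lambda_: float = 1.0) -> torch.Tensor:
    return GradReverse.apply(x, lambda_)

if __name__ == "__main__":
    feats = torch.randn(4, 16, requires_grad=True)
    grad_reverse(feats, lambda_=0.5).sum().backward()
    print(feats.grad[0, :4])  # gradients are negated and scaled by lambda
```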

Short Papers
Paper Nr: 7
Title:

Fast Human Activity Recognition

Authors:

Shane Reid, Sonya Coleman, Dermot Kerr, Philip Vance and Siobhan O’Neill

Abstract: Human activity recognition has been an open problem in computer vision for almost two decades. In that time many approaches have been proposed to solve this problem, but very few have managed to solve it in a way that is sufficiently computationally efficient for real-time applications. Recently this has changed, with keypoint-based methods demonstrating a high degree of accuracy at low computational cost. These approaches take a given image and return a set of joint locations for each individual within the image. In order to achieve real-time performance, a sparse representation of these features over a given time frame is required for classification. Previous methods have achieved this by using a reduced number of keypoints, but this approach gives a less robust representation of the individual's body pose and may limit the types of activity that can be detected. We present a novel method for reducing the size of the feature set by calculating the Euclidean distance and the direction of keypoint changes across a number of frames. This allows for a meaningful representation of the individual's movements over time. We show that this method achieves accuracy on par with current state-of-the-art methods, while demonstrating real-time performance.
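
A sketch of our reading of the proposed feature reduction: the Euclidean distance and direction of each keypoint's change between consecutive frames, flattened into a compact descriptor (array shapes are assumptions):

```python
# Per-keypoint displacement distance and direction across frames.
import numpy as np

def movement_features(keypoints: np.ndarray) -> np.ndarray:
    """keypoints: (num_frames, num_joints, 2) array of (x, y) joint locations."""
    deltas = np.diff(keypoints, axis=0)                       # frame-to-frame displacement
    distances = np.linalg.norm(deltas, axis=-1)               # Euclidean distance moved
    directions = np.arctan2(deltas[..., 1], deltas[..., 0])   # direction of movement
    return np.concatenate([distances.ravel(), directions.ravel()])

if __name__ == "__main__":
    poses = np.random.rand(16, 17, 2)  # 16 frames, 17 COCO-style joints
    print(movement_features(poses).shape)  # (2 * 15 * 17,) = (510,)
```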

Paper Nr: 9
Title:

A Novel Fourier-based Approach for Camera Identification

Authors:

Vittoria Bruni, Silvia Marconi and Domenico Vitulano

Abstract: In this paper the source camera identification problem is considered and novel features for PRNU noise are studied. The regularity of a suitable sampling of the PRNU image is considered and it is measured through the decay of its Fourier spectrum. This single global feature is independent of image filtering and size, but it carries significant information concerning the image. The aim is to use this feature in the classification step. Some preliminary results show that this kind of approach is promising as it is able to reach identification scores that can be comparable to the reference source camera identification method, without requiring any image downsampling or cropping.
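
A sketch of one way such a global spectral-decay feature could be computed, assuming the PRNU noise image has already been extracted (the sampling and classification steps of the paper are not shown):

```python
# Decay rate of the radially averaged Fourier spectrum of a noise image.
import numpy as np

def spectral_decay(noise: np.ndarray) -> float:
    spectrum = np.abs(np.fft.fftshift(np.fft.fft2(noise)))
    h, w = spectrum.shape
    yy, xx = np.indices((h, w))
    r = np.hypot(yy - h // 2, xx - w // 2).astype(int)
    radial_mean = np.bincount(r.ravel(), weights=spectrum.ravel()) / np.bincount(r.ravel())
    radii = np.arange(1, min(h, w) // 2)
    # Slope of log-spectrum vs log-radius measures how fast the spectrum decays.
    slope, _ = np.polyfit(np.log(radii), np.log(radial_mean[radii] + 1e-12), 1)
    return float(slope)

if __name__ == "__main__":
    print(spectral_decay(np.random.randn(256, 256)))  # stand-in for a PRNU image
```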

Paper Nr: 22
Title:

Image Copy-Move Forgery Detection using Color Features and Hierarchical Feature Point Matching

Authors:

Yi-Lin Tsai and Jin-Jang Leou

Abstract: In this study, an image copy-move forgery detection approach using color features and hierarchical feature point matching is proposed. The proposed approach contains three main stages, namely, pre-processing and feature extraction, hierarchical feature point matching, and iterative forgery localization and post-processing. In the proposed approach, Gaussian-blurred images and difference of Gaussians (DoG) images are constructed. Hierarchical feature point matching is employed to find matched feature point pairs, in which two matching strategies, namely, group matching via scale clustering and group matching via overlapped gray level clustering, are used. Based on the experimental results obtained in this study, the performance of the proposed approach is better than those of three comparison approaches.

Paper Nr: 23
Title:

Food Recognition for Dietary Monitoring during Smoke Quitting

Authors:

Sebastiano Battiato, Pasquale Caponnetto, Oliver Giudice, Mazhar Hussain, Roberto Leotta, Alessandro Ortis and Riccardo Polosa

Abstract: This paper presents the current state of an ongoing project which aims to study, develop and evaluate an automatic framework able to track and monitor the dietary habits of people involved in a smoke quitting protocol. The system will periodically acquire images of the food consumed by the users, which will be analysed by modern food recognition algorithms able to extract and infer semantic information from food images. The extracted information, together with other contextual data, will be exploited to perform advanced inferences and to make correlations between eating habits and the steps of the smoke quitting process, providing clinicians with specific information about responses to the quitting protocol that are directly related to observable changes in eating habits.