IMPROVE 2023 Abstracts


Area 1 - Fundamentals

Full Papers
Paper Nr: 16
Title:

Multi-Scale Surface Normal Estimation from Depth Maps

Authors:

Diclehan Ulucan, Oguzhan Ulucan and Marc Ebner

Abstract: Surface normal vectors are important local descriptors of images, which are utilized in many applications in the fields of computer vision and computer graphics. Hence, estimating surface normals from structured range sensor data is an important step in many image processing pipelines. To this end, we present a simple yet effective, learning-free surface normal estimation strategy for both complete and incomplete depth maps. The proposed method takes advantage of scale-space. While the finest scale is used for the initial estimations, the missing surface normals, which cannot be estimated properly, are filled in from the coarser scales of the pyramid. The same procedure is applied to incomplete depth maps with a slight modification, where we guide the algorithm using the gradient information obtained from the shading image of the scene, which has a geometric relationship with the surface normals. In order to test our method in the incomplete depth map scenario, we augmented the MIT-Berkeley Intrinsic Images dataset by creating two different sets, namely easy and hard. According to the experiments, the proposed algorithm achieves competitive results on datasets containing both single objects and realistic scenes.
Download
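A minimal sketch of the finest-scale step described in the abstract above, estimating per-pixel normals from depth gradients, assuming an orthographic depth map and unit pixel spacing (the paper's multi-scale filling procedure is not reproduced here):

```python
import numpy as np

def normals_from_depth(depth):
    """Estimate per-pixel surface normals from a depth map via
    finite-difference gradients (finest pyramid scale only)."""
    dzdy, dzdx = np.gradient(depth.astype(np.float64))
    # The (unnormalised) normal of the surface z = f(x, y) is (-dz/dx, -dz/dy, 1).
    n = np.dstack((-dzdx, -dzdy, np.ones_like(dzdx)))
    return n / np.linalg.norm(n, axis=2, keepdims=True)

# A planar ramp rising along x should give the same tilted normal everywhere.
ramp = np.tile(np.arange(8, dtype=np.float64), (8, 1))
normals = normals_from_depth(ramp)
```

In a real pipeline, pixels where the gradient is undefined (holes in the depth map) would be filled from coarser pyramid levels, as the paper proposes.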

Paper Nr: 20
Title:

Intrinsic Image Decomposition: Challenges and New Perspectives

Authors:

Diclehan Ulucan, Oguzhan Ulucan and Marc Ebner

Abstract: In the field of intrinsic image decomposition, alongside developing a robust algorithm for this ill-posed problem, it is also necessary to benchmark the method on a comprehensive dataset using a suitable evaluation metric. However, existing evaluation metrics have certain limitations. In this study, two new evaluation strategies are proposed to analyze intrinsics according to their characteristics. The ensemble of metrics combines different perceptual quality metrics in scale-space, while the imperceptible ΔE score is a modified version of the classical ΔE metric. Intrinsic image decomposition studies that extract the reflectance and shading images are benchmarked on two datasets. Furthermore, an overview of the field of intrinsic image decomposition is provided and the challenges that have to be overcome are pointed out.
Download
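For reference, the classical ΔE metric that the proposed imperceptible ΔE score modifies is simply the Euclidean distance between two colours in L*a*b* space (CIE76); a minimal sketch:

```python
import numpy as np

def delta_e_76(lab1, lab2):
    """Classical (CIE76) colour difference: Euclidean distance in L*a*b* space."""
    return float(np.linalg.norm(np.asarray(lab1, dtype=float) - np.asarray(lab2, dtype=float)))

# A ΔE around 2.3 is often quoted as a just-noticeable difference for human observers.
d = delta_e_76((50.0, 10.0, 10.0), (50.0, 10.0, 12.3))
```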

Short Papers
Paper Nr: 10
Title:

Emotion Based Music Visualization with Fractal Arts

Authors:

Januka Dharmapriya, Lahiru Dayarathne, Tikiri Diasena, Shiromi Arunathilake, Nihal Kodikara and Primal Wijesekera

Abstract: Emotion-based music visualization is an emerging multidisciplinary research concept. Fractal arts are generated by executing mathematical instructions through computer programs. Therefore, in this research, several multidisciplinary concepts from various subject areas are considered and combined to generate artistic but computationally created visualizations. The main purpose of this research is to obtain the most suitable emotional fractal art visualization for a given song segment and to evaluate the entertainment value generated through this approach. Due to the nascent state of previous findings and the limited availability of emotionally annotated music databases and fractal-art music visualization tools, obtaining accurate emotional visualization using fractal arts is a computationally challenging task. In this study, Russell’s Circumplex Emotional Model was used to obtain emotional categories. Emotions were predicted using machine learning models trained with the MediaEval Database for Emotional Analysis of Music. A regression approach was used with the WEKA machine learning tool for emotion prediction. The effectiveness of the results was compared across several regression models available in WEKA. According to this comparison, the random forest regression approach provided the most reliable results (accuracy of 81% for arousal and 61% for valence). The relevant colour for the emotion was obtained using Itten’s circular colour model and mapped to a fractal art generated with the JWildfire fractal art generating tool. The fractal art was then animated according to the song variations. After adding enhanced features to this approach, an evaluation was conducted with 151 participants. The final evaluation revealed that emotion-based music visualizations with fractal arts can be used to visualize songs according to their emotions, and most of the visualizations can exceed the entertainment value generated by currently available music visualization patterns.
Download

Paper Nr: 23
Title:

From Depth Sensing to Deep Depth Estimation for 3D Reconstruction: Open Challenges

Authors:

Charles Hamesse, Hiep Luong and Rob Haelterman

Abstract: In recent years, techniques based on deep learning for dense depth estimation from monocular RGB frames have increasingly emerged as potential alternatives to 3D sensors such as depth cameras for 3D reconstruction. Recent works report more and more interesting capabilities: estimation of high-resolution depth maps, handling of occlusions, or fast execution on various hardware platforms, to name a few. However, it remains unclear whether these methods could actually replace depth cameras, and if so, in which scenarios it is really beneficial to do so. In this paper, we show that the errors made by deep learning methods for dense depth estimation have a specific nature, very different from that of depth maps acquired from depth cameras (be it with stereo vision, time-of-flight or other technologies). We take a deliberately high vantage point and analyze state-of-the-art dense depth estimation techniques and depth sensors in a hand-picked test scene, with the aim of better understanding the current strengths and weaknesses of different methods and providing guidelines for the design of robust systems that rely on dense depth perception for 3D reconstruction.
Download

Paper Nr: 31
Title:

A Global Multi-Temporal Dataset with STGAN Baseline for Cloud and Cloud Shadow Removal

Authors:

Morui Zhu, Chang Liu and Tamás Szirányi

Abstract: Due to the inevitable contamination by thick clouds and their shadows, satellite images are greatly affected, which significantly reduces the usability of their data. Therefore, obtaining high-quality image data without cloud contamination in a specific area and at the time we need it is an important issue. To address this problem, we collected a new multi-temporal dataset covering the entire globe, which is used to remove clouds and their shadows. Since generative adversarial networks (GANs) perform well in conditional image synthesis challenges, we utilized a spatial-temporal GAN (STGAN) to eliminate clouds and their shadows in optical satellite images. As a baseline model, STGAN demonstrated outstanding performance in peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM), achieving scores of 33.4 and 0.929, respectively. The cloud-free images generated in this work have significant utility for various downstream applications in real-world environments. The dataset is publicly available: https://github.com/zhumorui/SMT-CR
Download
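The PSNR figure reported in the abstract above can be computed as follows; a minimal sketch assuming 8-bit images (peak value 255):

```python
import numpy as np

def psnr(reference, restored, peak=255.0):
    """Peak signal-to-noise ratio (in dB) between a reference and a restored image."""
    mse = np.mean((np.asarray(reference, dtype=np.float64)
                   - np.asarray(restored, dtype=np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(peak ** 2 / mse))

# A restoration off by a constant 16 grey levels gives MSE = 256, i.e. ~24 dB.
reference = np.zeros((8, 8))
restored = np.full((8, 8), 16.0)
score = psnr(reference, restored)
```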

Paper Nr: 14
Title:

Normalised Color Distances

Authors:

André S. Marçal

Abstract: This paper presents normalised color distances based on widely used metrics in the RGB and L*a*b* color models, and an adjusted City Block distance for the HSV model. Three experiments were carried out, focusing on color perception, the identification of the actual range of the various normalised color distances, and their ability to compare and match images that have a predominant color as perceived by a human observer. For this task, a spatially tolerant color distance is proposed. The image comparison experiment uses subsets of 6 images, out of 15 square tile images, with a total of 270 test cases considered. A modified Dunn index is proposed for the evaluation. L*a*b*-based distances were found to be better adjusted to human color perception. Color distances based on the L*a*b* model were also more effective for image comparison, with spatially tolerant color distances performing slightly better than direct image pixel pairing.
Download
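As an illustration of the normalisation idea, a Euclidean RGB distance can be scaled by the largest possible distance (black to white) so that all values fall in [0, 1]; this is a generic sketch, not necessarily the paper's exact formulation:

```python
import numpy as np

def normalised_rgb_distance(c1, c2):
    """Euclidean distance between two 8-bit RGB colours, normalised by the
    largest possible distance (black to white) so the result lies in [0, 1]."""
    diff = np.asarray(c1, dtype=np.float64) - np.asarray(c2, dtype=np.float64)
    return float(np.linalg.norm(diff) / (np.sqrt(3.0) * 255.0))

d_max = normalised_rgb_distance((0, 0, 0), (255, 255, 255))   # maximal distance
d_same = normalised_rgb_distance((10, 20, 30), (10, 20, 30))  # identical colours
```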

Area 2 - Methods and Techniques

Short Papers
Paper Nr: 6
Title:

Application of Particle Detection Methods to Solve Particle Overlapping Problems

Authors:

Wissam AlKendi, Paresh Mahapatra, Bassam Alkindy, Christophe Guyeux and Magali Barthès

Abstract: The study of fluid flows concerns many fields (e.g., biology, aeronautics, chemistry). To overcome the problems of flow disturbances caused by intrusive physical sensors, methods of flow quantification based on optical visualization are particularly interesting. Among them, PTV (Particle Tracking Velocimetry), which allows the individualized tracking of tracers/particles, is of growing interest. Different numerical treatments enable the identification and tracking of the particles. However, detection algorithms (e.g., Sobel, Canny, Roberts, Gaussian, morphology) can be sensitive to noise and to the phenomenon of overlapping particles in the flow. In this work, we have focused on the detection part with the objective of improving it as much as possible. To quantify the performance of the different methods tested, synthetic images with well-defined parameters have been generated. We compared the performance of the Laplacian of Gaussian (LoG) and the Difference of Gaussians (DoG) methods with the traditional method of threshold binarization. In addition, we tested other techniques based on non-local means (NLM) and an overlap detector to improve the detection of particles in the case of noisy images or overlapping particles. The results show that the LoG gives very good results in most cases, with additional improvement when using the NLM and the overlap detector.
Download
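The Difference of Gaussians compared in the abstract above approximates the Laplacian of Gaussian by subtracting two Gaussian kernels of different widths; a minimal sketch of the centre-surround DoG kernel (the parameter values are illustrative, not the paper's):

```python
import numpy as np

def gaussian_kernel(sigma, radius):
    """Normalised 2D Gaussian kernel of size (2*radius+1) x (2*radius+1)."""
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    g = np.exp(-x ** 2 / (2.0 * sigma ** 2))
    g /= g.sum()
    return np.outer(g, g)

def dog_kernel(sigma=1.0, k=1.6, radius=8):
    """Difference-of-Gaussians kernel: narrow Gaussian minus wide Gaussian,
    a standard approximation of the Laplacian of Gaussian for blob detection."""
    return gaussian_kernel(sigma, radius) - gaussian_kernel(k * sigma, radius)

kernel = dog_kernel()
# Centre-surround shape: positive peak in the middle, negative ring around it,
# so convolving an image with it highlights blob-like particles.
```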

Paper Nr: 7
Title:

An Anisotropic and Asymmetric Causal Filtering Based Corner Detection Method

Authors:

Ghulam S. Shokouh, Philippe Montesinos and Baptiste Magnier

Abstract: An asymmetric-anisotropic causal diffusion filtering-based curvature operator is proposed in this communication. The new corner operator produces optimal results on small structures, such as corners, with both pixel-level and sub-pixel-level precision. Meanwhile, this method is robust against noise due to its asymmetric diffusion scheme. Experiments have been performed on a set of both synthetic and real images. The obtained results are promising and unambiguously better than those of the two reference corner operators, namely the Kitchen-Rosenfeld and Harris corner detectors.
Download

Area 3 - Imaging

Short Papers
Paper Nr: 15
Title:

Fuzzy Inference System in a Local Eigenvector Based Color Image Smoothing Framework

Authors:

Khleef Almutairi, Samuel Morillas and Pedro Latorre-Carmona

Abstract: Noise filtering in colour images is a complex task since it is essential to distinguish between structural and noise information in the image. It is therefore important to remove noise while keeping the original image details. This paper proposes a method based on a fuzzy inference system to eliminate noise and retrieve original image details. Images are transformed from RGB space to an eigenvector-based space, and this transformation is fed to the fuzzy system. Results confirm the validity of the approach, its superior performance compared to the eigenvector-based framework it builds on, and its competitive behaviour compared to other state-of-the-art methods.
Download

Area 4 - Machine Learning

Full Papers
Paper Nr: 3
Title:

3D Semantic Scene Reconstruction from a Single Viewport

Authors:

Maximilian Denninger and Rudolph Triebel

Abstract: We introduce a novel method for semantic volumetric reconstructions from a single RGB image. To overcome the problem of semantically reconstructing regions in 3D that are occluded in the 2D image, we propose to combine both in an implicit encoding. By relying on a headless autoencoder, we are able to encode semantic categories and implicit TSDF values into a compressed latent representation. A second network then uses these as a reconstruction target and learns to convert color images into these latent representations, which get decoded after inference. Additionally, we introduce a novel loss-shaping technique for this implicit representation. In our experiments on the realistic benchmark Replica dataset, we achieve a full reconstruction of a scene, which is visually and in terms of quantitative measures better than current methods while only using synthetic data during training. On top of that, we evaluate our approach on color images recorded in the wild.
Download

Paper Nr: 5
Title:

TabProIS: A Transfer Learning-Based Model for Detecting Tables in Product Information Sheets

Authors:

Michael Sildatke, Jan Delember, Bodo Kraft and Albert Zündorf

Abstract: Product Information Sheets (PIS) are human-readable documents containing relevant product specifications. In these documents, tables often present the most important information. Hence, table detection is a crucial task for automating the process of Information Extraction (IE) from PIS. Modern table detection algorithms are Machine Learning (ML)-based and popular object detection networks like Faster R-CNN or Cascade Mask R-CNN form their foundation. State-of-the-art models like TableBank or CDeC-Net are trained on publicly available table detection datasets. However, the documents in these datasets do not cover particular characteristics of PIS, e.g., background design elements like provider logos or watermarks. Consequently, these state-of-the-art models do not perform well enough on PIS. Transfer Learning (TL) and Ensembling describe two methods of reusing existing models to improve their performance on a specific problem. We use these techniques to build an optimized model for detecting tables in PIS, named TabProIS. This paper presents three main contributions: First, we provide a new table detection dataset containing 5,600 document images generated from PIS of the German energy industry. Second, we offer three TL-based models with different underlying network architectures, namely TableBank, CDeC-Net, and You Only Look Once (YOLO). Third, we present a pipeline to automatically optimize available models based on different Ensembling and post-processing strategies. A selection of our models and the dataset will be publicly released to enable the reproducibility of the results.
Download

Short Papers
Paper Nr: 8
Title:

Layer-wise External Attention for Efficient Deep Anomaly Detection

Authors:

Tokihisa Hayakawa, Keiichi Nakanishi, Ryoya Katafuchi and Terumasa Tokunaga

Abstract: Recently, the visual attention mechanism has become a promising way to improve the performance of Convolutional Neural Networks (CNNs) for many applications. In this paper, we propose a Layer-wise External Attention mechanism for efficient image anomaly detection. The core idea is the integration of unsupervised and supervised anomaly detectors via the visual attention mechanism. Our strategy is as follows: (i) prior knowledge about anomalies is represented as an anomaly map generated by a pre-trained network; (ii) the anomaly map is translated to an attention map via an external network; (iii) the attention map is then incorporated into intermediate layers of the anomaly detection network via visual attention. Notably, the proposed method can be applied to any CNN model in an end-to-end training manner. We also propose an example of a network with Layer-wise External Attention called the Layer-wise External Attention Network (LEA-Net). Through extensive experiments using real-world datasets, we demonstrate that Layer-wise External Attention consistently boosts the anomaly detection performance of an existing CNN model, even on small and unbalanced data. Moreover, we show that Layer-wise External Attention works well with Self-Attention Networks.
Download
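One common way to incorporate an attention map into an intermediate feature map, as in step (iii) above, is residual modulation, F' = F * (1 + A); this is a generic illustration of the mechanism, not necessarily LEA-Net's exact fusion rule:

```python
import numpy as np

def apply_external_attention(features, attention):
    """Modulate a (H, W, C) feature map with a (H, W) attention map using the
    residual form F' = F * (1 + A), so zero attention leaves F unchanged."""
    return features * (1.0 + attention[..., np.newaxis])

features = np.ones((4, 4, 8))          # stand-in for an intermediate CNN layer
attention = np.zeros((4, 4))
attention[1, 2] = 1.0                  # an anomalous region highlighted externally
out = apply_external_attention(features, attention)
```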

Paper Nr: 25
Title:

Deep Learning and Medical Image Analysis: Epistemology and Ethical Issues

Authors:

Francesca Lizzi, Alessandra Retico and Maria E. Fantacci

Abstract: Machine and deep learning methods applied to medicine seem to be a promising way to improve performance in solving many issues, from the diagnosis of a disease to the prediction of personalized therapies, by analyzing many diverse types of data. However, developing an algorithm with the aim of applying it in clinical practice is a complex task which should take into account the context in which the software is developed and used. In the first report of the World Health Organization (WHO) about the ethics and governance of Artificial Intelligence (AI) for health, published in 2021, it was stated that AI may improve healthcare and medicine all over the world only if ethics and human rights are a central part of its development. Involving ethics in technology development means taking into account several issues that should also be discussed inside the scientific community: epistemological changes, population stratification issues, the opacity of deep learning algorithms, data complexity and accessibility, health processes, and so on. In this work, some of these issues are discussed in order to open a discussion on whether and how it is possible to address them.
Download

Paper Nr: 11
Title:

Handling Data Heterogeneity in Federated Learning with Global Data Distribution

Authors:

C. Nagaraju, Mrinmay Sen and C. K. Mohan

Abstract: Federated learning, a distinct direction of distributed optimization, is very important when data sharing is restricted due to privacy concerns and communication overhead. In federated learning, instead of sharing raw data, information from different sources is gathered in terms of model parameters or gradients of local loss functions, and this information is fused in such a way that we can find the optimum of the average of all the local loss functions (the global objective). Existing analyses of federated learning show that federated optimization converges slowly when the data distribution across clients or sources is not homogeneous. Heterogeneous data distribution in federated learning causes objective inconsistency, which means the global model converges to another stationary point that is not the same as the optimum of the global objective, resulting in poor performance of the global model. In this paper, we propose a federated learning (FL) algorithm for heterogeneous data distributions. To handle data heterogeneity during collaborative training, we generate data in local clients with the help of globally trained Gaussian Mixture Models (GMMs). We update each local model with the help of both original and generated local data and then perform the same operations as the popular FedAvg algorithm. We compare our proposed method with the existing FedAvg and FedProx algorithms on CIFAR10 and FashionMNIST non-IID data. Our experimental results show that our proposed method performs better than the existing FedAvg and FedProx algorithms in terms of training loss, test loss, and test accuracy in heterogeneous settings.
Download
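The FedAvg aggregation step referenced above averages client model parameters weighted by local dataset sizes; a minimal sketch:

```python
import numpy as np

def fedavg(client_params, client_sizes):
    """FedAvg aggregation: average each client's parameter list,
    weighted by that client's local dataset size."""
    total = float(sum(client_sizes))
    agg = [np.zeros_like(p, dtype=np.float64) for p in client_params[0]]
    for params, n in zip(client_params, client_sizes):
        for i, p in enumerate(params):
            agg[i] += (n / total) * p
    return agg

# Two clients with a one-parameter model: 100 samples at 1.0, 300 samples at 3.0,
# so the merged parameter is 0.25 * 1.0 + 0.75 * 3.0 = 2.5.
merged = fedavg([[np.array([1.0])], [np.array([3.0])]], [100, 300])
```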

Paper Nr: 19
Title:

FUB-Clustering: Fully Unsupervised Batch Clustering

Authors:

Salvatore Giurato, Alessandro Ortis and Sebastiano Battiato

Abstract: Traditional methods for unsupervised image clustering, such as K-means, Gaussian Mixture Models (GMM), and Spectral Clustering (SC), may be time-consuming and labor-intensive, particularly when dealing with a vast quantity of unlabeled images. Recent studies have proposed incorporating deep learning techniques to improve upon these classic models. In this paper, we propose an approach that addresses the limitations of these prior methods by allowing for the association of multiple images at a time to each group and by considering images that are extremely close to the images already associated with the correct cluster. Additionally, we propose a method for reducing and unifying clusters when the number of clusters is deemed too high by the user, utilizing four different heuristics while considering the clustering as a single element. Our proposed method is able to analyze and group images in real time without any prior training. Experiments confirm the effectiveness of the proposed strategy in various settings and scenarios.
Download

Paper Nr: 27
Title:

Automatic Defect Detection in Sewer Network Using Deep Learning Based Object Detector

Authors:

Bach Ha, Birgit Schalter, Laura White and Joachim Köhler

Abstract: Maintaining sewer systems in large cities is important but also time- and effort-consuming, because visual inspections are currently done manually. To reduce the amount of this manual work, defects within sewer pipes should be located and classified automatically. In the past, multiple works have attempted to solve this problem using classical image processing, machine learning, or a combination of the two. However, each proposed solution focuses only on detecting a limited set of defect/structure types, such as fissures, roots, and/or connections. Furthermore, due to the use of hand-crafted features and small training datasets, generalization is also problematic. In order to overcome these deficits, a sizable dataset covering 14.7 km of various sewer pipes was annotated by sewer maintenance experts in the scope of this work. On top of that, an object detector (EfficientDet-D0) was trained for automatic defect detection. From the results of several experiments, peculiar characteristics of defects in the context of object detection, which greatly affect the annotation and training process, are identified and discussed. In the end, the final detector was able to detect 83% of the defects in the test set; of the missing 17%, only 0.77% are very severe defects. This work provides an example of applying deep-learning-based object detection to an important but quiet engineering field. It also gives some practical pointers on how to annotate peculiar "objects" such as defects.
Download

Area 5 - Applications

Full Papers
Paper Nr: 13
Title:

Unsupervised Domain Adaptation for Video Violence Detection in the Wild

Authors:

Luca Ciampi, Carlos Santiago, Joao P. Costeira, Fabrizio Falchi, Claudio Gennaro and Giuseppe Amato

Abstract: Video violence detection is a subset of human action recognition aiming to detect violent behaviors in trimmed video clips. Current Computer Vision solutions based on Deep Learning approaches provide astonishing results. However, their success relies on large collections of labeled datasets for supervised learning to guarantee that they generalize well to diverse testing scenarios. Although plentiful annotated data may be available for some pre-specified domains, manual annotation is unfeasible for every ad-hoc target domain or task. As a result, in many real-world applications, there is a domain shift between the distributions of the train (source) and test (target) domains, causing a significant drop in performance at inference time. To tackle this problem, we propose an Unsupervised Domain Adaptation scheme for video violence detection based on single image classification that mitigates the domain gap between the two domains. We conduct experiments considering as the source labeled domain some datasets containing violent/non-violent clips in general contexts and, as the target domain, a collection of videos specific for detecting violent actions in public transport, showing that our proposed solution can improve the performance of the considered models.
Download

Short Papers
Paper Nr: 1
Title:

Vegetation Coverage and Urban Amenity Mapping Using Computer Vision and Machine Learning

Authors:

Nicholas Karkut, Alexey Kiriluk, Zihao Zhang and Zhigang Zhu

Abstract: This paper proposes a computer vision-based workflow that analyses Google 360-degree street views to understand the quality of urban spaces regarding vegetation coverage and accessibility of urban amenities such as benches. Image segmentation methods were utilized to produce an annotated image with the amount of vegetation, sky and street coloration. Two deep learning models were used -- Monodepth2 for depth detection and YoloV5 for object detection -- to create a 360-degree diagram of vegetation and benches at a given location. The automated workflow allows non-expert users like planners, designers, and communities to analyze and evaluate urban environments with Google Street Views. The workflow consists of three components: (1) user interface for location selection; (2) vegetation analysis, bench detection and depth estimation; and (3) visualization of vegetation coverage and amenities. The analysis and visualization could inform better urban design outcomes.
Download

Paper Nr: 4
Title:

A Deep Learning Approach for Estimating the Rind Thickness of Trentingrana Cheese from Images

Authors:

Andrea Caraffa, Michele Ricci, Michela Lecca, Carla Maria Modena, Eugenio Aprea, Flavia Gasperi and Stefano Messelodi

Abstract: Checking food quality is crucial in food production and its commercialization. In this context, the analysis of macroscopic visual properties of the food, like shape, color, and texture, plays an important role as a first assessment of the food quality. Currently, such an analysis is mostly performed by experts, who observe, smell, taste the food, and judge it based on their training and experience. Such an assessment is usually time-consuming and expensive, so it is of great interest to support it with automated and objective computer vision tools. In this paper, we present a deep learning method to estimate the rind thickness of Trentingrana cheese from color images acquired in a controlled environment. Rind thickness is very important for the commercial selection of this cheese and is commonly considered to evaluate its quality, together with other sensory features. We tested our method on 90 images of cheese slices, where the ground-truth rind thickness was defined using the measures provided by a panel of 12 experts. Our method achieved a Mean Absolute Error (MAE) of ≈ 0.5 mm, which is half the ≈ 1.2 mm error produced on average by the experts compared to the defined ground-truth.
Download

Paper Nr: 12
Title:

Climbing with Virtual Mentor by Means of Video-Based Motion Analysis

Authors:

Julia Richter, Raul B. Beltrán, Guido Köstermeyer and Ulrich Heinkel

Abstract: Due to the growing popularity of climbing, research on non-invasive, camera-based motion analysis has received increasing attention. While extant work uses invasive technologies, such as wearables or modified walls and holds, or focuses on competitive sports, we for the first time propose a system that automatically detects motion errors typical for beginners with a low level of climbing experience by means of video analysis. In our work, we imitate a virtual mentor that provides an analysis directly after a route has been climbed. We employed a fourth-generation iPad Pro with LiDAR to record climbing sequences, in which the climber’s skeleton is extracted using the Vision framework provided by Apple. We adapted an existing method to detect joint movements and introduced a finite state machine that represents the repetitive phases that occur in climbing. By means of the detected movements, the current phase can be determined. Based on the phase, single errors that are only relevant in specific phases are extracted from the video sequence and presented to the climber. Recent empirical tests with 14 participants demonstrated the working principle. We are currently collecting data from climbing beginners for a quantitative evaluation of the proposed system.
Download
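A finite state machine over climbing phases, as described in the abstract above, can be sketched as a transition table; the phase names and events below are hypothetical, since the abstract does not enumerate the actual states:

```python
# Hypothetical phases and events; the paper's actual state set is not specified here.
TRANSITIONS = {
    ("stable", "hand_moves"): "reaching",
    ("reaching", "hand_stops"): "gripping",
    ("gripping", "grip_confirmed"): "stable",
}

def advance(phase, event):
    """Advance the climbing-phase state machine; unknown events keep the phase."""
    return TRANSITIONS.get((phase, event), phase)

phase = "stable"
for event in ("hand_moves", "hand_stops", "grip_confirmed"):
    phase = advance(phase, event)   # one full reach-and-grip cycle
```

Phase-specific error checks would then only run while the machine is in the relevant phase.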

Paper Nr: 18
Title:

3D Reference-Based Skeletal Movement Evaluation

Authors:

Lars Lehmann, Roman Seidel and Gangolf Hirtz

Abstract: In medical therapy, the exact execution of the training exercises developed by the therapist is crucial for the success of the therapy. Currently, a therapist has to treat up to 15 patients at the same time on an outpatient basis. To compensate for this deficit, an automated assistance system needs to be created. Previous approaches have focused on a parameterised segment-angle-based assessment for training exercise feedback. This work focuses on a reference-based approach. The reference is created by the therapist, thus corresponds to the ideal movement model, and can be individually adapted to the patient. It is necessary to compare this reference with the patient’s real movement in real time, to detect deviations, and to output them as errors. For this purpose, the reference can be adapted to the body size of the patient, with the patient’s current position and orientation taken into account, or it can be described by reference segments, i.e. an angle-based comparison of the reference. Our work highlights the segment-based and reference-based assessment approaches and compares them to each other.
Download

Paper Nr: 26
Title:

An Integrated Mobile Vision System for Enhancing the Interaction of Blind and Low Vision Users with Their Surroundings

Authors:

Jin Chen, Satesh Ramnath, Tyron Samaroo, Fani Maksakuli, Arber Ruci, E’edresha Sturdivant and Zhigang Zhu

Abstract: This paper presents a mobile-based solution that integrates 3D vision and voice interaction to assist people who are blind or have low vision to explore and interact with their surroundings. The key components of the system are the two 3D vision modules: the 3D object detection module integrates a deep-learning based 2D object detector with ARKit-based point cloud generation, and an interest direction recognition module integrates hand/finger recognition and ARKit-based 3D direction estimation. The integrated system consists of a voice interface, a task scheduler, and an instruction generator. The voice interface contains a customized user request mapping module that maps the user’s input voice into one of the four primary system operation modes (exploration, search, navigation, and settings adjustment). The task scheduler coordinates with two web services that host the two vision modules to allocate resources for computation based on the user request and network connectivity strength. Finally, the instruction generator computes the corresponding instructions based on the user request and results from the two vision modules. The system is capable of running in real time on mobile devices. We have shown preliminary experimental results on the performance of the voice to user request mapping module and the two vision modules.
Download

Paper Nr: 29
Title:

Facial Expression Recognition with Quarantine Face Masks Using a Synthetic Dataset Generator

Authors:

Yücel Çelik and Sezer Gören

Abstract: The usage of face masks has increased dramatically in recent years due to the pandemic. This has made many systems that depend on a full facial analysis less accurate on faces that are covered with a face mask, which may lead to errors in the system. In this paper, we propose a Convolutional Neural Network (CNN) model trained solely on masked faces so that it can determine facial expressions more accurately. Our CNN model was trained with a dataset of seven different expression categories containing only people with face masks. Since we could not find a suitable dataset with face masks, we opted to generate a synthetic one. The dataset generation was done using Python with the help of the OpenCV library. The process is as follows: after finding the dimensions of the face, we apply a perspective transform to the face mask object to overlay it on the face. The CNN model was also implemented in Python. Using this method we obtained favorable results, with 70.1% accuracy on the validation batch, where previous facial expression recognition systems mostly failed to even recognize the face since they were not trained on faces with face masks.
Download
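The perspective transform used for the overlay maps the four corners of the mask image onto four detected face points; a minimal sketch of the underlying homography solve, equivalent in spirit to OpenCV's cv2.getPerspectiveTransform (the destination points below are made up for illustration):

```python
import numpy as np

def perspective_matrix(src, dst):
    """Solve for the 3x3 homography mapping four src points to four dst points
    (the same computation cv2.getPerspectiveTransform performs)."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = np.linalg.solve(np.asarray(A, dtype=float), np.asarray(b, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)  # fix h33 = 1

# Map the corners of a 100x40 mask image onto a hypothetical detected jaw region.
src = [(0, 0), (100, 0), (100, 40), (0, 40)]
dst = [(30, 60), (90, 55), (95, 85), (25, 88)]
H = perspective_matrix(src, dst)
p = H @ np.array([0.0, 0.0, 1.0])       # warp the top-left mask corner
corner = (p[0] / p[2], p[1] / p[2])     # lands at the first destination point
```

In practice the warped mask would then be blended onto the face image, e.g. with cv2.warpPerspective and an alpha mask.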