IMPROVE 2026 Abstracts


Area 1 - Fundamentals

Full Papers
Paper Nr: 22
Title:

Cut-Aware Scene-Adaptive Team Affiliation in Broadcast AFL Games

Authors:

Hongwei Yin, Richard O. Sinnott and Glenn T. Jayaputera

Abstract: Automated video analysis of team sports, and especially the identification and disambiguation of teams, forms the cornerstone of event analytics, e.g. which player/team did what in a given match. In this paper, we propose a robust end-to-end team affiliation framework focused on the Australian Football League (AFL). The framework comprises three key components: (i) a fine-tuned YOLOv12 object detection model leveraging a novel AFL dataset established for precise identification of players, referees, staff, and the ball; (ii) a multi-stream feature embedding module utilising a Vision Transformer appearance encoder with a feature extractor and gated feature-fusion mechanism; and (iii) a series of cut-aware temporal smoothing and occlusion handling policies to ensure stable tracking and identification of detected objects. Our fine-tuned YOLOv12 model achieves 88.3% accuracy across all classes and 97.3% accuracy for player classes, whilst the multi-stream feature embedding module achieves 96.6% accuracy for team affiliation prediction. This approach provides the basis for higher-level analysis and event detection of AFL games.

Paper Nr: 33
Title:

An Integrated Framework for Video Compositing with Neural Harmonization and Adaptive Occlusion Handling

Authors:

Maxim Osadchiy, Sviatoslav Stumpf and Valeria Efimova

Abstract: This paper presents an integrated framework for video compositing that combines advanced neural harmonization with adaptive occlusion handling. The proposed system implements a comprehensive pipeline featuring geometric alignment through SIFT feature matching, photorealistic harmonization leveraging a Vision Transformer-based model (PCTNet-ViT), and intelligent occlusion handling through GrabCut segmentation. A key innovation is the seamless integration of neural harmonization that adapts inserted content to background lighting and texture conditions while maintaining temporal consistency. We evaluate three harmonization backbones (PCTNet-ViT, INR, Duco) and demonstrate that PCTNet-ViT achieves superior performance with average PSNR and SSIM gains of +0.968 dB and +0.00118, respectively, whereas INR and Duco show negative gains. Experimental results further demonstrate significant improvements in visual quality, with SSIM and PSNR metrics computed on full frames showing superior performance. The system effectively handles complex scenarios including perspective changes, lighting variations, and foreground occlusions.
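
Editorial illustration (not taken from the paper): the PSNR gain quoted above can be read as the difference between the PSNR of a harmonized frame and the PSNR of the unharmonized composite, both measured against the same reference frame. That reading, the helper names, and the 8-bit peak value of 255 are assumptions made for this sketch only.

```python
import math


def mse(a, b):
    """Mean squared error between two equally sized grayscale frames (nested lists)."""
    flat_a = [p for row in a for p in row]
    flat_b = [p for row in b for p in row]
    if not flat_a or len(flat_a) != len(flat_b):
        raise ValueError("frames must be non-empty and of equal size")
    return sum((x - y) ** 2 for x, y in zip(flat_a, flat_b)) / len(flat_a)


def psnr(a, b, peak=255.0):
    """Peak signal-to-noise ratio in dB; infinite when the frames are identical."""
    err = mse(a, b)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / err)


def psnr_gain(reference, before, after, peak=255.0):
    """Assumed definition of gain: PSNR after harmonization minus PSNR before it."""
    return psnr(reference, after, peak) - psnr(reference, before, peak)
```

A positive value means the harmonized frame is closer to the reference than the raw composite; a negative value, as reported for INR and Duco, means it moved further away.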

Short Papers
Paper Nr: 11
Title:

Benchmarking 2D and 3D CNN Architectures for CT-Based Medical Deepfake Detection

Authors:

Abdel Motalib Lagsoun, Mustapha Oujaoura, Mustapha Hedabou, Abdelilah Jraifi and Anass Garbaz

Abstract: In the literature, the term deepfake has traditionally referred to face manipulation and face swapping. However, recent advancements have expanded the concept into other critical domains, including healthcare. In the medical field, deepfakes may involve artificially adding or removing signs of disease in diagnostic images, causing insurance fraud or misleading clinical decision-making. Other cases involve the generation of synthetic medical scans that mimic tumors or lesions to fabricate patient records. This paper addresses the growing concern of medical deepfakes by developing deep learning detection models for an enhanced CT-based GAN lung dataset. We benchmark multiple architectures, including handcrafted 2D CNNs and 3D CNN models, to assess their capability in distinguishing authentic CT scans from manipulated ones. The proposed Small_CNN_BN model achieves 91.56% accuracy and 91.56% recall, outperforming the baseline by 4.10% and 4.5%, respectively. Furthermore, the Wider_3DCNN reaches 93.98% accuracy and 93.98% recall, surpassing the baseline by 2.63% and 2.8%. These results highlight the effectiveness of both lightweight 2D and volumetric 3D architectures in detecting medical deepfakes.

Paper Nr: 13
Title:

BabyGuard: An Intelligent Infant Monitoring System Using Computer Vision and Deep Learning

Authors:

Ikram El Bouhali, Abdelilah Jraifi and Omar Boudellah

Abstract: Infant safety during sleep is a major concern for parents and caregivers, with sleep-related accidents representing a significant cause of preventable infant mortality. Existing solutions, whether commercial or research-based, present critical limitations: reliance on uncomfortable wearable sensors, lack of proactive risk detection, dependence on expensive specialized hardware, and absence of real-world validation. This paper presents BabyGuard, an intelligent infant monitoring system that introduces a novel region-specific facial analysis framework partitioning infant facial processing into eye-state classification (via fine-tuned Vision Transformer, 98.1% accuracy on a held-out test set of 7,100 images) and occlusion detection (via MediaPipe Face Mesh). This specialized approach, combined with YAMNet-based audio analysis (100% cry recall), enables robust multi-modal risk detection. The modular system architecture includes a Flask backend with Firebase authentication, dual interfaces (Tkinter desktop for local control and Flutter mobile for remote monitoring with push notifications), and optimized Firestore data persistence. We provide comprehensive evaluation using four complementary datasets: Open-Closed Eyes Dataset [19] (177,100 images), CryCeleb2023, Infant Cry Dataset, and DESED (3,790 audio samples total). Comparative analysis against state-of-the-art methods demonstrates significant improvements in accuracy (98.1% vs. 85-91%) and functionality. A 30-day deployment study with five families (infants aged 2-8 months) validated practical effectiveness, detecting 12 critical risk situations with a 3.2% false alert rate and achieving 4.6/5.0 user satisfaction. All participants provided informed consent under an IRB-approved protocol. The complete system is available under the MIT License.

Paper Nr: 35
Title:

SPAX-S: Shapley-Based Point Attribution Explanations for Interpreting 3D Point Cloud Segmentation

Authors:

Le Hoang Pham, Muhammad Shoaib Sarwar and Faizan Ahmed

Abstract: Explainable artificial intelligence for 3D point clouds has mainly focused on classification tasks, while segmentation models often lack effective explanation methods. This gap exists because segmentation assigns a specific semantic label to each point, treating each as an individual participant in a cooperative game where the point-specific semantic prediction is the payoff. In this case, segmentation faces different interpretive challenges compared to classification. To address these challenges, we introduce SPAX‑S (Shapley-based Point Attribution eXplanation for Segmentation), a model‑agnostic extension of the existing SPAX framework designed specifically for point‑wise semantic segmentation. SPAX-S restricts the Shapley game to the k nearest neighbors of each target point, excludes the target point from the player set, and estimates neighbor-level Shapley values using Monte Carlo sampling. We evaluate SPAX-S across all 16 categories of the ShapeNetPart dataset using PointNet++ as the segmentation backbone. SPAX-S achieves a mean precision of 0.8428 and maintains 100% label preservation even when 69.24% of neighborhood points are removed. A comparative analysis with Grad-CAM and saliency maps shows that SPAX-S produces better top-k overlap (0.2576 versus 0.2144 and 0.2016), while gradient-based methods offer complementary strengths in overall prediction faithfulness. Robustness testing indicates that SPAX-S remains acceptably stable under scaling and rotation perturbations of the geometric structure. These results suggest that SPAX-S is a valuable tool for local interpretability of 3D point cloud segmentation models.
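
Editorial illustration (not the authors' implementation): the three ingredients named above, a player set restricted to the k nearest neighbors, exclusion of the target point, and Monte Carlo estimation, can be sketched with plain permutation sampling. The function names and the toy additive score used in the usage note are hypothetical; in practice `value_fn` would query the segmentation model for the target point's predicted-class score given a subset of neighbors.

```python
import math
import random


def k_nearest(points, target_idx, k):
    """Indices of the k points closest to the target, with the target itself excluded."""
    target = points[target_idx]
    others = [i for i in range(len(points)) if i != target_idx]
    others.sort(key=lambda i: math.dist(points[i], target))
    return others[:k]


def monte_carlo_shapley(players, value_fn, num_samples=200, seed=0):
    """Permutation-sampling estimate of each player's Shapley value.

    value_fn maps a frozenset of players (a coalition) to a scalar payoff.
    """
    rng = random.Random(seed)
    totals = {p: 0.0 for p in players}
    for _ in range(num_samples):
        order = list(players)
        rng.shuffle(order)
        coalition = set()
        previous = value_fn(frozenset(coalition))
        for p in order:
            coalition.add(p)
            current = value_fn(frozenset(coalition))
            totals[p] += current - previous  # marginal contribution of p in this ordering
            previous = current
    return {p: total / num_samples for p, total in totals.items()}
```

As a sanity check, an additive toy payoff such as `lambda s: sum(1.0 / (1.0 + math.dist(points[i], points[0])) for i in s)` should return each neighbor's own weight as its Shapley value, since in an additive game every marginal contribution is independent of the ordering.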

Paper Nr: 18
Title:

Evolutionary Priors in Vision: Linking Radial Attention Statistics and Hexagonal Spectral Structure in Natural Images

Authors:

Fatemeh Tavakoli and Siamak Khatibi

Abstract: Visual attention and spectral representation are two fundamental components of biological vision, yet they are typically studied as independent phenomena. This paper revisits two empirical observations and proposes a hypothesis linking them under a common statistical interpretation of natural scene structure. First, analysis of eye-tracking data collected under free-viewing conditions reveals a robust radial distribution of visual attention with respect to the image center. This behavior is summarized through the Probability of Characteristic Radially Dependency Function (Probability-CRDF), derived from aggregated fixation statistics across multiple subjects and image datasets. Second, analysis of natural images under hexagonal sampling reveals structured orientation-dependent patterns in the frequency domain in addition to the well-known 1/f^α radial power-law spectrum. We hypothesize that these spatial and spectral regularities reflect coupled priors shaped by long-term exposure to natural scene statistics. Spatially, the radial attention distribution can be interpreted as a probabilistic allocation of perceptual resources across retinal eccentricity. Spectrally, hexagonal sampling reveals orientation-resolved energy structures that may align with isotropic sampling properties observed in biological photoreceptor mosaics. While these observations alone do not establish a causal connection, their coexistence suggests a potential statistical linkage between spatial attention mechanisms and the geometry of visual sampling. Building on this hypothesis, we outline an experimental framework to evaluate whether jointly incorporating radial spatial priors and hexagonally structured spectral representations improves attention prediction and pattern discrimination in natural images. The proposed framework aims to bridge empirical findings from eye-tracking studies and spectral analysis of natural scenes, offering a testable perspective on how environmental statistics may shape both biological and artificial vision systems.

Area 2 - Methods and Techniques

Full Papers
Paper Nr: 14
Title:

Robust Object-Layer Construction of 3D Scene Graphs Using Instance Segmentation

Authors:

Gawtam Chithra Ramesh, Püren Güler, Hiba Alqaysi, Marcus Valtonen Örnhag, Héctor Caltenco and Erdal Akin

Abstract: 3D Scene Graphs (3DSGs) are high-level representations of 3D environments that encode hierarchical contextual information (e.g., objects, properties, relations). They can be used in applications such as autonomous driving, robotics, and extended reality (XR), where detailed scene understanding is critical for decision-making and interaction. A key step in constructing 3DSGs is assigning semantic categories to scene graph nodes at the objects layer (e.g., object nodes), which is achieved through semantic labeling of the sensory data and therefore essentially depends on the labeling accuracy. Incorrect semantic labeling causes erroneous scene graph creation, e.g., under-segmentation resulting in merged objects in the 3D space. While not every 3DSG framework is equally sensitive to incorrect semantic labeling—some may incorporate implicit mitigation mechanisms—the sensitivity of these systems to such errors remains insufficiently studied, limiting the development of more robust future frameworks. In this paper, we investigate how semantic labeling affects the accuracy of constructing 3DSGs, focusing on parameter sensitivity during object discovery. We integrate instance segmentation and tracking into Hydra, a state-of-the-art 3DSG framework originally based on semantic segmentation, to produce instance-aware 3DSGs. Our comparative experiments show that semantic-only object layer construction in the original Hydra is highly sensitive to parameter tuning, while the proposed instance-aware approach enhances robustness by decoupling object discovery from such heuristic tuning.

Paper Nr: 42
Title:

Diffusion-Based Hand Motion Generation for Indian Sign Language Using a Dedicated Hand Module

Authors:

Somnath Mukhopadhyay, Sunita Sarkar, Suman Dahal, Hrishav Bakul Barua, Piyush Chauhan and Rakesh Kumar

Abstract: Indian Sign Language (ISL) relies heavily on precise finger articulation, where small variations in motion can change meaning. While prior work has focused on recognition and full-body sign synthesis, detailed and temporally stable hand-motion generation remains challenging. This paper proposes a diffusion-inspired framework designed specifically for ISL hand motion. The model operates in 3D hand joint space and incorporates velocity-based temporal constraints to improve motion consistency. Instead of focusing only on performance improvement, this work studies how different motion representations affect hand motion generation. Our results show that pose-based diffusion provides better spatial accuracy, while velocity-based temporal modeling captures motion dynamics but may introduce small positional errors due to integration. Temporal smoothness, bone-length consistency, and velocity regularization are incorporated during training to preserve stable and anatomically plausible hand motion. The approach is evaluated on multiple datasets, including WLASL-based data, How2Sign, and an ISL hand landmark dataset. While pose-space diffusion generally achieves stronger spatial reconstruction accuracy, the velocity-aware formulation captures temporal motion dynamics and produces coherent hand motion sequences. Overall, this work provides a comparative analysis of pose-based and velocity-aware motion representations for hand motion generation in Indian Sign Language.

Short Papers
Paper Nr: 27
Title:

A Training-Free Compression-Based Similarity Measure for Video Using Spatiotemporal Normalized Compression Distances

Authors:

Iker Gondra

Abstract: The Normalized Information Distance (NID), rooted in Kolmogorov complexity, provides a universal, feature-independent notion of similarity. Previous work investigated compression-based approximations of the NID for image retrieval. In this paper, we extend this framework from static images to video data. Rather than treating videos as monolithic byte streams, the proposed method decomposes videos into compact spatiotemporal patches and measures similarity by compressing residuals between corresponding patches under soft temporal alignment. This patch-based formulation preserves the theoretical foundations of the NID while adapting the computation to the spatiotemporal structure and scale of video data. We evaluate the proposed approach on standard video benchmarks and show that it captures meaningful action similarity directly from raw pixel data, without learning or feature extraction. To the best of our knowledge, this work constitutes the first patch-based, multi-scale approximation of Kolmogorov complexity–based NID for video similarity. While not intended to compete with modern deep learning–based video representations, the proposed method provides a conceptually simple, training-free, and fully unsupervised alternative for measuring video similarity.
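
Editorial illustration (not the paper's method): the compression-based approximation of the NID referred to above is commonly written as the Normalized Compression Distance, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed length. The sketch below applies it to raw byte strings with zlib as a stand-in compressor; the patch decomposition, residual compression, and soft temporal alignment described in the abstract are not reproduced here.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (a stand-in for Kolmogorov complexity)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance between two byte strings."""
    cx = compressed_size(x)
    cy = compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)
```

Values near 0 indicate that one input adds little new information given the other; values near 1 indicate largely unrelated content. Real compressors are imperfect, so NCD(x, x) is small but generally not exactly zero.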

Area 3 - Imaging

Short Papers
Paper Nr: 17
Title:

Hierarchical and Tiny Recursive Models for Medical Image Captioning

Authors:

Cornel Alexandru Badea

Abstract: Hierarchical and recursive reasoning are emerging as critical capabilities for overcoming the fixed-depth limitations of standard transformer architectures, particularly in domains requiring structured logical synthesis like medical image captioning. While recent Hierarchical Reasoning Models (HRM) have shown promise by mimicking multi-timescale cognitive processes [21], their application to vision-language tasks remains underexplored. In this work, we introduce a unified ImageHRM framework that integrates high-performance visual backbones (ResNet, Swin Transformer, and the multimodal FuseLIP) into a recurrent reasoning core. We propose a novel Triple-Loop (H-M-L) architecture designed to mirror the radiological workflow, utilizing intermediate semantic clustering to bridge low-level observations and high-level diagnostic impressions. Furthermore, we investigate the "Less is More" paradigm through the Tiny Recursive Model (TRM) [8], which radically simplifies the architecture to a single, shared-weight network (only ∼7M parameters) trained via full backpropagation through time (BPTT). Evaluating these systems on the large-scale ROCOv2 dataset, a standard benchmark in medical image captioning challenges like ImageCLEFmed [16], we provide a rigorous comparative analysis of reasoning depth versus visual feature quality. Our results demonstrate that while the Triple-Loop ImageHRM with FuseLIP establishes a strong baseline, the computationally efficient ImageTRM-Swin variant achieves highly competitive performance. This surprising finding suggests that deep recursive reasoning, when seeded with robust hierarchical visual features, can rival significantly larger and more complex systems. By acting as a structural regularizer against overfitting on domain-specific datasets, TRM offers a scalable path toward high-fidelity automated radiology reporting.

Paper Nr: 32
Title:

Experimental Validation of an AI-Driven LWIR Polarimetric Imaging Framework for Transport Monitoring

Authors:

Jan Pařez, Patrik Kovář, Adam Tater, Lucie Borovičková, Ondřej Ballada and Čestmír Barta

Abstract: This paper presents the development and experimental evaluation of an intelligent system using long-wave infrared (LWIR) polarimetric imaging combined with machine learning. By exploiting polarization-based contrast mechanisms, the approach improves detection of surface features that are difficult to identify with conventional methods. The work includes the design of an experimental setup and the creation of a representative LWIR polarimetric dataset. A modular framework integrating convolutional neural networks and image quality metrics enables automated scene interpretation. Results demonstrate enhanced detection of transport-related phenomena such as thin liquid films, surfaces, and hidden contamination, supporting future mobility and safety applications.

Paper Nr: 34
Title:

Integrating Calomel-Based Polarimetry with Perceptual Metrics for Improved Scene Analysis

Authors:

Patrik Kovář, Adam Tater, Jan Pařez, Lucie Borovičková, Ondřej Ballada and Čestmír Barta

Abstract: Polarization imaging offers complementary information to conventional intensity- or temperature-based imaging, enhancing object detection, material characterization, and scene understanding under challenging conditions. Its applications span environmental monitoring, industrial inspection, robotics, and biomedical imaging, with Long Wavelength Infrared (LWIR) polarimetry demonstrating particular sensitivity to ice-related structures and thin surface layers, making it valuable for safety-critical transport scenarios. Recent advances in sensor technology and computational image processing have enabled compact polarization cameras, integrating micro-polarizer arrays fabricated from specialized materials. In this work, we present a system that uniquely incorporates a calomel polarimetric element and investigate algorithmic metrics for identifying acquisition conditions that maximize both image quality and human perceptual recognizability of target objects. By combining polarimetric imaging with perceptually motivated evaluation criteria, the study establishes a framework for optimizing acquisition parameters and demonstrates the potential of advanced polarimetric systems for enhanced visual analysis in complex environments.

Area 4 - Machine Learning

Full Papers
Paper Nr: 12
Title:

Use of Foundation Models for Renal Glomerulus Classification in Highly Imbalanced Data Scenarios

Authors:

Weslei Santos Pinheiro, Flavio de Barros Vidal, Luciano Rebouças de Oliveira, Washington Luis Conrado dos Santos and Angelo Amancio Duarte

Abstract: Computational Pathology (CPath) applies artificial intelligence techniques to the automated analysis of histopathological images, supporting diagnosis in nephropathology, especially in the classification of glomerular lesions in renal biopsies. However, specific challenges in nephropathology, such as the scarcity of labeled data, high class imbalance, and variability of stains, limit the performance of conventional models. Foundation models (FMs), pre-trained on large volumes of data, have demonstrated effectiveness in few-shot and imbalanced scenarios through transfer learning and fine-tuning. This study evaluates the UNI and UNI2 FMs in the classification of glomerular lesions in renal biopsy images, considering multiple stains and high class imbalance. It investigates adaptation strategies, specifically Full Fine-Tuning (FFT) and Parameter-Efficient Fine-Tuning (PEFT), in conjunction with imbalance mitigation techniques, comparing Cost-Sensitive Learning (CSL) and Random Over-Sampling (ROS). The results indicate that CSL yields higher performance than oversampling approaches and that the evaluated foundation models generalize across different histological stains, achieving F1-scores of 98.27% for Amyloidosis, 97.46% for Normal, and 92.92% for Membranous Nephropathy. These findings indicate that FMs, when combined with cost-sensitive learning strategies, constitute an effective approach for glomerular lesion classification in nephropathology under conditions of severe class imbalance.
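
Editorial illustration (not the authors' configuration): one common form of cost-sensitive learning scales each sample's loss by a per-class weight inversely proportional to class frequency. The abstract does not state which weighting scheme the study uses, so the "balanced" heuristic N / (K * n_c) below, the function names, and the class counts in the usage note are assumptions.

```python
import math
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: N / (K * n_c) for N samples, K classes, n_c samples in class c."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


def weighted_cross_entropy(probs, label, weights):
    """Cross-entropy for one sample, scaled by the weight of its true class.

    probs maps each class name to the predicted probability of that class.
    """
    return -weights[label] * math.log(probs[label])
```

With 90 samples of one class and 10 of another, the rare class receives weight 5.0 and the frequent class about 0.56, so an error on the rare class costs roughly nine times as much. Random over-sampling instead duplicates rare-class samples and leaves the loss unchanged.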

Short Papers
Paper Nr: 21
Title:

Reproducible Data Distribution Shifts for Testing Neural Network Adaptability

Authors:

David Reinberger, Michael Kargl, Phillip Kastner, Florian Eibensteiner and Josef Langer

Abstract: Object detection models such as MobileNetV2-SSD are commonly trained on large, general-purpose datasets that predominantly feature clean, idealized images. In deployment scenarios, however, these models are exposed to environmental influences they were never trained on, ranging from digital effects like noise, blur and distortion to physical factors such as dirt on the lens or differing lighting. These unseen influences can lead to a significant decrease in detection performance. To assess these robustness issues in a controlled and reproducible manner, this work proposes a dedicated test stand that can introduce both digital and physical environmental effects to images. Using this setup - comprising a Coral Dev Board with a Coral Camera Module, adjustable lighting, and physical camera inserts - a dataset of 9,184 images was created, covering 14 objects under a variety of influences and intensities. MobileNetV2-SSD was then fine-tuned on this dataset to measure performance loss under each influence and to determine how much of that performance can be recovered through targeted retraining. While simple disturbances such as lighting variations had only a minor impact, noise or contamination of the camera lens reduced the accuracy by as much as 30%. Fine-tuning significantly improved robustness, with the best model - trained on all influences - achieving an average accuracy of roughly 87% (0.87 mAP@[0.5:0.95]).

Paper Nr: 26
Title:

Geometry-Aware Stochastic Structured Channel Mixing for Deep Neural Networks

Authors:

Hirokazu Shimauchi

Abstract: Activation functions are structurally central to deep neural networks, yet training-time stochastic designs beyond deterministic pointwise mappings remain comparatively underexplored. We introduce Stochastic Structured Channel Mixing Activation (SCMA), a real-valued, geometry-aware module that injects training-only stochasticity while preserving the inference-time computation graph. SCMA samples a constrained pairwise channel-mixing transformation prior to a base nonlinearity, yielding a block-structured operator with closed-form spectral characterization, explicit operator-norm bounds, and a sufficient nondegeneracy condition for uniform invertibility during training. We evaluate SCMA within a fixed ResNet-20 backbone on five image-classification benchmarks under a matched training protocol, with additional reversed-split stress tests on three of the five benchmarks. Across datasets and protocols, SCMA matches or improves upon ReLU in several settings, while comparison with a training-only additive-noise variant and their composition highlights regime-dependent interactions.

Paper Nr: 29
Title:

The Computational Efficiency–Performance Trade-Off in Non-Bayesian Ensemble Methods for Image Classification

Authors:

Kishan Rajdev and Hyung Jae Chang

Abstract: Ensemble methods are widely used to improve predictive performance and reliability in deep learning. While deep ensembles consistently achieve strong results, their computational cost has motivated the development of more efficient alternatives, including Snapshot Ensembles, Fast Geometric Ensembling (FGE), and Noisy Deep Ensembles. However, the extent to which these computationally efficient methods can match the performance of independently trained deep ensembles remains unclear, particularly under distribution shift. In this work, we present a systematic evaluation of non-Bayesian ensemble methods for image classification, focusing on the trade-off between computational efficiency and predictive performance. We compare a single-model baseline, Snapshot Ensembles, FGE, Noisy Deep Ensembles, and Deep Ensembles on CIFAR-10 across in-distribution (ID) evaluation, corruption robustness, and out-of-distribution (OOD) detection. Performance is assessed using standard accuracy metrics alongside likelihood-based and calibration-oriented measures. Our results reveal a consistent performance gap between efficient ensemble variants and Deep Ensembles. Although computationally efficient methods substantially reduce training cost, they underperform Deep Ensembles across ID, corruption, and OOD benchmarks. Further analysis shows that independently trained ensembles exhibit stronger functional diversity, which contributes to improved robustness and more reliable uncertainty estimation under distribution shift. Our findings highlight a persistent trade-off between computational efficiency and predictive performance in non-Bayesian ensemble methods.
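
Editorial illustration (not from the paper): all of the ensemble variants listed above typically share the same combination rule at test time, averaging the members' predicted class probabilities, and differ in how the members are obtained. The sketch below shows that averaging step and the per-sample negative log-likelihood, one example of a likelihood-based measure; the function names are hypothetical.

```python
import math


def ensemble_average(member_probs):
    """Average class-probability vectors from several ensemble members for one input."""
    n_members = len(member_probs)
    return [sum(column) / n_members for column in zip(*member_probs)]


def negative_log_likelihood(probs, label):
    """Negative log-likelihood of the true class index under the predicted distribution."""
    return -math.log(probs[label])
```

For example, members predicting [0.9, 0.1] and [0.5, 0.5] average to approximately [0.7, 0.3]. Members that disagree on uncertain inputs pull the average toward a flatter distribution, which is one way functional diversity can translate into better-calibrated uncertainty.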

Area 5 - Multimedia Communications

Short Papers
Paper Nr: 43
Title:

Adaptive Predictor Selection for Lossless Medical Image Compression

Authors:

Basar Koc, Ziya Arnavut and Huseyin Kocak

Abstract: To protect data integrity and patient welfare, the FDA currently mandates the use of lossless compression algorithms for certain medical images, e.g., mammographic images. Achieving substantial new gains in lossless image compression, however, remains a challenge. Lossless image compression algorithms commonly use a plethora of predictive decorrelation methods by estimating each pixel from causal neighbors. Different prediction algorithms produce residuals with different compressibility characteristics, so no single predictor is optimal across all images. For example, PNG evaluates multiple predictors and selects the best-performing predictor. In this study, we propose a threshold-based preselection strategy that characterizes predictor behavior directly in the residual domain for each image and selects the best predictor without encoding every residual (error) stream.
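
Editorial illustration (not the proposed method): the idea of comparing predictors in the residual domain can be sketched with four causal-neighbor predictors modeled on the PNG filter types and a simple cost proxy, the sum of absolute residuals. The proxy, the whole-image (rather than per-scanline, modulo-256) formulation, and all names are simplifying assumptions; the abstract does not specify the paper's threshold criterion, so it is not reproduced here.

```python
def paeth(a, b, c):
    """Paeth predictor as defined for PNG: a = left, b = above, c = upper-left."""
    p = a + b - c
    pa, pb, pc = abs(p - a), abs(p - b), abs(p - c)
    if pa <= pb and pa <= pc:
        return a
    if pb <= pc:
        return b
    return c


PREDICTORS = {
    "left": lambda a, b, c: a,
    "up": lambda a, b, c: b,
    "average": lambda a, b, c: (a + b) // 2,
    "paeth": paeth,
}


def residuals(image, predictor):
    """Prediction errors for a grayscale image (nested lists); missing neighbors count as 0."""
    out = []
    for y, row in enumerate(image):
        for x, pixel in enumerate(row):
            a = row[x - 1] if x > 0 else 0
            b = image[y - 1][x] if y > 0 else 0
            c = image[y - 1][x - 1] if x > 0 and y > 0 else 0
            out.append(pixel - predictor(a, b, c))
    return out


def select_predictor(image):
    """Pick the predictor with the smallest sum of absolute residuals (a cost proxy)."""
    costs = {name: sum(abs(r) for r in residuals(image, fn))
             for name, fn in PREDICTORS.items()}
    return min(costs, key=costs.get), costs
```

The point of the sketch is that the ranking is obtained from residual statistics alone, without running an entropy coder on each candidate residual stream. On an image whose rows are all identical, for instance, the "up" predictor leaves nonzero residuals only in the first row and is selected.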

Area 6 - Applications

Short Papers
Paper Nr: 19
Title:

Few-Shot Learning for Industrial Defect Detection on Novel Scanning Electron Microscopy Datasets

Authors:

Raghav Vacher, Vedrana Andersen Dahl, Gorm Gruner Jensen and Morten Niklas Gjerding

Abstract: Industrial Defect Detection (IDD) involves identifying defects in different products through the analysis of manufacturing data. Over recent years, convolutional neural networks (CNNs) have become the preferred method to reliably solve this task, though a lack of labeled data has been a key challenge for supervised methods that rely on CNNs. Few-Shot Learning (FSL) offers a promising solution by enabling models to learn tasks from only a small number of labeled examples. However, it shifts the need for large labeled datasets to the pre-training stage, raising questions about how well these models generalize to new domains, such as different imaging modalities. Therefore, this study empirically evaluates state-of-the-art FSL methods, trained on public optical datasets, for their effectiveness in IDD when tested on scanning electron microscopy (SEM) images. To facilitate benchmarking, this article also introduces three distinct SEM datasets for defect detection purposes. Through this assessment the study is able to identify strengths, challenges, and potential areas of improvement to motivate further research.

Paper Nr: 28
Title:

A Robotic Tactile System for Inspecting Defects in Textile Fabrics

Authors:

J. Castaño-Amoros, P. Arredondo, E. Velasco and P. Gil

Abstract: The textile industry generates significant environmental pollution, which can be reduced through the recycling and reuse of garments. These processes require reliable inspection systems to ensure fabric quality. In industry, inspection is typically performed using vision-based systems; however, certain defects could be more effectively identified through alternative sensing modalities such as touch. This paper presents a preliminary version of our robotic tactile system for the inspection of defects or damage in textile fabrics. The robot explores fabric surfaces through touch. A software module based on neural models then processes tactile samples of the fabrics to detect and identify defects. In this preliminary version, the system is able to detect four types of defects, including scratches caused by abrasion or friction, as well as cracks and holes. A wide range of shapes, sizes and amounts of material loss were considered for each type of defect. The results demonstrate that the system classifies defects with high accuracy, with a success rate ranging from 0.98 to 0.99. This is further supported by defect localization performance, measured at 0.82 Dice. The highest number of failures occurs in the detection of friction scratches.
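
Editorial illustration (not from the paper): the Dice score reported above measures overlap between a predicted defect region and the ground-truth region as 2|A ∩ B| / (|A| + |B|). The sketch assumes binary masks stored as nested lists; the function name and the convention of returning 1.0 for two empty masks are choices made for this example.

```python
def dice(pred, truth):
    """Dice coefficient between two binary masks given as nested lists of 0/1 values."""
    p = [v for row in pred for v in row]
    t = [v for row in truth for v in row]
    if len(p) != len(t):
        raise ValueError("masks must have the same number of elements")
    intersection = sum(1 for a, b in zip(p, t) if a and b)
    total = sum(1 for a in p if a) + sum(1 for b in t if b)
    # Two empty masks agree perfectly by convention.
    return 1.0 if total == 0 else 2.0 * intersection / total
```

A score of 1.0 means the predicted and true regions coincide exactly, and 0.0 means they do not overlap at all.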