| Abstract: |
Infant safety during sleep is a major concern for parents and caregivers, with sleep-related accidents representing a significant cause of preventable infant mortality. Existing solutions, whether commercial or research-based, have critical limitations: reliance on uncomfortable wearable sensors, lack of proactive risk detection, dependence on expensive specialized hardware, and absence of real-world validation. This paper presents BabyGuard, an intelligent infant monitoring system that introduces a novel region-specific facial analysis framework. The framework partitions infant facial processing into eye-state classification (via a fine-tuned Vision Transformer, 98.1% accuracy on a held-out test set of 7,100 images) and occlusion detection (via MediaPipe Face Mesh). This specialized approach, combined with YAMNet-based audio analysis (100% cry recall), enables robust multi-modal risk detection. The modular system architecture includes a Flask backend with Firebase authentication, dual interfaces (a Tkinter desktop application for local control and a Flutter mobile app for remote monitoring with push notifications), and optimized Firestore data persistence. We provide a comprehensive evaluation using four complementary datasets: the Open-Closed Eyes Dataset [19] (177,100 images) and three audio datasets (CryCeleb2023, Infant Cry Dataset, and DESED; 3,790 audio samples in total). Comparative analysis against state-of-the-art methods demonstrates significant improvements in accuracy (98.1% vs. 85–91%) and functionality. A 30-day deployment study with five families (infants aged 2–8 months) validated practical effectiveness, detecting 12 critical risk situations with a 3.2% false alert rate and achieving a user satisfaction score of 4.6/5.0. All participants provided informed consent under an IRB-approved protocol. The complete system is available under the MIT License. |
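The abstract does not state how the three per-modality outputs (eye state, facial occlusion, crying) are combined into a single risk decision. The sketch below is one possible rule-based fusion, offered only as an illustration under stated assumptions. It is not BabyGuard's actual logic. The names `FrameSignals`, `RiskLevel`, `assess_frame`, and `DebouncedMonitor`, the four alert levels, the 0.5 and 0.8 thresholds, and the consecutive-frame debouncing used to suppress one-frame false alarms are all hypothetical. The inputs stand in for an eye-closed probability (as an eye-state classifier might emit), an occlusion flag (as might be derived from face-landmark detection), and a cry score (for example, YAMNet's "Baby cry, infant cry" class score).

```python
from collections import deque
from dataclasses import dataclass, replace
from enum import Enum


class RiskLevel(Enum):
    # Hypothetical alert levels, for illustration only.
    NONE = 0      # face visible, eyes closed, no crying (calm sleep)
    AWAKE = 1     # face visible, eyes open, no crying
    CRYING = 2    # cry detected, face visible
    CRITICAL = 3  # face occluded


@dataclass(frozen=True)
class FrameSignals:
    eye_closed_prob: float  # assumed eye-state classifier output in [0, 1]
    face_occluded: bool     # assumed occlusion flag from face-landmark analysis
    cry_score: float        # assumed audio classifier cry score in [0, 1]


# Hypothetical thresholds; not values reported for BabyGuard.
CRY_THRESHOLD = 0.5
EYE_CLOSED_THRESHOLD = 0.8


def assess_frame(s: FrameSignals) -> RiskLevel:
    """Rule-based fusion of a single frame's signals (illustrative only)."""
    if s.face_occluded:
        # A covered face is treated as the most serious risk.
        return RiskLevel.CRITICAL
    if s.cry_score >= CRY_THRESHOLD:
        return RiskLevel.CRYING
    if s.eye_closed_prob < EYE_CLOSED_THRESHOLD:
        # Eyes are probably open: the infant is awake but calm.
        return RiskLevel.AWAKE
    return RiskLevel.NONE


class DebouncedMonitor:
    """Reports CRITICAL only after `window` consecutive occluded frames,
    a hypothetical way to reduce false alerts from transient occlusions."""

    def __init__(self, window: int = 3) -> None:
        self.window = window
        self.recent: deque[bool] = deque(maxlen=window)

    def update(self, s: FrameSignals) -> RiskLevel:
        self.recent.append(s.face_occluded)
        persistent = len(self.recent) == self.window and all(self.recent)
        if s.face_occluded and not persistent:
            # Occlusion not yet persistent: assess as if the face were visible.
            return assess_frame(replace(s, face_occluded=False))
        return assess_frame(s)


if __name__ == "__main__":
    monitor = DebouncedMonitor(window=3)
    frames = [FrameSignals(eye_closed_prob=0.95, face_occluded=True, cry_score=0.1)] * 3
    print([monitor.update(f).name for f in frames])  # prints ['NONE', 'NONE', 'CRITICAL']
```

Debouncing trades detection latency for fewer spurious alerts. With `window=3`, a persistent occlusion is reported on the third consecutive frame, so the window length would need to be tuned against the camera frame rate and the acceptable alert delay.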