<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://ovgu-nit.github.io/orakel/feed.xml" rel="self" type="application/atom+xml" /><link href="https://ovgu-nit.github.io/orakel/" rel="alternate" type="text/html" /><updated>2026-04-07T12:09:45+00:00</updated><id>https://ovgu-nit.github.io/orakel/feed.xml</id><title type="html">ORAKEL</title><author><name>NIT OVGU</name></author><entry><title type="html">Deep Learning-Based Gaze Estimation - A Review</title><link href="https://ovgu-nit.github.io/orakel/blog/2026/mdpi/" rel="alternate" type="text/html" title="Deep Learning-Based Gaze Estimation - A Review" /><published>2026-03-25T00:00:00+00:00</published><updated>2026-03-25T00:00:00+00:00</updated><id>https://ovgu-nit.github.io/orakel/blog/2026/MDPI</id><content type="html" xml:base="https://ovgu-nit.github.io/orakel/blog/2026/mdpi/"><![CDATA[<h2 id="deep-learning-based-gaze-estimation-a-review">Deep Learning-Based Gaze Estimation: A Review</h2>

<p>Gaze estimation, a critical facet of understanding user intent and enhancing human–computer interaction, has seen substantial advancements with the integration of deep learning technologies. Despite the progress, the application of deep learning in gaze estimation presents unique challenges, notably in the adaptation and optimization of these models for precise gaze tracking. This paper conducts a thorough review of recent developments in deep learning-based gaze estimation, with a particular focus on the evolution from traditional methods to sophisticated appearance-based techniques. We examine the key components of successful gaze estimation systems, including input feature processing, neural network architectures, and the importance of data preprocessing in achieving high accuracy. Our analysis extends to a comprehensive comparison of existing methods, shedding light on their effectiveness and limitations within various implementation contexts. Through this systematic review, we aim to consolidate existing knowledge in the field, identify gaps in current research, and suggest directions for future investigation. By providing a clear overview of the state-of-the-art in gaze estimation and discussing ongoing challenges and potential solutions, our work seeks to inspire further innovation and progress in developing more accurate and efficient gaze estimation systems.
<img src="/orakel/assets/theme/images/robotics-15-00069-g003.png" alt="" /></p>

<h2 id="fulltext-access">Fulltext Access</h2>
<p><a href="https://www.mdpi.com/2218-6581/15/4/69">https://www.mdpi.com/2218-6581/15/4/69</a></p>

<h2 id="citing">Citing</h2>
<div class="language-bibtex highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nc">@Article</span><span class="p">{</span><span class="nl">Abdelrahman2026</span><span class="p">,</span>

  <span class="na">author</span><span class="p">=</span><span class="s">{Abdelrahman, Ahmed and Al-Tawil, Basheer and Al-Hamadi, Ayoub}</span><span class="p">,</span>

  <span class="na">journal</span><span class="p">=</span><span class="s">{Robotics}</span><span class="p">,</span> 

  <span class="na">title</span><span class="p">=</span><span class="s">{Deep Learning-Based Gaze Estimation: A Review}</span><span class="p">,</span> 

  <span class="na">year</span><span class="p">=</span><span class="s">{2026}</span><span class="p">,</span>

  <span class="na">doi</span><span class="p">=</span><span class="s">{10.3390/robotics15040069}</span><span class="p">}</span>

</code></pre></div></div>]]></content><author><name>NIT OVGU</name></author><category term="Publication" /><category term="Journal" /><category term="MDPI" /><summary type="html"><![CDATA[Deep Learning-Based Gaze Estimation: A Review]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://ovgu-nit.github.io/orakel/assets/theme/images/robotics-15-00069-g003.png" /><media:content medium="image" url="https://ovgu-nit.github.io/orakel/assets/theme/images/robotics-15-00069-g003.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Die ORAKEL-Studie Magdeburg - Rezidivfrüherkennung bei Depression durch KI-gestützte Audio-/Videoanalyse</title><link href="https://ovgu-nit.github.io/orakel/blog/2026/aerzteblatt/" rel="alternate" type="text/html" title="Die ORAKEL-Studie Magdeburg - Rezidivfrüherkennung bei Depression durch KI-gestützte Audio-/Videoanalyse" /><published>2026-03-01T00:00:00+00:00</published><updated>2026-03-01T00:00:00+00:00</updated><id>https://ovgu-nit.github.io/orakel/blog/2026/Aerzteblatt</id><content type="html" xml:base="https://ovgu-nit.github.io/orakel/blog/2026/aerzteblatt/"><![CDATA[<h2 id="die-orakel-studie-magdeburg---rezidivfrüherkennung-bei-depression-durch-ki-gestützte-audio-videoanalyse">Die ORAKEL-Studie Magdeburg - Rezidivfrüherkennung bei Depression durch KI-gestützte Audio-/Videoanalyse</h2>

<p>Around 50% of patients with depression experience a relapse despite treatment. The ORAKEL study in Magdeburg tests whether multimodal audio/video analysis can improve the early detection of these relapses. Over a period of 48 weeks, six medical-psychological follow-up examinations are conducted, during which speech, facial-expression, and vital-sign parameters are recorded in parallel. Artificial intelligence (AI) models are to be developed that detect changes which easily escape the clinician’s eye, serving as intelligent support for, not a replacement of, clinical assessment.
<img src="/orakel/assets/theme/images/fb-26-3-af3d4147.webp" alt="" /></p>

<h2 id="fulltext-access">Fulltext Access</h2>
<p><a href="https://www.aerzteblatt-sachsen-anhalt.de/component/content/article/die-orakel-studie-magdeburg-fb-2026-03">https://www.aerzteblatt-sachsen-anhalt.de/component/content/article/die-orakel-studie-magdeburg-fb-2026-03</a></p>]]></content><author><name>NIT OVGU</name></author><category term="Publication" /><category term="Journal" /><category term="Ärzteblatt" /><summary type="html"><![CDATA[Die ORAKEL-Studie Magdeburg - Rezidivfrüherkennung bei Depression durch KI-gestützte Audio-/Videoanalyse]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://ovgu-nit.github.io/orakel/assets/theme/images/fb-26-3-af3d4147.webp" /><media:content medium="image" url="https://ovgu-nit.github.io/orakel/assets/theme/images/fb-26-3-af3d4147.webp" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">STCM-Mamba - Multimodal Spatio-Temporal Cross-Modal Mamba for Depression Detection</title><link href="https://ovgu-nit.github.io/orakel/blog/2025/ieee/" rel="alternate" type="text/html" title="STCM-Mamba - Multimodal Spatio-Temporal Cross-Modal Mamba for Depression Detection" /><published>2025-10-31T00:00:00+00:00</published><updated>2025-10-31T00:00:00+00:00</updated><id>https://ovgu-nit.github.io/orakel/blog/2025/IEEE</id><content type="html" xml:base="https://ovgu-nit.github.io/orakel/blog/2025/ieee/"><![CDATA[<h2 id="stcm-mamba-multimodal-spatio-temporal-cross-modal-mamba-for-depression-detection">STCM-Mamba: Multimodal Spatio-Temporal Cross-Modal Mamba for Depression Detection</h2>

<p>Depression is a prevalent mental disorder with severe physiological symptoms and high diagnostic costs; however, developing efficient and accurate depression detection systems remains challenging. While deep learning methods leveraging multimodal data have shown promise, existing approaches suffer from two critical limitations: the lack of effective spatiotemporal feature integration, and the difficulty of balancing effective long-sequence modeling against computational complexity. To address these challenges, we propose STCM-Mamba, a novel Spatio-Temporal Cross-Modal Mamba framework for efficient and accurate depression detection. STCM-Mamba comprises three modules: a Spatio-Temporal Mamba Module (STMM), a Cross-Modal Mamba Module (CMMM), and a Depression Classification Module (DCM). The STMM consists of a Temporal Mamba Block (TMB) and a Spatial Mamba Block (SMB) that capture spatiotemporal information for each modality, while the CMMM enhances intermodal and intramodal representation learning. Experiments on two multimodal depression datasets demonstrate that the STMM and CMMM contribute significantly to performance, and that STCM-Mamba outperforms state-of-the-art methods.
<img src="/orakel/assets/theme/images/access-gagraphic-3627778.jpg" alt="" /></p>

<h2 id="fulltext-access">Fulltext Access</h2>
<p><a href="https://ieeexplore.ieee.org/document/11223210">https://ieeexplore.ieee.org/document/11223210</a></p>

<h2 id="citing">Citing</h2>
<div class="language-bibtex highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nc">@Article</span><span class="p">{</span><span class="nl">Zhou2025</span><span class="p">,</span>

  <span class="na">author</span><span class="p">=</span><span class="s">{Zhou, Bowen and Fiedler, Marc-André and Al-Hamadi, Ayoub}</span><span class="p">,</span>

  <span class="na">journal</span><span class="p">=</span><span class="s">{IEEE Access}</span><span class="p">,</span> 

  <span class="na">title</span><span class="p">=</span><span class="s">{STCM-Mamba: Multimodal Spatio-Temporal Cross-Modal Mamba for Depression Detection}</span><span class="p">,</span> 

  <span class="na">year</span><span class="p">=</span><span class="s">{2025}</span><span class="p">,</span>

  <span class="na">doi</span><span class="p">=</span><span class="s">{10.1109/ACCESS.2025.3627778}</span><span class="p">}</span>

</code></pre></div></div>]]></content><author><name>NIT OVGU</name></author><category term="Publication" /><category term="Journal" /><category term="IEEE" /><summary type="html"><![CDATA[STCM-Mamba: Multimodal Spatio-Temporal Cross-Modal Mamba for Depression Detection]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://ovgu-nit.github.io/orakel/assets/theme/images/access-gagraphic-3627778.jpg" /><media:content medium="image" url="https://ovgu-nit.github.io/orakel/assets/theme/images/access-gagraphic-3627778.jpg" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Cross-Modal Fusion Mamba for Multimodal Depression Detection</title><link href="https://ovgu-nit.github.io/orakel/blog/2025/ispa/" rel="alternate" type="text/html" title="Cross-Modal Fusion Mamba for Multimodal Depression Detection" /><published>2025-10-29T00:00:00+00:00</published><updated>2025-10-29T00:00:00+00:00</updated><id>https://ovgu-nit.github.io/orakel/blog/2025/ISPA</id><content type="html" xml:base="https://ovgu-nit.github.io/orakel/blog/2025/ispa/"><![CDATA[<h2 id="cross-modal-fusion-mamba-for-multimodal-depression-detection">Cross-Modal Fusion Mamba for Multimodal Depression Detection</h2>

<p>Depression detection using multimodal signals has garnered growing attention due to its potential for early warning. However, existing approaches often rely on limited visual features or computationally intensive fusion mechanisms. In this study, we present a novel framework based on the Mamba structure to address these challenges. Our method fuses audio features with enriched visual representations by combining facial landmarks and action units (AUs), enhancing the expressiveness of visual cues. To capture intramodal information, we propose the Audio Mamba Encoder (AME) for audio modality, and the Vision CrossMamba (VCM) module for visual feature fusion. Furthermore, the Audio-Vision CrossMamba (AVCM) module is designed for intermodal interactions. Experimental results demonstrate superior performance over several baselines, highlighting the effectiveness of the proposed framework in detecting depression from multimodal data.
<img src="/orakel/assets/theme/images/11259458-fig-1-source-large.gif" alt="" /></p>

<h2 id="fulltext-access">Fulltext Access</h2>
<p><a href="https://ieeexplore.ieee.org/document/11259458">https://ieeexplore.ieee.org/document/11259458</a></p>

<h2 id="citing">Citing</h2>
<div class="language-bibtex highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nc">@INPROCEEDINGS</span><span class="p">{</span><span class="nl">Zhou2025</span><span class="p">,</span>

  <span class="na">author</span><span class="p">=</span><span class="s">{Zhou, Bowen and Fiedler, Marc-André and Al-Hamadi, Ayoub}</span><span class="p">,</span>

  <span class="na">booktitle</span><span class="p">=</span><span class="s">{2025 14th International Symposium on Image and Signal Processing and Analysis (ISPA)}</span><span class="p">,</span> 

  <span class="na">title</span><span class="p">=</span><span class="s">{Cross-Modal Fusion Mamba for Multimodal Depression Detection}</span><span class="p">,</span> 

  <span class="na">year</span><span class="p">=</span><span class="s">{2025}</span><span class="p">,</span>

  <span class="na">doi</span><span class="p">=</span><span class="s">{10.1109/ISPA66905.2025.11259458}</span><span class="p">}</span>

</code></pre></div></div>]]></content><author><name>NIT OVGU</name></author><category term="Publication" /><category term="Conference" /><category term="ISPA" /><summary type="html"><![CDATA[Cross-Modal Fusion Mamba for Multimodal Depression Detection]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://ovgu-nit.github.io/orakel/assets/theme/images/11259458-fig-1-source-large.gif" /><media:content medium="image" url="https://ovgu-nit.github.io/orakel/assets/theme/images/11259458-fig-1-source-large.gif" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Multi-Modal AI-Based Pain Detection in Intermediate Care Patients in the Postoperative Phase</title><link href="https://ovgu-nit.github.io/orakel/blog/2025/smc/" rel="alternate" type="text/html" title="Multi-Modal AI-Based Pain Detection in Intermediate Care Patients in the Postoperative Phase" /><published>2025-10-08T00:00:00+00:00</published><updated>2025-10-08T00:00:00+00:00</updated><id>https://ovgu-nit.github.io/orakel/blog/2025/SMC</id><content type="html" xml:base="https://ovgu-nit.github.io/orakel/blog/2025/smc/"><![CDATA[<h2 id="multi-modal-ai-based-pain-detection-in-intermediate-care-patients-in-the-postoperative-phase">Multi-Modal AI-Based Pain Detection in Intermediate Care Patients in the Postoperative Phase</h2>

<p>“Multi-modal AI-Based Pain Detection in Intermediate Care Patients in the Postoperative Phase” is an interdisciplinary research work in the domain of automated pain detection. It aims to improve on previous work based on pain databases such as BioVid and the UNBC shoulder pain database, as well as on AI-based approaches that use computer vision and signal processing to analyze the available modalities. We present our research idea for improving automatic pain detection in three major steps. The first step focuses on collecting pain data from postoperative patients on intermediate care (IMC) wards; in addition, patients who are not fully oriented are to be included in a separate data collection as a second focus group. Second, improvements to state-of-the-art models should not only advance general pain detection but also help bridge the gap to the real-world setting of the IMC data; these improvements include transferability analysis, feature selection evaluation, and balancing of the data distribution to deliver better classification performance. In a final step, we aim to test, verify, and evaluate the classification performance on the IMC data with the support of medical practitioners.
<img src="/orakel/assets/theme/images/niena1-p4-niena-large.gif" alt="" /></p>

<h2 id="fulltext-access">Fulltext Access</h2>
<p><a href="https://ieeexplore.ieee.org/document/11343734">https://ieeexplore.ieee.org/document/11343734</a></p>

<h2 id="citing">Citing</h2>
<div class="language-bibtex highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nc">@INPROCEEDINGS</span><span class="p">{</span><span class="nl">Nienaber2025</span><span class="p">,</span>

  <span class="na">author</span><span class="p">=</span><span class="s">{Nienaber, Sören and Wang, Huibin and Hempel, Thorsten and Walter, Steffen and Barth, Eberhard and Al-Hamadi, Ayoub}</span><span class="p">,</span>

  <span class="na">booktitle</span><span class="p">=</span><span class="s">{2025 IEEE International Conference on Systems, Man, and Cybernetics (SMC)}</span><span class="p">,</span> 

  <span class="na">title</span><span class="p">=</span><span class="s">{Multi-Modal AI-Based Pain Detection in Intermediate Care Patients in the Postoperative Phase}</span><span class="p">,</span> 

  <span class="na">year</span><span class="p">=</span><span class="s">{2025}</span><span class="p">,</span>

  <span class="na">doi</span><span class="p">=</span><span class="s">{10.1109/SMC58881.2025.11343734}</span><span class="p">}</span>

</code></pre></div></div>]]></content><author><name>NIT OVGU</name></author><category term="Publication" /><category term="Conference" /><category term="SMC" /><summary type="html"><![CDATA[Multi-Modal AI-Based Pain Detection in Intermediate Care Patients in the Postoperative Phase]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://ovgu-nit.github.io/orakel/assets/theme/images/niena1-p4-niena-large.gif" /><media:content medium="image" url="https://ovgu-nit.github.io/orakel/assets/theme/images/niena1-p4-niena-large.gif" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Patient Data Recording Start</title><link href="https://ovgu-nit.github.io/orakel/blog/2025/recording/" rel="alternate" type="text/html" title="Patient Data Recording Start" /><published>2025-04-01T00:00:00+00:00</published><updated>2025-04-01T00:00:00+00:00</updated><id>https://ovgu-nit.github.io/orakel/blog/2025/recording</id><content type="html" xml:base="https://ovgu-nit.github.io/orakel/blog/2025/recording/"><![CDATA[<h2 id="data-collection-at-university-hospital-for-psychiatry-and-psychotherapy-begins">Data Collection at University Hospital for Psychiatry and Psychotherapy Begins</h2>

<p>We are pleased to announce that the data collection phase has officially begun at our partner institution, the University Hospital for Psychiatry and Psychotherapy. The study will involve 120 patients, with each patient attending six scheduled sessions.</p>

<p>Each session consists of:</p>

<ul>
  <li>
    <p>A medical-psychiatric interview</p>
  </li>
  <li>
    <p>A psychological interview</p>
  </li>
  <li>
    <p>Standardized clinical ratings</p>
  </li>
</ul>

<p>The data collection takes place in a specially equipped room, featuring state-of-the-art hardware for recording purposes. To ensure comprehensive data capture, three cameras are used during each session:</p>

<ul>
  <li>
    <p>One focusing on the patient</p>
  </li>
  <li>
    <p>One on the doctor or psychologist</p>
  </li>
  <li>
    <p>One monitoring the patient’s gait when entering and leaving the room</p>
  </li>
</ul>

<p>The data collected will enable the extraction and analysis of various features for detailed behavioral and condition assessment. This includes:</p>

<ul>
  <li>
    <p>Facial expressions</p>
  </li>
  <li>
    <p>Head and body posture</p>
  </li>
  <li>
    <p>Emotions</p>
  </li>
  <li>
    <p>Gait analysis</p>
  </li>
  <li>
    <p>Vital signs</p>
  </li>
  <li>
    <p>Speech patterns</p>
  </li>
</ul>

<p>This initiative marks a significant step forward in our collaborative efforts to better understand psychiatric conditions and improve patient care through innovative technology and research.</p>

<p><img src="/orakel/assets/theme/images/layout_recording.png" alt="" /></p>]]></content><author><name>NIT OVGU</name></author><category term="News" /><category term="News" /><summary type="html"><![CDATA[Data Collection at University Hospital for Psychiatry and Psychotherapy Begins]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://ovgu-nit.github.io/orakel/assets/theme/images/layout_recording_cropped.png" /><media:content medium="image" url="https://ovgu-nit.github.io/orakel/assets/theme/images/layout_recording_cropped.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry></feed>