STCM-Mamba - Multimodal Spatio-Temporal Cross-Modal Mamba for Depression Detection

Depression is a prevalent mental disorder with severe physiological symptoms and high diagnosis costs, yet developing efficient and accurate depression detection systems remains challenging. While deep learning methods leveraging multimodal data have shown promise, existing approaches suffer from two critical limitations: they lack effective spatiotemporal feature integration, and they cannot balance effective long-sequence modeling against computational complexity. To address these challenges, we propose STCM-Mamba, a novel Spatio-Temporal Cross-Modal Mamba framework for efficient and accurate depression detection. STCM-Mamba comprises three modules: a Spatio-Temporal Mamba Module (STMM), a Cross-Modal Mamba Module (CMMM), and a Depression Classification Module (DCM). The STMM consists of a Temporal Mamba Block (TMB) and a Spatial Mamba Block (SMB) that capture spatiotemporal information for each modality, while the CMMM enhances intermodal and intramodal representation learning. Experiments on two multimodal depression datasets demonstrate that STMM and CMMM contribute significantly to performance improvements and that STCM-Mamba outperforms state-of-the-art methods.
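
The data flow described above (per-modality STMM, then CMMM, then DCM) can be illustrated with a toy sketch. This is not the authors' implementation. It is a minimal pure-Python illustration under stated assumptions: each Mamba block is replaced by a fixed scalar linear recurrence `h_t = a * h_{t-1} + b * x_t` (a real Mamba block uses learned, input-dependent parameters), the temporal and spatial outputs are fused by summation, the two modalities are fused by concatenation along time, and the classifier is a mean pool. All function names and parameters here are hypothetical.

```python
from typing import List

Seq = List[List[float]]  # layout: [time_steps][features]


def ssm_scan(xs: List[float], a: float = 0.9, b: float = 0.1) -> List[float]:
    """Toy linear-time recurrence h_t = a*h_{t-1} + b*x_t.

    Assumption: a fixed scalar stand-in for a selective state-space (Mamba) block.
    """
    h, out = 0.0, []
    for x in xs:
        h = a * h + b * x
        out.append(h)
    return out


def temporal_block(seq: Seq) -> Seq:
    """Scan every feature channel along the time axis (TMB stand-in)."""
    n_feat = len(seq[0])
    cols = [ssm_scan([row[j] for row in seq]) for j in range(n_feat)]
    return [[cols[j][t] for j in range(n_feat)] for t in range(len(seq))]


def spatial_block(seq: Seq) -> Seq:
    """Scan every time step along the feature axis (SMB stand-in)."""
    return [ssm_scan(row) for row in seq]


def stmm(seq: Seq) -> Seq:
    """Fuse temporal and spatial outputs by summation (an assumption)."""
    t_out, s_out = temporal_block(seq), spatial_block(seq)
    return [[t + s for t, s in zip(tr, sr)] for tr, sr in zip(t_out, s_out)]


def cmmm(mod_a: Seq, mod_b: Seq) -> Seq:
    """Concatenate two modalities along time and scan jointly (an assumption).

    Requires both modalities to share the same feature dimension.
    """
    return temporal_block(mod_a + mod_b)


def dcm(seq: Seq) -> float:
    """Mean-pool to a single score (stand-in for a learned classifier)."""
    flat = [v for row in seq for v in row]
    return sum(flat) / len(flat)


if __name__ == "__main__":
    audio = [[0.2, 0.4], [0.1, 0.3], [0.5, 0.0]]   # made-up example values
    video = [[0.3, 0.3], [0.6, 0.1]]               # made-up example values
    fused = cmmm(stmm(audio), stmm(video))
    print("score:", dcm(fused))
```

Each scan visits every element once, so the cost grows linearly with sequence length, which is the property that motivates Mamba-style models for long sequences.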

Fulltext Access

https://ieeexplore.ieee.org/document/11223210

Citing

@Article{Zhou2025,

  author={Zhou, Bowen and Fiedler, Marc-André and Al-Hamadi, Ayoub},

  journal={IEEE Access}, 

  title={STCM-Mamba: Multimodal Spatio-Temporal Cross-Modal Mamba for Depression Detection}, 

  year={2025},

  doi={10.1109/ACCESS.2025.3627778}}