Multimodal Engagement Prediction in Multiperson Human–Robot Interaction
A real-time multimodal system for predicting engagement and disengagement in collaborative industrial HRI, using deep learning and rule-based approaches.
Real-Time Engagement Estimation for Human–Robot Collaboration
This study presents a multimodal engagement prediction model for human–robot interaction in industrial environments. Designed for multi-person, open-world settings, the system uses a combination of deep learning and rule-based policies to predict when humans intend to interact with a collaborative robot (cobot) and when they disengage.
Key features:
- Three-stage architecture (see the sketch at the end of this overview):
  - Deep learning for feature extraction (gaze, head pose, body posture, face ID)
  - A neural network to estimate the focus of attention
  - Rule-based classification to decide on engagement/disengagement
- Person identification via face recognition enables tracking and re-engagement across multiple users.
- Validated in a user study in which 11 participants performed collaborative tasks with a UR5e cobot.
- Achieved performance (see the consistency check after this list):
  - Precision: 96%
  - Recall: 90%
  - F-score: 93%
- Participants rated the engagement module with a satisfaction score of 5.65 on a 7-point scale.
- Real-time capable, with a total frame processing time of approximately 40 ms.
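As a quick sanity check (ours, not the paper's), the three reported numbers are mutually consistent, since the F-score is the harmonic mean of precision and recall:

```python
# F-score (F1) is the harmonic mean of precision and recall.
precision, recall = 0.96, 0.90
f1 = 2 * precision * recall / (precision + recall)
print(f"F1 = {f1:.3f}")  # F1 = 0.929, i.e. the reported 93%
```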
This work supports safe, contactless, multimodal human–robot collaboration suitable for multi-user industrial applications.
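To make the three-stage design concrete, below is a minimal Python sketch of how such a pipeline could be wired together. Every name, threshold, and the dwell-time rule here is an illustrative assumption, not the paper's implementation; in particular, `attention_on_robot` is a trivial stand-in for the paper's attention-estimation network.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

@dataclass
class PersonFeatures:
    """Stage 1 output: per-person features from the deep-learning front end."""
    face_id: str                            # identity from face recognition
    gaze: Tuple[float, float, float]        # gaze direction vector
    head_pose: Tuple[float, float, float]   # yaw, pitch, roll in degrees
    body_posture: Tuple[float, ...] = ()    # posture encoding (placeholder)

def attention_on_robot(feat: PersonFeatures) -> bool:
    """Stage 2 stand-in: the paper estimates the focus of attention with a
    neural network; this head-pose threshold is only a placeholder."""
    yaw, pitch, _ = feat.head_pose
    return abs(yaw) < 20.0 and abs(pitch) < 20.0

@dataclass
class EngagementTracker:
    """Stage 3: rule-based engagement/disengagement, keyed by face ID so a
    returning user can be recognized and re-engaged."""
    engage_frames: int = 10     # illustrative: ~0.4 s at 25 fps (40 ms/frame)
    disengage_frames: int = 25  # illustrative: ~1.0 s at 25 fps
    on_robot: Dict[str, int] = field(default_factory=dict)
    off_robot: Dict[str, int] = field(default_factory=dict)
    engaged: Dict[str, bool] = field(default_factory=dict)

    def update(self, feat: PersonFeatures) -> bool:
        pid = feat.face_id
        if attention_on_robot(feat):
            self.on_robot[pid] = self.on_robot.get(pid, 0) + 1
            self.off_robot[pid] = 0
        else:
            self.off_robot[pid] = self.off_robot.get(pid, 0) + 1
            self.on_robot[pid] = 0
        if self.on_robot[pid] >= self.engage_frames:
            self.engaged[pid] = True
        elif self.off_robot[pid] >= self.disengage_frames:
            self.engaged[pid] = False
        return self.engaged.get(pid, False)

# Demo: one person attends to the cobot for 12 frames, then looks away for 30.
tracker = EngagementTracker()
looking = PersonFeatures("user-01", (0.0, 0.0, 1.0), (5.0, -3.0, 0.0))
away = PersonFeatures("user-01", (1.0, 0.0, 0.0), (70.0, 0.0, 0.0))
states = [tracker.update(looking) for _ in range(12)]
states += [tracker.update(away) for _ in range(30)]
print(states[11], states[-1])  # True False -> engaged, then disengaged
```

Keying all per-frame state by the recognized face ID is what makes re-engagement across multiple users possible, matching the feature list above; the actual system additionally fuses gaze and body posture rather than relying on head pose alone.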
Full-Text Access
https://doi.org/10.1109/ACCESS.2022.3182469
Citing
```bibtex
@ARTICLE{EngagementHRI2022,
  author={Abdelrahman, Ahmed A. and Strazdas, Dominykas and Khalifa, Aly and Hintz, Jan and Hempel, Thorsten and Al-Hamadi, Ayoub},
  journal={IEEE Access},
  title={Multimodal Engagement Prediction in Multiperson Human–Robot Interaction},
  year={2022},
  volume={10},
  pages={61979--61991},
  doi={10.1109/ACCESS.2022.3182469}
}
```