Multimodal Engagement Prediction in Multiperson Human–Robot Interaction

A real-time multimodal system for predicting engagement and disengagement in collaborative industrial HRI, using deep learning and rule-based approaches.

• Ahmed A. Abdelrahman, Dominykas Strazdas, Aly Khalifa, Jan Hintz, Thorsten Hempel, Ayoub Al-Hamadi

Real-Time Engagement Estimation for Human–Robot Collaboration

This study presents a multimodal engagement prediction model for human–robot interaction in industrial environments. Designed for multi-person, open-world settings, the system uses a combination of deep learning and rule-based policies to predict when humans intend to interact with a collaborative robot (cobot) and when they disengage.

Key features:

  • Three-stage architecture:
    1. Deep learning for feature extraction (gaze, head pose, body posture, face ID)
    2. A neural network to estimate the focus of attention
    3. Rule-based classification to decide on engagement/disengagement (see the sketch after this list)
  • Person identification via face recognition allows tracking and re-engagement across multiple users.

  • Validated through a study with 11 participants performing collaborative tasks with a UR5e cobot.

  • Achieved performance (see the consistency check after this list):
    • Precision: 96%
    • Recall: 90%
    • F-score: 93%
  • Participants rated the engagement module 5.65 on a 7-point satisfaction scale.

  • Real-time capable, with a total frame processing time of approximately 40 ms (about 25 frames per second).
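
The following is a minimal sketch, not the authors' code, of how the three stages could fit together per frame. Stage 1 (the deep-learning extractors for gaze, head pose, body posture, and face ID) is assumed to have already produced per-person features; the stage-2 attention network is replaced by a toy weighted sum; and the stage-3 rule thresholds and frame counts are invented for illustration only.

```python
# Illustrative three-stage engagement pipeline (hypothetical names and values).
from dataclasses import dataclass

@dataclass
class PersonFeatures:
    """Per-frame features that stage 1 would extract with deep models."""
    face_id: str          # identity from face recognition
    gaze_on_robot: float  # 0..1, how directly the gaze points at the cobot
    head_on_robot: float  # 0..1, head orientation toward the cobot
    body_facing: float    # 0..1, body posture oriented toward the workcell

def attention_score(f: PersonFeatures) -> float:
    """Stage 2 stand-in: the paper uses a neural network to estimate the
    focus of attention; a weighted sum here only illustrates the idea."""
    return 0.5 * f.gaze_on_robot + 0.3 * f.head_on_robot + 0.2 * f.body_facing

@dataclass
class EngagementState:
    engaged: bool = False
    frames_attending: int = 0
    frames_away: int = 0

# Hypothetical thresholds; the paper's actual rules and values are not given here.
ATTEND_THRESH, ENGAGE_FRAMES, DISENGAGE_FRAMES = 0.6, 10, 25

def update(state: EngagementState, score: float) -> EngagementState:
    """Stage 3: rule-based engagement/disengagement with temporal hysteresis,
    so a brief glance away does not immediately end an interaction."""
    if score >= ATTEND_THRESH:
        state.frames_attending += 1
        state.frames_away = 0
    else:
        state.frames_away += 1
        state.frames_attending = 0
    if not state.engaged and state.frames_attending >= ENGAGE_FRAMES:
        state.engaged = True
    elif state.engaged and state.frames_away >= DISENGAGE_FRAMES:
        state.engaged = False
    return state

# Per-person state keyed by face ID, enabling re-engagement across multiple users.
states: dict[str, EngagementState] = {}

def process_frame(people: list[PersonFeatures]) -> dict[str, bool]:
    for f in people:
        s = states.setdefault(f.face_id, EngagementState())
        update(s, attention_score(f))
    return {pid: s.engaged for pid, s in states.items()}
```

Keying the state on the face ID is what lets the system resume an interaction with a returning user instead of treating them as a new person.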

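As a quick consistency check, the reported F-score matches the harmonic mean of the stated precision and recall:

```python
# Reported precision and recall imply an F-score of about 93%.
precision, recall = 0.96, 0.90
f_score = 2 * precision * recall / (precision + recall)
print(f"F-score = {f_score:.3f}")  # -> 0.929, i.e. ~93%
```
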
This work supports safe, contactless, multimodal human–robot collaboration suitable for multi-user industrial applications.


Full-Text Access

https://doi.org/10.1109/ACCESS.2022.3182469


Citing

```bibtex
@ARTICLE{EngagementHRI2022,
  author={Abdelrahman, Ahmed A. and Strazdas, Dominykas and Khalifa, Aly and Hintz, Jan and Hempel, Thorsten and Al-Hamadi, Ayoub},
  journal={IEEE Access},
  title={Multimodal Engagement Prediction in Multiperson Human–Robot Interaction},
  year={2022},
  volume={10},
  pages={61979--61991},
  doi={10.1109/ACCESS.2022.3182469}
}
```