RoSA: Evaluation of Touch and Speech Input Modalities for On-Site HRI and Telerobotics

This study evaluates the RoSA framework in hybrid scenarios, comparing touch and speech input for both local and remote human–robot interaction.

• Dominykas Strazdas, Matthias Busch, Rijin Shaji, Ingo Siegert, Ayoub Al-Hamadi

Hybrid Interaction with RoSA 3: Touch and Speech in Local and Remote Collaboration

Building on earlier iterations, RoSA 3 expands the Robot System Assistant framework with a touchscreen interface alongside speech input, enabling a hybrid interaction model where the same modalities can be used both on-site and remotely.
Participants interacted with two robots—the industrial UR5e-based Rosa and the humanoid Ari—to compare precision-focused touch against natural, hands-free speech.

The study introduced a structured learning phase to familiarize users with the modalities, followed by free exploration in collaborative cube-stacking, spelling, and personalization tasks.

Key highlights:

  • Touch input excelled in precision and efficiency, achieving average task times of 27.8 s versus 77.2 s for speech in comparable tasks.
  • Speech input was preferred in 90% of general interactions during free exploration, valued for intuitive flow and engagement.
  • Hedonic–pragmatic analysis showed touch scoring higher in pragmatic quality (efficiency, clarity) and speech higher in hedonic quality (enjoyment, engagement).
  • RoSA 3 outperformed RoSA 1 and 2 in overall task completion time (3:46 min vs. 19:35 min and 25:20 min) and achieved an average user-experience score of 75.6% across the SUS, UMUX, PSSUQ, and ASQ questionnaires.
  • Touch commands succeeded 81.6% of the time versus 67.2% for speech commands, with most speech errors occurring in complex cube-manipulation tasks.
  • The hybrid design ensured seamless transition between local and remote use, paving the way for flexible, long-duration telerobotic operations.

Evaluation:

  • Ten participants (gender-balanced, aged 20–40) completed tasks on both robots in local and remote phases.
  • Touch dominated precision tasks (e.g., cube placement), while speech dominated general control.
  • Interviews after the remote phase emphasized the value of consistent interfaces, live feedback (audio/video or digital twins), and direct access to the manufacturer's control screen.

This study highlights the complementary roles of touch and speech in HRI, showing that hybrid modality systems can combine the accuracy of direct manipulation with the natural flow of verbal commands.


Full-Text Access

https://doi.org/10.3389/frobt.2025.1561188


Citing

@ARTICLE{RoSA3,
  author={Strazdas, Dominykas and Busch, Matthias and Shaji, Rijin and Siegert, Ingo and Al-Hamadi, Ayoub},
  title={Robot System Assistant (RoSA): evaluation of touch and speech input modalities for on-site HRI and telerobotics},
  journal={Frontiers in Robotics and AI},
  volume={12},
  year={2025},
  doi={10.3389/frobt.2025.1561188}
}