Exploring Upper Limb Segmentation with Deep Learning for Augmented Virtuality

Gruosso, M.; Capece, N.; Erra, U.

doi:10.2312/stag.20211483

Sense of presence, immersion, and body ownership are among the main challenges concerning Virtual Reality (VR) and freehand-based interaction methods. With dedicated hand-tracking devices, freehand-based methods let users interact with the virtual environment (VE) using their own hands. To visualize the hands and make freehand interaction easier, recent approaches represent the user's hands in the VE with 3D meshes. However, this can reduce user immersion, because the meshes do not naturally correspond to the user's real hands. To overcome this limitation, we propose an augmented virtuality (AV) pipeline that lets users see their own upper limbs in the VE. Specifically, the limbs are captured by a single monocular RGB camera placed in an egocentric perspective, segmented using a deep convolutional neural network (CNN), and streamed into the VE. In addition, the hands are tracked with a Leap Motion controller to enable user interaction. We introduce two case studies as a preliminary investigation of this approach. Finally, we provide both quantitative and qualitative evaluations of the CNN. These evaluations highlight its effectiveness and show that it achieves strong segmentation results in several real-life, unconstrained scenarios.
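The abstract does not say how the segmented limbs are merged with the rendered scene. The sketch below is a minimal illustration, not the authors' implementation. It assumes three things: the CNN outputs a per-pixel limb probability map; that map is aligned with a VE frame of the same resolution; and the merge is a simple masked pixel selection. The function names `composite_limbs` and `iou` and the 0.5 threshold are hypothetical. Intersection over union (IoU) is included only as one common way to score segmentation masks quantitatively. It is not necessarily the metric reported in the paper.

```python
import numpy as np


def composite_limbs(camera_frame: np.ndarray, limb_prob: np.ndarray,
                    ve_frame: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Overlay segmented limb pixels from the egocentric camera onto a VE render.

    Assumptions (hypothetical, for illustration): camera_frame and ve_frame are
    H x W x 3 images of the same size and dtype, and limb_prob is an H x W map
    of per-pixel limb probabilities, e.g. the sigmoid output of a segmentation CNN.
    """
    if camera_frame.shape != ve_frame.shape:
        raise ValueError("camera and VE frames must have the same shape")
    if limb_prob.shape != camera_frame.shape[:2]:
        raise ValueError("probability map must match the frame height and width")
    # Binarize the CNN output: True where the pixel is classified as limb.
    mask = limb_prob >= threshold
    # Keep real camera pixels on the limbs and rendered VE pixels everywhere else;
    # mask[..., None] broadcasts the H x W mask across the 3 color channels.
    return np.where(mask[..., None], camera_frame, ve_frame)


def iou(pred: np.ndarray, target: np.ndarray) -> float:
    """Intersection over union of two boolean masks (1.0 if both are empty)."""
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return float(inter) / float(union) if union > 0 else 1.0
```

For example, a 2 x 2 camera frame filled with 200 and a black VE frame, combined with the probabilities `[[0.9, 0.1], [0.6, 0.4]]`, keep the camera pixels only in the left column. A hard mask keeps the sketch simple. A soft alpha blend using `limb_prob` directly would smooth the limb boundaries at the cost of some ghosting.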

2021-01-01