Solid and Effective Upper Limb Segmentation in Egocentric Vision

Gruosso M.; Capece N.; Erra U.
2021-01-01

Abstract

Upper limb segmentation in egocentric vision is a challenging and largely unexplored task that extends the well-known hand localization problem and can be crucial for a realistic representation of users' limbs in immersive and interactive environments, such as VR/MR applications designed for web browsers, which offer a general-purpose solution suitable for any device. Existing hand and arm segmentation approaches require a large amount of well-annotated data; consequently, various annotation techniques have been designed and several datasets created. Such datasets are often limited to synthetic and semi-synthetic data that do not include the whole limb and differ significantly from real data, leading to poor performance in many realistic cases. To overcome the limitations of previous methods and the challenges inherent in both egocentric vision and segmentation, we trained several segmentation networks based on the state-of-the-art DeepLabv3+ model and collected a large-scale, comprehensive dataset. It consists of 46,000 real-life, well-labeled RGB images with a great variety of skin colors, clothes, occlusions, and lighting conditions. In particular, we carefully selected the best data from existing datasets and added our EgoCam dataset, which includes new images with accurate labels. Finally, we extensively evaluated the trained networks in unconstrained real-world environments to find the best model configuration for this task, achieving promising results in diverse scenarios. The code, the collected egocentric upper limb segmentation dataset, and a video demo of our work will be available on the project page.
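To make the pipeline described in the abstract more concrete, the sketch below shows how a DeepLabv3+ network could be fine-tuned for binary upper-limb segmentation on egocentric RGB frames. It is only an illustration under stated assumptions (PyTorch with the third-party segmentation_models_pytorch package, a ResNet-50 encoder, binary limb-vs-background masks); it is not the authors' implementation, which is distributed via the project page, and the paper does not specify the training framework.

```python
# Minimal illustrative sketch, not the authors' released code.
# Assumptions: segmentation_models_pytorch for the DeepLabv3+ architecture,
# RGB input frames, and binary masks (0 = background, 1 = upper limb).
import torch
import segmentation_models_pytorch as smp

model = smp.DeepLabV3Plus(
    encoder_name="resnet50",      # backbone choice is assumed, not taken from the paper
    encoder_weights="imagenet",   # start from ImageNet-pretrained weights
    in_channels=3,                # RGB egocentric frames
    classes=1,                    # single-logit output: limb vs. background
)

criterion = torch.nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def train_step(images: torch.Tensor, masks: torch.Tensor) -> float:
    """One optimization step on a batch of frames (B, 3, H, W) and masks (B, 1, H, W)."""
    model.train()
    optimizer.zero_grad()
    logits = model(images)            # (B, 1, H, W) raw scores
    loss = criterion(logits, masks.float())
    loss.backward()
    optimizer.step()
    return loss.item()
```

A single-logit output with a sigmoid-based loss is the simplest formulation when the mask only distinguishes the limb from the background; the paper instead reports evaluating several network configurations to select the best one, so this should be read as a baseline setup rather than the published model.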
Year: 2021
ISBN: 9781450390958
Files in this product:
File: 3485444.3495179.pdf
Access: open access
Type: Post-print document
License: Public domain
Size: 21.14 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11563/152725
Citations
  • Scopus: 2