EgoPressure


A Dataset for Hand Pressure and Pose Estimation in Egocentric Vision

Under review

Yiming Zhao1     Taein Kwon1     Paul Streli1      Marc Pollefeys1,2     Christian Holz1
1ETH Zurich     2Microsoft    



Abstract

Estimating touch contact and pressure in egocentric vision is a central task for downstream applications in Augmented Reality, Virtual Reality, and robotics, because it provides precise physical insights into hand-object interaction and object manipulation. However, existing contact-pressure datasets lack egocentric views and hand poses, which are essential for accurate estimation during in-situ operation, both for AR/VR interaction and robotic manipulation. In this paper, we introduce EgoPressure, a novel dataset of touch contact and pressure interaction from an egocentric perspective, complemented with hand pose meshes and fine-grained pressure intensities for each contact. The hand poses in our dataset are optimized using our proposed multi-view, sequence-based method that processes footage from our capture rig of 8 accurately calibrated RGBD cameras. EgoPressure comprises 5.0 hours of touch contact and pressure interaction from 21 participants, captured by a moving egocentric camera and 7 stationary Kinect cameras that provide RGB images and depth maps at 30 Hz. In addition, we provide baselines for estimating pressure from different modalities, which will enable future development and benchmarking on EgoPressure. Overall, we demonstrate that pressure and hand poses are complementary, supporting our aim to facilitate a better physical understanding of hand-object interactions in AR/VR and robotics research.

EgoPressure Dataset

Example egocentric views from the EgoPressure dataset; the images are overlaid with the pressure map and hand-skeleton annotations. For each participant, we record 32 gestures for both hands.

Visualization

Interactive viewer: egocentric view and static camera views 1–7.
Reconstruction

Annotation Pipeline

Our annotation method takes as input the RGB-D images captured by 7 static Azure Kinect cameras and the pressure frame recorded by a Sensel Morph touchpad. We leverage Segment Anything for hand masks and HaMeR for initial hand pose estimates. We then refine the initial hand pose and shape estimates through optimization with differentiable rasterization across all static camera views; a minimal sketch of this refinement step follows below. Using an additional virtual orthogonal camera placed below the touchpad, we reproject the captured pressure frame onto the hand mesh by optimizing the pressure as a texture feature of the corresponding UV map, while ensuring contact between the touchpad and all contact vertices.
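To make the refinement idea concrete, here is a minimal, self-contained sketch, not our actual implementation: initial pose and shape parameters (as HaMeR would provide) are refined by gradient descent on a soft-silhouette loss summed over all static views, with a toy SoftRas-style Gaussian vertex splatting standing in for a full differentiable rasterizer. The linear hand model, camera parameters, and resolutions below are placeholder assumptions.

import torch

torch.manual_seed(0)
V, N, H, W = 7, 778, 32, 32  # static views, MANO-sized vertex count, mask resolution

# Placeholder linear "hand model": 48-D pose + 10-D shape -> N x 3 vertices.
# A real pipeline would use MANO here.
P_BASIS = torch.randn(N * 3, 48) * 0.05
S_BASIS = torch.randn(N * 3, 10) * 0.05
TEMPLATE = torch.rand(N, 3) - 0.5

def hand_model(pose, shape):
    return TEMPLATE + (P_BASIS @ pose + S_BASIS @ shape).view(N, 3)

# Assumed calibrated pinhole cameras: identity rotations, slightly jittered
# translations placed in front of the hand.
R = torch.eye(3).expand(V, 3, 3)
t = torch.tensor([0.0, 0.0, 2.0]).expand(V, 3) + torch.randn(V, 3) * 0.02
f, cx, cy = 30.0, W / 2, H / 2

def soft_silhouette(verts):
    """Toy differentiable 'rasterizer': Gaussian splats of projected vertices."""
    cam = torch.einsum('vij,nj->vni', R, verts) + t[:, None]        # V x N x 3
    uv = f * cam[..., :2] / cam[..., 2:3] + torch.tensor([cx, cy])  # V x N x 2
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing='ij')
    pix = torch.stack([xs, ys], dim=-1).float().view(1, H * W, 1, 2)
    d2 = ((pix - uv.view(V, 1, N, 2)) ** 2).sum(-1)                 # V x HW x N
    # SoftRas-style aggregation: probability a pixel is covered by any vertex.
    cover = 1.0 - torch.prod(1.0 - torch.exp(-d2 / 4.0), dim=-1)
    return cover.view(V, H, W)

# Fake "observed" per-view hand masks rendered from ground-truth parameters;
# in the real pipeline these come from Segment Anything.
gt_pose, gt_shape = torch.randn(48) * 0.5, torch.randn(10) * 0.5
with torch.no_grad():
    target = soft_silhouette(hand_model(gt_pose, gt_shape))

pose = torch.zeros(48, requires_grad=True)   # would start from HaMeR's estimate
shape = torch.zeros(10, requires_grad=True)
optimizer = torch.optim.Adam([pose, shape], lr=5e-2)

for step in range(301):
    silhouettes = soft_silhouette(hand_model(pose, shape))
    loss = (silhouettes - target).pow(2).mean()                    # multi-view term
    loss = loss + 1e-4 * (pose.pow(2).sum() + shape.pow(2).sum())  # mild prior
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if step % 100 == 0:
        print(f'step {step}: loss {loss.item():.6f}')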

EgoPressure Capture System

Camera Setup

Head Pose Tracking

Synchronization

PressureFormer

Pipeline

We introduce a new baseline model, PressureFormer, which estimates pressure as a UV map of the 3D hand mesh, enabling projection both as 3D pressure onto the hand surface and as 2D pressure into image space. PressureFormer uses HaMeR's hand vertices and image feature tokens to estimate the pressure distribution over the UV map. We employ a differentiable renderer to project the pressure back onto the image plane by texture-mapping it onto the predicted hand mesh.
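As an illustration of this design, the following is a hedged sketch of such a UV-pressure decoder, not the released PressureFormer: learnable UV-texel queries cross-attend to projected image feature tokens and hand-vertex tokens through a small transformer decoder, and a linear head maps each query to a non-negative pressure value on the UV map. All dimensions, token counts, and the UV resolution are illustrative assumptions.

import torch
import torch.nn as nn

class PressureFormerSketch(nn.Module):
    def __init__(self, feat_dim=1024, d_model=256, uv_res=32):
        super().__init__()
        self.img_proj = nn.Linear(feat_dim, d_model)   # project image feature tokens
        self.vert_proj = nn.Linear(3, d_model)         # embed 3-D hand-mesh vertices
        self.queries = nn.Parameter(torch.randn(uv_res * uv_res, d_model) * 0.02)
        layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=4)
        self.head = nn.Linear(d_model, 1)              # pressure per UV texel
        self.uv_res = uv_res

    def forward(self, img_tokens, verts):
        # img_tokens: (B, T, feat_dim) from the image backbone;
        # verts: (B, n_verts, 3) hand-mesh vertices (e.g., from HaMeR).
        memory = torch.cat([self.img_proj(img_tokens), self.vert_proj(verts)], dim=1)
        queries = self.queries.unsqueeze(0).expand(img_tokens.shape[0], -1, -1)
        decoded = self.decoder(queries, memory)        # UV queries attend to tokens
        uv = self.head(decoded).view(-1, 1, self.uv_res, self.uv_res)
        return torch.relu(uv)                          # non-negative pressure map

model = PressureFormerSketch()
pressure_uv = model(torch.randn(2, 192, 1024), torch.randn(2, 778, 3))
print(pressure_uv.shape)   # torch.Size([2, 1, 32, 32])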

Qualitative Results on EgoPressure Dataset

We compare our PressureFormer with PressureVision and with our extended baseline model that additionally takes HaMeR-estimated 2.5D joint positions as input. In the last two columns, we also visualize the hand mesh estimated by HaMeR alongside the 3D pressure distribution on the hand surface derived from our predicted UV pressure. Note that we transform left-hand UV maps into the right-hand format, as sketched below.
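For that left-to-right conversion, a plausible minimal realization, assuming the mapping is a mirror along the U axis (the actual dataset tooling may differ), is:

import torch

uv_left = torch.rand(1, 32, 32)                   # placeholder left-hand UV pressure map
uv_right_format = torch.flip(uv_left, dims=[-1])  # mirror along the U axis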

Demo on Egocentric Video

We show a demo captured with a Meta Quest 3; the pressure is estimated by our PressureFormer. A hypothetical per-frame inference loop for such a demo is sketched below.
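The loop reuses the PressureFormerSketch model from above; the video file name, the random stand-in inputs (where a real pipeline would run HaMeR on the detected hand crop), and the naive heatmap overlay are all placeholder assumptions.

import cv2
import numpy as np
import torch

cap = cv2.VideoCapture('quest3_demo.mp4')   # placeholder file name
model = PressureFormerSketch().eval()       # from the sketch above

while True:
    ok, frame = cap.read()
    if not ok:
        break
    # Placeholder inputs: a real pipeline would run HaMeR here to obtain
    # image feature tokens and hand vertices for the detected hand.
    img_tokens, verts = torch.randn(1, 192, 1024), torch.randn(1, 778, 3)
    with torch.no_grad():
        uv = model(img_tokens, verts)[0, 0].numpy()
    # Render the UV pressure as a heatmap overlay; the real demo projects it
    # through the hand mesh onto the image instead of a naive resize.
    norm = (255 * uv / (uv.max() + 1e-6)).astype(np.uint8)
    heat = cv2.applyColorMap(
        cv2.resize(norm, (frame.shape[1], frame.shape[0])), cv2.COLORMAP_JET)
    overlay = cv2.addWeighted(frame, 0.7, heat, 0.3, 0.0)
    cv2.imshow('pressure', overlay)
    if cv2.waitKey(1) == 27:                # Esc to quit
        break

cap.release()
cv2.destroyAllWindows()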

BibTeX

@misc{EgoPressure,
      author        = {Yiming Zhao and Taein Kwon and Paul Streli and Marc Pollefeys and Christian Holz},
      title         = {EgoPressure: A Dataset for Hand Pressure and Pose Estimation in Egocentric Vision},
      year          = {2024},
      eprint        = {2409.02224},
      archivePrefix = {arXiv},
}