Estimating touch contact and pressure in egocentric vision is a central task for downstream applications in Augmented Reality, Virtual Reality, and robotics, because it provides precise physical insights into hand-object interaction and object manipulation. However, existing contact pressure datasets lack egocentric views and hand poses, which are essential for accurate estimation during in-situ operation, both for AR/VR interaction and robotic manipulation. In this paper, we introduce EgoPressure, a novel dataset of touch contact and pressure interaction from an egocentric perspective, complemented with hand pose meshes and fine-grained pressure intensities for each contact. The hand poses in our dataset are optimized using our proposed multi-view, sequence-based method, which processes footage from our capture rig of 8 accurately calibrated RGBD cameras. EgoPressure comprises 5.0 hours of touch contact and pressure interaction from 21 participants, captured by a moving egocentric camera and 7 stationary Kinect cameras providing RGB images and depth maps at 30 Hz. In addition, we provide baselines for estimating pressure from different modalities, which will enable future development and benchmarking on EgoPressure. Overall, we demonstrate that pressure and hand poses are complementary, supporting our goal of facilitating a better physical understanding of hand-object interactions in AR/VR and robotics research.
Example egocentric views from the EgoPressure dataset; the images are overlaid with the pressure map and the hand skeleton annotation. For each participant, we record 32 gestures for both hands.
The input to our annotation method consists of RGB-D images captured by 7 static Azure Kinect cameras and the pressure frames from a Sensel Morph touchpad. We leverage Segment Anything for hand masks and HaMeR for initial hand pose estimates. We then refine the initial hand pose and shape estimates through differentiable rasterization optimization across all static camera views. Using an additional virtual orthogonal camera placed below the touchpad, we reproject the captured pressure frame onto the hand mesh by optimizing the pressure as a texture feature of the corresponding UV map, while ensuring contact between the touchpad and all contact vertices.
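To illustrate the multi-view refinement step, below is a minimal sketch of silhouette-based hand pose optimization with PyTorch3D. It assumes a MANO-style hand model; mano_forward, hand_faces, gt_masks, contact_idx, and the calibration tensors R, T, focal, pp are hypothetical placeholders, and the losses and weights are illustrative rather than the exact objective used for our annotations.

# A minimal sketch of the multi-view refinement idea, assuming PyTorch3D and a
# MANO-style hand model. mano_forward, hand_faces, gt_masks, contact_idx, and the
# calibration tensors R, T, focal, pp are hypothetical placeholders.
import torch
from pytorch3d.structures import Meshes
from pytorch3d.renderer import (
    PerspectiveCameras, RasterizationSettings, MeshRasterizer,
    MeshRenderer, SoftSilhouetteShader, BlendParams,
)

device = "cuda"
num_cams = 7  # static Azure Kinect cameras

# Calibrated static cameras: R (7, 3, 3), T (7, 3), focal/pp in NDC -- placeholders.
cameras = PerspectiveCameras(R=R, T=T, focal_length=focal, principal_point=pp, device=device)

renderer = MeshRenderer(
    rasterizer=MeshRasterizer(
        cameras=cameras,
        raster_settings=RasterizationSettings(image_size=256, blur_radius=1e-4, faces_per_pixel=50),
    ),
    shader=SoftSilhouetteShader(blend_params=BlendParams(sigma=1e-4, gamma=1e-4)),
)

# Pose and shape parameters initialized from HaMeR, refined by gradient descent.
pose = torch.nn.Parameter(pose_init.clone())    # e.g. (48,) axis-angle MANO pose
shape = torch.nn.Parameter(shape_init.clone())  # e.g. (10,) MANO shape coefficients
optim = torch.optim.Adam([pose, shape], lr=1e-2)

for step in range(200):
    verts = mano_forward(pose, shape)  # (778, 3) hand vertices, hypothetical MANO wrapper
    meshes = Meshes(verts=[verts] * num_cams, faces=[hand_faces] * num_cams)
    sil = renderer(meshes)[..., 3]     # (7, H, W) soft silhouettes, one per static view
    loss_sil = torch.nn.functional.mse_loss(sil, gt_masks)  # gt_masks: Segment Anything hand masks

    # Illustrative contact term: pull vertices under pressure onto the touchpad plane z = 0.
    loss_contact = verts[contact_idx, 2].abs().mean()

    loss = loss_sil + 0.1 * loss_contact
    optim.zero_grad()
    loss.backward()
    optim.step()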
We introduce a new baseline model, PressureFormer, which estimates pressure as a UV map of the 3D hand mesh, enabling projection both as 3D pressure onto the hand surface and as 2D pressure onto the image plane. PressureFormer uses HaMeR's hand vertices and image feature tokens to estimate the pressure distribution over the UV map. We employ a differentiable renderer to project the pressure back onto the image plane by texture-mapping it onto the predicted hand mesh.
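As a rough illustration of this design, the following sketch shows a PressureFormer-style head that cross-attends learned UV queries to hand-vertex and image tokens and decodes a UV pressure map. Token dimensions, layer counts, and the UV resolution are assumptions for illustration and do not reflect the exact architecture.

# A rough sketch of a PressureFormer-style head, assuming HaMeR-style inputs:
# 778 hand-vertex tokens and ViT image feature tokens. Dimensions, layer counts,
# and the UV resolution are illustrative assumptions, not the paper's exact design.
import torch
import torch.nn as nn

class PressureUVHead(nn.Module):
    def __init__(self, d_model=256, img_token_dim=1280, uv_res=64, patch=8):
        super().__init__()
        self.uv_res, self.patch = uv_res, patch
        self.vert_proj = nn.Linear(3, d_model)              # embed HaMeR hand vertices
        self.img_proj = nn.Linear(img_token_dim, d_model)   # embed image feature tokens
        n_queries = (uv_res // patch) ** 2                  # one query per coarse UV patch
        self.uv_queries = nn.Parameter(torch.randn(n_queries, d_model))
        layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=4)
        self.to_pressure = nn.Sequential(nn.Linear(d_model, patch * patch), nn.Softplus())

    def forward(self, hand_verts, img_tokens):
        # hand_verts: (B, 778, 3), img_tokens: (B, N, img_token_dim)
        B = hand_verts.shape[0]
        memory = torch.cat([self.vert_proj(hand_verts), self.img_proj(img_tokens)], dim=1)
        queries = self.uv_queries.unsqueeze(0).expand(B, -1, -1)
        uv_tokens = self.decoder(queries, memory)            # cross-attend queries to memory
        patches = self.to_pressure(uv_tokens)                # non-negative pressure per texel
        g = self.uv_res // self.patch
        uv_map = patches.view(B, g, g, self.patch, self.patch)
        uv_map = uv_map.permute(0, 1, 3, 2, 4).reshape(B, self.uv_res, self.uv_res)
        return uv_map  # per-texel pressure in the hand's UV space

The predicted UV map can then be attached to the estimated hand mesh as a texture and rendered with a differentiable renderer to obtain the 2D pressure image described above.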
We compare PressureFormer with PressureVision and with our extended baseline model that additionally uses HaMeR-estimated 2.5D joint positions. In the last two columns, we additionally visualize the hand mesh estimated by HaMeR alongside the 3D pressure distribution on the hand surface derived from our predicted UV pressure map. Note that we transform left-hand UV maps into the right-hand format.
We show a demo captured with a Meta Quest 3; the pressure is estimated by our PressureFormer.
@misc{EgoPressure,
  author = {Yiming Zhao and Taein Kwon and Paul Streli and Marc Pollefeys and Christian Holz},
  title  = {EgoPressure: A Dataset for Hand Pressure and Pose Estimation in Egocentric Vision},
  year   = {2024},
  eprint = {arXiv:2409.02224},
}