EgoMAN: Flowing from Reasoning to Motion

EgoMAN: Interaction-Structured Reasoning for Egocentric 3D Hand Trajectory Prediction

ECCV 2026

Abstract

Our work addresses 3D hand trajectory prediction in egocentric interaction, where future hand motion is inferred from visual observations, past motion, spatial context, and intent. Real-world actions follow stage-aware interaction structures (e.g., approach, manipulate) that describe how the hand interacts with objects over time. However, prior work typically treats trajectory prediction as continuous signal regression, which decouples motion from semantic supervision and ignores interaction structure. Without stage-aware cues for inferring intent, models struggle to separate purposeful motion from egocentric noise and to generalize across diverse interactions. We therefore present EgoMAN, a unified framework for interaction-structured 3D hand trajectory prediction that models hand motion as stage-aware interactions between the hand and surrounding objects. EgoMAN introduces a novel Trajectory-Token Interface in which a small set of tokens encodes interaction stages, temporal progression, and 6DoF pose, so that stage-aware reasoning can guide efficient long-horizon 3D trajectory generation while preserving physical interpretability. To support this formulation, we construct the EgoMAN dataset with 219K 6DoF trajectories, stage-aware annotations, and 3M semantic, spatial, and motion QA pairs. Experiments show that EgoMAN improves trajectory accuracy, smoothness, and generalization, supporting interaction-structured reasoning for egocentric hand motion prediction in robotics and assistive systems.
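
To make the Trajectory-Token Interface described in the abstract more concrete, here is a minimal Python sketch of what a sparse, stage-aware token could look like. It is an illustration under stated assumptions, not EgoMAN's implementation: the class name TrajectoryToken, its field layout, the quaternion convention, and the densify helper are all hypothetical. In particular, the interpolation in densify is only a stand-in for the learned trajectory generator; it shows how a few tokens carrying stage, time, and 6DoF pose can pin down a long-horizon trajectory.

```python
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class TrajectoryToken:
    """Hypothetical token: one interaction-stage waypoint (not the paper's format)."""
    stage: str          # interaction stage label, e.g. "approach" or "manipulate"
    time_s: float       # temporal progression: seconds from the start of prediction
    position: tuple     # (x, y, z) hand position, assumed to be in metres
    quaternion: tuple   # (w, x, y, z) unit quaternion for hand orientation


def _nlerp(q0, q1, alpha):
    """Normalized linear interpolation between two unit quaternions."""
    dot = sum(a * b for a, b in zip(q0, q1))
    if dot < 0.0:  # take the shorter arc
        q1 = tuple(-c for c in q1)
    mixed = tuple((1.0 - alpha) * a + alpha * b for a, b in zip(q0, q1))
    norm = math.sqrt(sum(c * c for c in mixed))
    return tuple(c / norm for c in mixed)


def densify(tokens, step_s):
    """Expand sparse tokens into dense (time, stage, position, quaternion) samples.

    Assumes at least two tokens with strictly increasing time_s. Simple
    interpolation is used here in place of a learned motion generator.
    """
    tokens = sorted(tokens, key=lambda t: t.time_s)
    start, end = tokens[0].time_s, tokens[-1].time_s
    n_steps = int(round((end - start) / step_s))
    dense, i = [], 0
    for k in range(n_steps + 1):
        t = start + k * step_s
        # Advance to the segment [tokens[i], tokens[i + 1]] that contains t.
        while i < len(tokens) - 2 and t > tokens[i + 1].time_s:
            i += 1
        a, b = tokens[i], tokens[i + 1]
        alpha = min(max((t - a.time_s) / (b.time_s - a.time_s), 0.0), 1.0)
        pos = tuple((1.0 - alpha) * p + alpha * q
                    for p, q in zip(a.position, b.position))
        quat = _nlerp(a.quaternion, b.quaternion, alpha)
        stage = a.stage if alpha < 1.0 else b.stage
        dense.append((t, stage, pos, quat))
    return dense


# Made-up example values: three tokens covering a two-second horizon.
identity = (1.0, 0.0, 0.0, 0.0)
tokens = [
    TrajectoryToken("approach", 0.0, (0.0, 0.0, 0.0), identity),
    TrajectoryToken("manipulate", 1.0, (0.2, 0.0, 0.1), identity),
    TrajectoryToken("manipulate", 2.0, (0.2, 0.1, 0.1), identity),
]
dense = densify(tokens, step_s=0.5)
```

Because each token keeps an explicit stage label, timestamp, and pose, the intermediate representation stays human-readable, which is one way to read the abstract's point about preserving physical interpretability.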

Trajectory Forecasting on EgoMAN Unseen
(Dynamic Ego-Video Overlay)

Zero-Shot Eval on HOT3D Out-Of-Domain
(Dynamic Ego-Video Overlay)

Trajectory Prediction with Diverse Intention Text

BibTeX