Our method enables high-precision 6D pose tracking and binary gripper state estimation, supports multiple robot arms, and improves data collection efficiency with near-human user comfort.
We evaluate on three manipulation tasks. With 100 demonstrations, our method achieves a higher success rate than the teleoperation baseline with 50 demonstrations. These results indicate that our approach enables scalable, cost-effective data collection for real-world robotic skill learning.