"The reason is that, if we predict the pose in world space: (different 3d pose ⊕ different camera view) = same 2d projection, then if given a same 2d pose, these will be an ambiguity for predicted 3d pose." What does that mean?
As I understand it, different camera views produce different 2D keypoint locations in the image, but they all refer to the same 3D pose in world space, which seems perfectly fine. Is there a problem I am missing?
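
To make the quoted statement concrete, here is a minimal sketch of the opposite case from the one I described: two different world-space poses, seen by two different cameras, that produce identical 2D keypoints. Everything in it is my own assumption for illustration (a simple pinhole model with `X_cam = R @ X_world + t`, the made-up joint coordinates, and the helper names `rot_y`, `world_to_camera`, `camera_to_world`, `project`); none of it comes from the quoted source.

```python
import math

def rot_y(theta):
    """Rotation matrix about the y axis (hypothetical helper)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def transpose(m):
    return [list(col) for col in zip(*m)]

def world_to_camera(p, R, t):
    # Assumed extrinsics convention: X_cam = R @ X_world + t
    return [a + b for a, b in zip(matvec(R, p), t)]

def camera_to_world(p_cam, R, t):
    # Inverse of the above: X_world = R^T @ (X_cam - t)
    return matvec(transpose(R), [a - b for a, b in zip(p_cam, t)])

def project(p_cam, f=1.0):
    # Pinhole projection with focal length f and no principal-point offset
    x, y, z = p_cam
    return (f * x / z, f * y / z)

# Pose A: four made-up joints in world space, seen by camera A.
pose_a = [[0.0, 0.0, 0.0], [0.2, 0.5, 0.1], [-0.3, 1.0, 0.2], [0.1, 1.5, -0.1]]
R_a, t_a = rot_y(0.0), [0.0, 0.0, 5.0]

# Camera B has a different orientation and position.
R_b, t_b = rot_y(math.pi / 3), [0.5, -0.2, 4.0]

# Build pose B so that its camera-space coordinates under camera B
# equal those of pose A under camera A.
cam_coords = [world_to_camera(p, R_a, t_a) for p in pose_a]
pose_b = [camera_to_world(pc, R_b, t_b) for pc in cam_coords]

proj_a = [project(world_to_camera(p, R_a, t_a)) for p in pose_a]
proj_b = [project(world_to_camera(p, R_b, t_b)) for p in pose_b]

same_2d = all(
    math.isclose(u, v, abs_tol=1e-9)
    for a, b in zip(proj_a, proj_b)
    for u, v in zip(a, b)
)
world_differs = any(
    not math.isclose(u, v, abs_tol=1e-9)
    for a, b in zip(pose_a, pose_b)
    for u, v in zip(a, b)
)

print("same 2D keypoints:", same_2d)
print("different world-space poses:", world_differs)
```

If this sketch is right, a model that only sees the 2D keypoints cannot tell `pose_a` from `pose_b` when asked for world-space coordinates, because the camera view is not part of its input. In camera space the two cases have identical coordinates, so that particular ambiguity would not arise there. Is that the intended reading of the quote?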