The datasets consist of multi-object scenes. Each image or video is accompanied by ground-truth segmentation masks for all objects in the scene. For some datasets (excluding Objects Room and CATER), ...