Occluded Video Instance Segmentation (OVIS)

A workshop and challenge focusing on occluded object understanding in video

October 17th, 8:00 am EDT, ICCV 2021 Virtual Workshop


Invited Speakers

Workshop Schedule

October 17th, 8:00 - 11:00 am EDT

Time (EDT) | Speaker | Topic
8:00-8:10 am EDT | Organizers | Welcome & introduction
8:10-8:40 am EDT | Alan Yuille | Invited talks topic 1
8:40-8:50 am EDT | OVIS 3rd place team | A Single-Stage, Bottom-up Approach for Occluded VIS using Spatio-temporal Embeddings
8:50-9:00 am EDT | OVIS 1st place team | Limited Sampling Reference Frame for MaskTrack R-CNN
9:00-9:10 am EDT | Khalid J. Almalki | Oral section - Characterizing Scattered Occlusions for Effective Dense-Mode Crowd Counting
9:10-9:20 am EDT | Heechul Bae | Oral section - Occluded Video Instance Segmentation with Set Prediction Approach
9:20-9:30 am EDT | Shane Gilroy | Oral section - Pedestrian Occlusion Level Classification using Keypoint Detection and 2D Body Surface Area Estimation
9:30-10:00 am EDT | Hanwang Zhang | Invited talks topic 2

Challenge Track

In the visual world, objects rarely occur in isolation. Psychophysical and computational studies have demonstrated that the human vision system can perceive heavily occluded objects through contextual reasoning and association. The question then becomes: can our video understanding systems perceive objects that are severely occluded? The OVIS competition will be hosted on an online platform, and presentations will be delivered on Zoom.

OVIS is a new large-scale benchmark dataset for the video instance segmentation task. It is designed around perceiving object occlusions in videos, which reveal the complexity and diversity of real-world scenes. OVIS consists of:
  • 296k high-quality instance masks
  • 25 commonly seen semantic categories
  • 901 videos with severe object occlusions
  • 5,223 unique instances

Following YouTube-VIS, we use average precision (AP) at different intersection-over-union (IoU) thresholds and average recall (AR) as our evaluation metrics. The IoU in video instance segmentation is computed over the whole video: the sum of the per-frame intersection areas divided by the sum of the per-frame union areas.
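
The video-level IoU described above can be sketched in a few lines of Python. This is only an illustration, not the official evaluation code: the function name `video_iou` and the assumption that each instance track is stored as a boolean array of shape (T, H, W) are ours, and the evaluation server may represent masks differently (for example, as run-length encodings).

```python
import numpy as np


def video_iou(pred_masks, gt_masks):
    """IoU between one predicted and one ground-truth instance track.

    Both inputs are assumed (hypothetically) to be boolean arrays of shape
    (T, H, W). A frame in which the instance is absent is an all-False mask.
    """
    pred = np.asarray(pred_masks, dtype=bool)
    gt = np.asarray(gt_masks, dtype=bool)
    if pred.shape != gt.shape:
        raise ValueError("pred_masks and gt_masks must have the same shape")

    # Sum intersection and union areas over all frames before dividing,
    # rather than averaging per-frame IoU values.
    intersection = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return float(intersection) / float(union) if union > 0 else 0.0


# Toy example: two frames of 2x2 pixels.
pred = np.array([[[1, 1], [0, 0]],
                 [[0, 0], [0, 0]]], dtype=bool)
gt = np.array([[[1, 0], [0, 0]],
               [[0, 0], [1, 0]]], dtype=bool)

# Intersection is 1 pixel in total, union is 3 pixels in total.
print(video_iou(pred, gt))
```

Because the areas are summed before dividing, a frame where the prediction misses an occluded object still adds to the union and lowers the score, which a per-frame average over visible frames only would not capture.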

Dataset Download
Evaluation Server
For more details about the dataset, please refer to our paper or website.
Competition Schedule
Competition | Date
Competition Phase 1 (open the submission of the val results) | June 1, 2021 (11:59PM Pacific Time)
Competition Phase 2 (open the submission of the test results) | July 25, 2021 (11:59PM Pacific Time)
Deadline for Submitting the Final Predictions | August 1, 2021 (11:59PM Pacific Time)
Decisions to Participants | August 6, 2021 (11:59PM Pacific Time)
Top Teams
Rank | Team Name | Team Members | Organization | Technical Report
1st | Ach | Zhuang Li, Leilei Cao, Hongbin Wang | Ant Group | PDF
2nd | huapohen | Wenbo Li, Xuesheng Li, Qiwei Xu, Chen Li, Jiaxue Wang, Zongxiang Fu | University of Electronic Science and Technology of China, Chengdu DELU Dynamics Ltd
3rd | Ali2500 | Ali Athar (1), Sabarinath Mahadevan (1), Aljosa Osep (2), Bastian Leibe (1) | (1) RWTH Aachen University, (2) Carnegie Mellon University | PDF
To quote the results of this competition, please cite:
    title={Occluded Video Instance Segmentation: Dataset and ICCV 2021 Challenge},
    author={Jiyang Qi and Yan Gao and Yao Hu and Xinggang Wang and Xiaoyu Liu and Xiang Bai and Serge Belongie and Alan Yuille and Philip Torr and Song Bai},
    booktitle={Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
If the OVIS dataset helps your research, please cite:
    title={Occluded Video Instance Segmentation: A Benchmark},
    author={Jiyang Qi and Yan Gao and Yao Hu and Xinggang Wang and Xiaoyu Liu and Xiang Bai and Serge Belongie and Alan Yuille and Philip Torr and Song Bai},
    journal={International Journal of Computer Vision},

Paper Track

Call for Papers

Although deep learning methods have achieved strong video object recognition performance in recent years, perceiving objects in heavily occluded video scenes is still a very challenging task. The difficulty of precisely localizing and reasoning about heavily occluded objects in videos reveals that current deep learning models behave differently from the human vision system, and underscores the urgent need for new paradigms for video understanding.

Topics include, but are not limited to:
  • Video understanding
  • Occluded video instance segmentation
  • Occluded object detection
  • Occlusion reasoning
  • Occlusion edge detection
  • Video object segmentation
  • Video object detection
  • Multi-object tracking
Submitted papers must follow the ICCV 2021 paper template and be 4 to 8 pages long (excluding references). The review process is double-blind, so submissions must be anonymized. All accepted papers will be published in the IEEE ICCVW proceedings. Both challenge and regular papers are welcome; you can submit a paper without participating in our challenge. Please submit online via the CMT website.
Paper Track Schedule
Paper | Date
Submission Deadline | July 25, 2021 (11:59PM Pacific Time)
Author Notification | August 9, 2021 (11:59PM Pacific Time) - Extended
Camera-ready Due | August 16, 2021 (11:59PM Pacific Time) - Extended