The TRECVID Event Detection evaluation, sponsored by the National Institute of Standards and Technology (NIST), evaluates state-of-the-art event detection approaches on realistic airport surveillance video (up to 100 hours). Please refer to the TRECVID website and the Event Detection evaluation page for more details.
Challenges of TRECVID event detection evaluation
Our team focuses on detecting events that involve one or two individuals rather than large groups of people. We developed systems for three events only: "Person running", "Embrace", and "Pointing". Our systems achieved top results on the running and embrace events, with minimum DCR (detection cost rate) scores among the best.
DET curves for the 2009 submissions. (Note: the closer a DET curve lies to the bottom-left corner, the better the performance. Refer to the TRECVID website for details.)
[Figure panels: Person running | Embrace | Pointing]
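To make the minimum DCR figure mentioned above more concrete, here is a minimal sketch of how such a score could be computed from a ranked list of detections. It assumes the cost takes the weighted form DCR = P_miss + beta * R_FA, where P_miss is the fraction of reference events missed and R_FA is the number of false alarms per hour of video. The function name `min_dcr`, its arguments, and the default `beta = 0.005` are illustrative assumptions, not values taken from this page; the official constants are defined in the NIST evaluation plan.

```python
def min_dcr(scores, labels, n_true_events, hours, beta=0.005):
    """Sweep a decision threshold over the detections and return the lowest DCR.

    scores: confidence of each system detection.
    labels: True if the detection matches a reference event, else False.
    n_true_events: number of reference events (must be > 0).
    hours: duration of the evaluated video in hours (must be > 0).
    beta: weight on the false-alarm rate (assumed value, see lead-in).
    """
    ranked = sorted(zip(scores, labels), key=lambda p: -p[0])
    # Threshold above every score: all events missed, no false alarms.
    best = 1.0
    hits = 0
    false_alarms = 0
    for i, (score, is_hit) in enumerate(ranked):
        if is_hit:
            hits += 1
        else:
            false_alarms += 1
        # A threshold cannot separate detections with identical scores,
        # so only evaluate the cost at the end of each tie group.
        if i + 1 < len(ranked) and ranked[i + 1][0] == score:
            continue
        p_miss = 1.0 - hits / n_true_events
        r_fa = false_alarms / hours
        best = min(best, p_miss + beta * r_fa)
    return best
```

Lower values are better, which matches the reading of the DET curves: a curve closer to the bottom-left corner has both a lower miss probability and a lower false-alarm rate.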
Examples of running detection results
Our system is based on space-time window scanning. Because exhaustively searching all possible space-time windows in a video is time-consuming, we employ background subtraction to reduce the search space and discard regions that do not cover enough foreground. To reduce the search space further, we use photogrammetric context information to roughly estimate human height in the image and determine where events are likely to occur. Please refer to our report for more details. The final system is efficient: it takes only 0.2 s to scan one frame.
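The pruning idea can be illustrated with a small sketch. This is not the implementation from our report. It is a simplified, single-frame version under stated assumptions: the foreground mask is a binary 2D grid already produced by some background subtraction step, the expected person height is given by a hypothetical callable `height_at_row` (standing in for the photogrammetric estimate), and the names `candidate_windows`, `aspect`, `stride`, and `min_coverage` are all illustrative. The temporal extent of the windows is left out for brevity.

```python
def integral_image(mask):
    """Summed-area table of a binary mask, with a zero row and column prepended."""
    rows, cols = len(mask), len(mask[0])
    ii = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        row_sum = 0
        for c in range(cols):
            row_sum += 1 if mask[r][c] else 0
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii


def candidate_windows(fg_mask, height_at_row, aspect=0.5, stride=4, min_coverage=0.3):
    """Return (x, y, w, h) windows that contain enough foreground.

    fg_mask: 2D list of booleans (True = foreground pixel).
    height_at_row: hypothetical callable mapping the row of a person's feet
        to the expected person height in pixels at that image position.
    aspect: assumed width/height ratio of a window.
    min_coverage: minimum fraction of foreground pixels inside a window.
    """
    ii = integral_image(fg_mask)
    n_rows, n_cols = len(fg_mask), len(fg_mask[0])
    windows = []
    for y_bottom in range(stride, n_rows + 1, stride):
        # Only one window size is tried per row: the one implied by the height estimate.
        h = int(round(height_at_row(y_bottom)))
        w = max(1, int(round(aspect * h)))
        y_top = y_bottom - h
        if h < 1 or y_top < 0:
            continue
        for x in range(0, n_cols - w + 1, stride):
            # Foreground pixel count inside the window in O(1) via the summed-area table.
            fg = (ii[y_bottom][x + w] - ii[y_top][x + w]
                  - ii[y_bottom][x] + ii[y_top][x])
            if fg >= min_coverage * w * h:
                windows.append((x, y_top, w, h))
    return windows
```

Two effects shrink the search space here: the height estimate fixes the window scale at each image row, so scales do not have to be enumerated, and the coverage test discards windows that lie mostly on background before any classifier is run on them.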
Examples of the background subtraction results and the human height estimation results. In the last video, the red bounding boxes are the final space-time search windows after the pre-processing step.
[Video panels: One video | Background subtraction | Final space-time search windows]