UNITN Social Interaction (USI) Dataset Description:

The USI Dataset consists of four types of two-person interactions: Talking, Shaking, Hugging and Fighting. Each interaction type has 16 samples, for a total of 16 x 4 = 64 samples.

All videos were recorded outdoors and are provided in '.avi' format, with a resolution of 320x240 and a frame rate of 30 fps.

Detections of STIPs (Spatial-Temporal Interest Points) are also provided as annotations. For each video, a text file lists the position, spatial-temporal scales, histogram of oriented gradients (HOG) and histogram of optical flow (HOF) of every interest point.
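As an illustration of how such an annotation file might be read, here is a minimal Python sketch. The description above does not specify the file layout, so everything about the layout here is an assumption: one interest point per line, whitespace-separated numbers, lines starting with '#' treated as comments, and the column order x, y, t, spatial scale, temporal scale, then the HOG values followed by the HOF values. The names `InterestPoint` and `parse_stip_lines` are hypothetical, and the descriptor lengths must be supplied by the caller after inspecting the actual files.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class InterestPoint:
    # ASSUMED column order; check against the real annotation files.
    x: float
    y: float
    t: float
    spatial_scale: float
    temporal_scale: float
    hog: List[float]
    hof: List[float]


def parse_stip_lines(lines: Iterable[str], n_hog: int, n_hof: int) -> List[InterestPoint]:
    """Parse interest points from text lines under the assumed layout.

    n_hog and n_hof are the descriptor lengths; they are not stated in the
    dataset description, so the caller has to provide them.
    """
    expected = 5 + n_hog + n_hof
    points = []
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        # Skip blank lines and (assumed) '#' comment/header lines.
        if not line or line.startswith("#"):
            continue
        values = [float(v) for v in line.split()]
        if len(values) != expected:
            raise ValueError(
                f"line {number}: expected {expected} values, got {len(values)}"
            )
        x, y, t, spatial_scale, temporal_scale = values[:5]
        points.append(
            InterestPoint(
                x=x,
                y=y,
                t=t,
                spatial_scale=spatial_scale,
                temporal_scale=temporal_scale,
                hog=values[5:5 + n_hog],
                hof=values[5 + n_hog:],
            )
        )
    return points
```

To use it on a file, pass an open file handle, for example `parse_stip_lines(open(path), n_hog=..., n_hof=...)`, where `path` and both lengths depend on the actual annotation files. If the real files use a different column order or extra fields, the slicing above needs to be adjusted accordingly.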

Video sequences and annotations are provided as .zip files:
Version 1.0 - Videos and Annotations
Database for the paper "Real Time Detection of Social Interactions in Surveillance Video" (ECCV 2012 Workshop) - Videos and Annotations
Database for the paper "Exploiting visual search theory to infer social interactions" (SPIE/Electronic Imaging 2013) - Videos and Annotations
Please cite this dataset as:

