BOSS dataset

Purpose of annotations

The aim of the annotations is to have a ground-truth file of the BOSS dataset with the following requirements:

It must point out the actions and interactions occurring in the videos by indicating their starting and ending frames
It must have enough labels to cover a wide range of actions and interactions
It must not have too many labels, so that each action and interaction is represented often enough to allow the training of a classifier
The file must be easy to understand

The file is available here: AnnotationsBOSS_v1.xlsx

Explanations

The labels chosen for actions are: walking, sitting, standing, laying down.
The labels chosen for interactions are: fight, handshake, kissing on the cheeks, hugging, helping.
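
As a minimal sketch, the two label sets above can be written down as constants, which makes it easy to reject unexpected labels and to count how often each label occurs. The names ACTION_LABELS, INTERACTION_LABELS and count_labels are hypothetical and are not part of the annotation file; the example labels are made up.

```python
from collections import Counter

# Label vocabularies as listed in this document.
ACTION_LABELS = ("walking", "sitting", "standing", "laying down")
INTERACTION_LABELS = ("fight", "handshake", "kissing on the cheeks", "hugging", "helping")


def count_labels(labels, vocabulary):
    """Count occurrences of each label, rejecting labels outside the vocabulary."""
    unknown = sorted(set(labels) - set(vocabulary))
    if unknown:
        raise ValueError(f"unknown labels: {unknown}")
    counts = Counter(labels)
    # Report every vocabulary label, including those that never occur.
    return {label: counts.get(label, 0) for label in vocabulary}


# Hypothetical labels read from one video, for illustration only.
example_counts = count_labels(["walking", "sitting", "walking"], ACTION_LABELS)
print(example_counts)
```

Reporting zero counts explicitly helps to spot labels that are too rare to train a classifier on.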

As can be seen in the videos, many other actions and interactions occur, such as kneeling, using a cellphone, harassing, reading a newspaper or dancing. However, those events either do not happen often enough or are too difficult to locate precisely to be relevant for the purpose of the annotations file.

The file has two worksheets, one for actions and the other for interactions.
The actions worksheet is organized by camera. In each camera section, there is one table for each video.
In those tables, the first column indicates the actions happening in the video while the first row indicates the persons in the video. The persons are numbered according to their order of entrance. A person is labelled as entering when half of their body is visible, and as leaving when less than half of their body is visible.
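
The entering and leaving rule can be sketched as follows. This is only an illustration: it assumes a hypothetical per-frame estimate of the visible fraction of the body (the annotations themselves were not necessarily produced this way), uses 0-based frame indices, and only handles a person's first appearance.

```python
def presence_interval(visible_fraction, threshold=0.5):
    """Return (entering_frame, leaving_frame) for one person.

    visible_fraction[i] is the (hypothetical) fraction of the body visible
    in frame i. Returns None if the person never enters, and a leaving
    frame of None if the person is still present at the end of the video.
    """
    entering = None
    for frame, fraction in enumerate(visible_fraction):
        if entering is None and fraction >= threshold:
            entering = frame  # at least half of the body is visible
        elif entering is not None and fraction < threshold:
            return entering, frame  # less than half of the body is visible
    if entering is None:
        return None
    return entering, None


# Made-up visibility values for a seven-frame clip.
interval = presence_interval([0.0, 0.3, 0.6, 1.0, 0.8, 0.4, 0.0])
print(interval)
```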

When an action is detected, its starting frame number is written in the "start x" column of the corresponding row. When the action ends, its ending frame number is written in the "end x" column of the same row. When a new action occurs, its starting and ending frame numbers are written in the same pair of columns as the previous action if the new action's row is below the previous action's row. Otherwise, they are written in a new pair of columns.
This way, the flow of actions for a person must be read from top to bottom, then left to right, as suggested by the red arrows in the following picture.

For instance, in the example above, person 1 does action 1, then 2, then 4, then 3, and finally 5. Meanwhile, person 2 does action 1, then action 3 twice, before doing action 1 one last time.
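
The reading order can be sketched in a few lines. The grid layout below (one row per action, one cell per pair of start/end columns) and the function name read_action_flow are assumptions made for illustration, and the frame numbers are made up; only the order of the actions comes from the example.

```python
def read_action_flow(grid):
    """Flatten one person's annotation grid into the order the actions occurred.

    grid[r][c] is the (start, end) frame pair found in action row r and
    column pair c, or None when the cell is empty.
    """
    flow = []
    n_pairs = max((len(row) for row in grid), default=0)
    for c in range(n_pairs):            # left to right across column pairs
        for r, row in enumerate(grid):  # top to bottom within a column pair
            cell = row[c] if c < len(row) else None
            if cell is not None:
                start, end = cell
                flow.append((r + 1, start, end))  # actions numbered from 1
    return flow


# Person 1: actions 1, 2, 4, then 3 and 5 in a second pair of columns.
person_1 = [
    [(0, 40), None],      # action 1
    [(41, 90), None],     # action 2
    [None, (151, 200)],   # action 3
    [(91, 150), None],    # action 4
    [None, (201, 260)],   # action 5
]
# Person 2: action 1, action 3 twice, then action 1 again.
person_2 = [
    [(0, 30), None, (151, 200)],   # action 1
    [None, None, None],            # action 2
    [(31, 80), (81, 150), None],   # action 3
]

flow_1 = read_action_flow(person_1)
flow_2 = read_action_flow(person_2)
print([action for action, _, _ in flow_1])
print([action for action, _, _ in flow_2])
```

Reading column pairs in the outer loop and rows in the inner loop is what yields the top-to-bottom, then left-to-right order.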

For interactions, things are simpler.
As with actions, interactions are indicated on the first row. For each interaction, the worksheet gives the number of people involved, who they are, and the starting and ending frames, as for actions.
This way, interactions are indicated in the same fashion as actions.
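
One possible in-memory representation of an interaction entry is sketched below. The class and field names are hypothetical, the example values are made up, and the duration assumes that the ending frame is inclusive, which this document does not state.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Interaction:
    label: str                     # e.g. "handshake"
    participants: Tuple[int, ...]  # person numbers, by order of entrance
    start_frame: int
    end_frame: int

    @property
    def n_people(self) -> int:
        """Number of people involved in the interaction."""
        return len(self.participants)

    @property
    def n_frames(self) -> int:
        """Duration in frames, assuming the ending frame is inclusive."""
        return self.end_frame - self.start_frame + 1


# Made-up entry: persons 1 and 2 shake hands from frame 120 to frame 165.
handshake = Interaction("handshake", (1, 2), 120, 165)
print(handshake.n_people, handshake.n_frames)
```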