This document lists the ground truths for the algorithm described in the paper entitled “Toward Temporal Action Segmentation in Uncut Videos Using Unsupervised Classification” for the following two datasets:

1. CVPR 2012 Change Detection dataset (Thermal set)

2. MuHAVi-uncut

“start” gives the first frame of every segment in the given video, and “end” gives the last frame of each segment.
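
As a minimal sketch of how the two lists pair up, the Python snippet below zips “start” and “end” into (first frame, last frame) tuples and counts the frames in each segment. It uses the V1 (corridor) values listed in this document. The helper names `to_segments` and `segment_lengths` are hypothetical, and the frame count assumes that the “end” frame belongs to the segment. The zero-padded frame numbers are written as plain integers because Python does not accept leading zeros in integer literals.

```python
def to_segments(start, end):
    """Pair each start frame with the end frame at the same position."""
    if len(start) != len(end):
        raise ValueError("start and end must have the same length")
    return list(zip(start, end))


def segment_lengths(segments):
    """Frames per segment, counting the end frame itself (assumption)."""
    return [last - first + 1 for first, last in segments]


# V1: corridor, copied from this document without the zero padding.
corridor_start = [570, 2330, 2972, 4557, 4904]
corridor_end = [2068, 2623, 3604, 4818, 5067]

corridor_segments = to_segments(corridor_start, corridor_end)
corridor_lengths = segment_lengths(corridor_segments)

print(corridor_segments)
print(corridor_lengths)
```

If the “end” frame should instead be treated as exclusive, drop the `+ 1` in `segment_lengths`.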


CVPR 2012 Change Detection dataset (Thermal set)

V1: corridor (5 segments)

start=[000570,002330,002972,004557,004904];

end=[002068,002623,003604,004818,005067];

V2: diningRoom (1 segment)

start=[000727];

end=[003413];

V3: lakeside (3 segments)

start=[001057,004740,005667];

end=[004689,005544,006500];

V4: library (1 segment)

start=[000862];

end=[004483];


V5: park (2 segments)

start=[000250,000452];

end=[000411,000577];



MuHAVi-uncut dataset

C1: Camera 1

start=4+[76430,82665,88701,99711,107156,114596,157500,180210,189315,202038,209906,214946,221096,236172,245876,253836,269696];

end=4+[82665,88701,99711,107156,114596,122911,168075,189315,200105,209906,214946,221096,231296,245876,253836,260146,274641];
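
The `4+[...]` notation appears to follow MATLAB-style element-wise addition, so the constant is added to every frame number in the list. The Python sketch below shows that reading on the first three C1 values only. The helper name `shift` is hypothetical.

```python
def shift(offset, frames):
    """Add a constant offset to every frame number in the list."""
    return [offset + frame for frame in frames]


# First three C1 entries, copied from the lists above.
c1_start_head = shift(4, [76430, 82665, 88701])
c1_end_head = shift(4, [82665, 88701, 99711])

print(c1_start_head)  # [76434, 82669, 88705]
print(c1_end_head)    # [82669, 88705, 99715]
```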


C2: Camera 2 (available in two parts, hence A and B)

startA=[182,6417,12453,23463,30908,38348,59938];

startB=[415,9520,22242,30110,35150,41300,56376,66080,74040,81100];

endA=[6417,12453,23463,30908,38348,46663,70513];

endB=[9520,20310,30110,35150,41300,51500,66080,74040,80350,86045];


C3: Camera 3 (available in two parts, hence A and B)

startA=[17,6252,12288,23298,30743,38183,59587];

startB=[[[415,9520,22242,30110,35150,41300,56376,66080,74040]-115],81067];

endA=[6252,12288,23298,30743,38183,46498,70162];

endB=[[[9520,20310,30110,35150,41300,51500,66080,74040,80350]-115],86012];
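
The nested brackets in the C3 part B lists can be read as follows: 115 is subtracted from each of the first nine values, and the final value is appended unshifted. The Python sketch below expands both lists under that reading. The helper name `shift_then_append` is hypothetical. The same reading would apply to the C6 lists, with offsets of +124 and +316.

```python
def shift_then_append(frames, offset, tail):
    """Shift every value in frames by offset, then append the tail values as-is."""
    return [frame + offset for frame in frames] + list(tail)


c3_start_b = shift_then_append(
    [415, 9520, 22242, 30110, 35150, 41300, 56376, 66080, 74040], -115, [81067]
)
c3_end_b = shift_then_append(
    [9520, 20310, 30110, 35150, 41300, 51500, 66080, 74040, 80350], -115, [86012]
)

print(c3_start_b)
print(c3_end_b)
```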


C4: Camera 4

start=[76431,82666,88702,99712,107157,114597,157500,180211,189316,202040,209908,214948,221098,236174,245878,253838,269698];

end=[82666,88702,99712,107157,114597,122912,168075,189316,200106,209908,214948,221098,231298,245878,253838,260148,274643];


C5: Camera 5

start=[76430,82665,88701,99711,107156,114596,157500,180210,189315,202038,209906,214946,221096,236172,245876,253836,269696];

end=[82665,88701,99711,107156,114596,122911,168075,189315,200105,209906,214946,221096,231296,245876,253836,260146,274641];


C6: Camera 6 (available in two parts, hence A and B)

startA=[124+[182,6417,12453,23463,30908,38348],59938];

startB=[316+[415,9520,22242,30110,35150,41300,56376,66080,74040],82305];

endA=[124+[6417,12453,23463,30908,38348,46663],70796];

endB=[316+[9520,20310,30110,35150,41300,51500,66080,74040,80350],87250];


C7: Camera 7

start=[76430,82665,88701,99711,107156,114596,157500,180210,189315,202038,209906,214946,221096,236172,245876,253836,269696];

end=[82665,88701,99711,107156,114596,122911,168075,189315,200105,209906,214946,221096,231296,245876,253836,260146,274641];


C8: Camera 8

start=[1223,7458,13494,24504,31949,39389];

end=[7458,13494,24504,31949,39389,47704];
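
When transcribing these lists into another tool, a quick consistency check can catch copying errors. The Python sketch below is one possible check, shown on the C8 values. The helper name `check_segments` is hypothetical, and the rule that a segment may not begin before the previous one ends is an assumption, not something this document states.

```python
def check_segments(start, end):
    """Return a list of human-readable problems found in a start/end pair."""
    if len(start) != len(end):
        return ["start and end differ in length"]
    problems = []
    for i, (first, last) in enumerate(zip(start, end)):
        if first >= last:
            problems.append(f"segment {i}: start {first} is not before end {last}")
        # Assumption: segments are listed in order and do not overlap.
        if i > 0 and first < end[i - 1]:
            problems.append(
                f"segment {i}: start {first} precedes previous end {end[i - 1]}"
            )
    return problems


# C8: Camera 8, copied from the lists above.
c8_start = [1223, 7458, 13494, 24504, 31949, 39389]
c8_end = [7458, 13494, 24504, 31949, 39389, 47704]

c8_problems = check_segments(c8_start, c8_end)
print(c8_problems)  # [] means no problems were found
```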


(c) Fiza Murtaza, 2016