COIN Dataset

COIN is currently the largest dataset for comprehensive instructional video analysis. It contains 11,827 videos of 180 different tasks (e.g., car polishing, making French fries) related to 12 domains (e.g., vehicle, dish). All videos are collected from YouTube and annotated with an efficient toolbox.

Authors and Contributors

Yansong Tang*, Dajun Ding†, Yongming Rao*, Yu Zheng*, Danyang Zhang*, Lili Zhao†, Jiwen Lu*, Jie Zhou*, Yongxiang Lian*, Yao Li†, Jiali Sun†, Chang Liu†, Dongge You†, Zirun Yang†, Jiaojiao Ge†, Jiayun Wang*

Contact: coin.dataset@gmail.com

License

You may use the code and files for research purposes only, including sharing and modifying the material. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.

Dataset and Annotation

Taxonomy

COIN is organized in a hierarchical structure with three levels: domain, task, and step. The correspondence between the levels can be found in the taxonomy [link]; in the taxonomy file, "targets" are tasks and "actions" are steps. We provide the taxonomy file of COIN in CSV format. Below, we show a small part of the taxonomy stored in taxonomy.xlsx:

domain_target_mapping

Domains   Targets
...       ...
Vehicle   ChangeCarTire
Vehicle   InstallLicensePlateFrame
...       ...
Gadgets   ReplaceCDDriveWithSSD

target_action_mapping

Target Id  Target Label   Action Id  Action Label
...        ...            ...        ...
13         ChangeCarTire  259        unscrew the screw
13         ChangeCarTire  260        jack up the car
13         ChangeCarTire  261        remove the tire
13         ChangeCarTire  262        put on the tire
13         ChangeCarTire  263        tighten the screws
...        ...            ...        ...
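As a minimal sketch, assuming the two sheets are exported as CSV files with the column headers shown above, the three-level hierarchy (domain, task, step) can be assembled into a nested dictionary in Python. The inline strings reproduce only the rows of the excerpt, and the helper name `build_taxonomy` is hypothetical, not part of the released files.

```python
import csv
import io
from collections import defaultdict

# Hypothetical CSV exports of the two sheets; only the excerpted rows are included.
DOMAIN_TARGET_CSV = """Domains,Targets
Vehicle,ChangeCarTire
Vehicle,InstallLicensePlateFrame
Gadgets,ReplaceCDDriveWithSSD
"""

TARGET_ACTION_CSV = """Target Id,Target Label,Action Id,Action Label
13,ChangeCarTire,259,unscrew the screw
13,ChangeCarTire,260,jack up the car
13,ChangeCarTire,261,remove the tire
13,ChangeCarTire,262,put on the tire
13,ChangeCarTire,263,tighten the screws
"""


def build_taxonomy(domain_target_csv: str, target_action_csv: str) -> dict:
    """Return {domain: {task: [(step_id, step_label), ...]}}."""
    # Register every task under its domain with an empty step list.
    taxonomy = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(domain_target_csv)):
        taxonomy[row["Domains"]][row["Targets"]] = []

    # Look up each task's domain so steps can be attached in the right place.
    task_to_domain = {
        task: domain for domain, tasks in taxonomy.items() for task in tasks
    }
    for row in csv.DictReader(io.StringIO(target_action_csv)):
        task = row["Target Label"]
        domain = task_to_domain.get(task)
        if domain is None:
            continue  # task missing from the domain sheet: skip its steps
        taxonomy[domain][task].append((int(row["Action Id"]), row["Action Label"]))
    return dict(taxonomy)


taxonomy = build_taxonomy(DOMAIN_TARGET_CSV, TARGET_ACTION_CSV)
for domain, tasks in taxonomy.items():
    for task, steps in tasks.items():
        print(domain, task, [label for _, label in steps])
```

Keeping the step IDs next to their labels makes it straightforward to join the taxonomy with the per-video annotations, whose step "id" values refer to the same action IDs.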

We store the URL of each video and its annotations in JSON format, which can be accessed via the link [COIN](Project link page). The JSON file follows a format similar to that of ActivityNet. Below, we show an example entry under the top-level key "database":

"LtRSn-ntcLY": {
			"duration": 131.0309,
			"class": "ReplaceCDDriveWithSSD",
			"video_url": "https://www.youtube.com/embed/LtRSn-ntcLY",
			"start": 56.640895694775196,
			"annotation": [
				{
					"id": "212",
					"segment": [
						60.0,
						69.0
					],
					"label": "take out the laptop CD drive"
				},
				{
					"id": "216",
					"segment": [
						71.0,
						82.0
					],
					"label": "insert the hard disk tray into the position of the CD drive"
				}
			],
			"subset": "training",
			"end": 85.714362947023,
			"recipe_type": 131
		}

From each entry, we can retrieve the YouTube ID, the duration, the region of interest (ROI, delimited by "start" and "end"), and the procedure information of the video. The field "annotation" comprises a list of all annotated procedures within the video. The field "class" and the sub-field "id" correspond to the task and step levels of the taxonomy, respectively.
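The Python sketch below shows one way to read these fields with the standard `json` module. It uses an inline copy of the example entry instead of the full COIN.json; the helper `summarize` and its record keys are hypothetical names chosen for illustration. Converting the string "id" to an integer is also an assumption, made so that step IDs can be compared with the numeric action IDs in the taxonomy.

```python
import json

# The example entry from above, wrapped under the top-level "database" key.
# With the real file this would be json.load(f)["database"].
COIN_JSON = """
{"database": {"LtRSn-ntcLY": {
  "duration": 131.0309,
  "class": "ReplaceCDDriveWithSSD",
  "video_url": "https://www.youtube.com/embed/LtRSn-ntcLY",
  "start": 56.640895694775196,
  "annotation": [
    {"id": "212", "segment": [60.0, 69.0], "label": "take out the laptop CD drive"},
    {"id": "216", "segment": [71.0, 82.0],
     "label": "insert the hard disk tray into the position of the CD drive"}
  ],
  "subset": "training",
  "end": 85.714362947023,
  "recipe_type": 131
}}}
"""


def summarize(database: dict) -> list:
    """Flatten each video entry into a compact summary record."""
    records = []
    for youtube_id, entry in database.items():
        records.append({
            "youtube_id": youtube_id,          # the entry's key is the YouTube ID
            "duration": entry["duration"],     # seconds
            "task": entry["class"],            # task level of the taxonomy
            "task_id": entry["recipe_type"],
            "roi": (entry["start"], entry["end"]),
            "subset": entry["subset"],
            # "id" is stored as a string; convert it to an int step ID.
            "steps": [
                (int(step["id"]), step["label"], tuple(step["segment"]))
                for step in entry["annotation"]
            ],
        })
    return records


database = json.loads(COIN_JSON)["database"]
for record in summarize(database):
    print(record["youtube_id"], record["task"], record["roi"])
    for step_id, label, (t0, t1) in record["steps"]:
        print(f"  step {step_id}: {label} [{t0}s, {t1}s]")
```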

File Structure

The annotation information is saved in COIN.json.

Field Name          Type                   Example                                    Description
database            dict                   -                                          Top-level key of the annotation file; maps each YouTube ID to its entry.
-                   string                 LtRSn-ntcLY                                YouTube ID of the video (key of each entry under "database").
duration            float                  131.0309                                   Duration of the video in seconds.
class               string                 ReplaceCDDriveWithSSD                      Name of the task in the video.
video_url           string                 https://www.youtube.com/embed/LtRSn-ntcLY  URL of the video.
start               float                  56.640895694775196                         Start time of the ROI of the video, in seconds.
end                 float                  85.714362947023                            End time of the ROI of the video, in seconds.
subset              string                 training or validation                     Subset the video belongs to.
recipe_type         int                    131                                        ID number of the task.
annotation          list of dict           -                                          List of annotated procedures in the video.
annotation:id       string                 212                                        ID number of the procedure.
annotation:label    string                 take out the laptop CD drive               Name of the procedure.
annotation:segment  list of float (len=2)  [60.0, 69.0]                               Start and end time of the procedure, in seconds.
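For larger-scale processing it can help to parse each entry into typed records that mirror the fields listed above. The sketch below uses Python dataclasses; the class names `Step` and `Video`, the helpers `parse_video` and `load_coin`, and the renamed fields (`task`, `task_id`, `step_id`) are hypothetical, and the only validation performed is a check that every segment has exactly two values.

```python
import json
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Step:
    step_id: int                   # annotation:id (a string such as "212" in the JSON)
    label: str                     # annotation:label
    segment: Tuple[float, float]   # annotation:segment, start/end in seconds


@dataclass
class Video:
    youtube_id: str                # key of the entry under "database"
    duration: float                # seconds
    task: str                      # "class"
    task_id: int                   # "recipe_type"
    video_url: str
    start: float                   # ROI start, seconds
    end: float                     # ROI end, seconds
    subset: str                    # e.g. "training"
    steps: List[Step]              # parsed "annotation" list


def parse_video(youtube_id: str, entry: dict) -> Video:
    """Convert one raw entry into a Video, rejecting malformed segments."""
    steps = []
    for raw in entry["annotation"]:
        if len(raw["segment"]) != 2:
            raise ValueError(f"{youtube_id}: expected 2 values in segment, got {raw['segment']}")
        t0, t1 = (float(t) for t in raw["segment"])
        steps.append(Step(int(raw["id"]), raw["label"], (t0, t1)))
    return Video(
        youtube_id=youtube_id,
        duration=float(entry["duration"]),
        task=entry["class"],
        task_id=int(entry["recipe_type"]),
        video_url=entry["video_url"],
        start=float(entry["start"]),
        end=float(entry["end"]),
        subset=entry["subset"],
        steps=steps,
    )


def load_coin(path: str) -> List[Video]:
    """Load COIN.json and return one Video per entry under "database"."""
    with open(path, encoding="utf-8") as f:
        database = json.load(f)["database"]
    return [parse_video(vid, entry) for vid, entry in database.items()]


# Usage with a trimmed copy of the example entry (one step only).
sample = {
    "duration": 131.0309,
    "class": "ReplaceCDDriveWithSSD",
    "video_url": "https://www.youtube.com/embed/LtRSn-ntcLY",
    "start": 56.640895694775196,
    "end": 85.714362947023,
    "subset": "training",
    "recipe_type": 131,
    "annotation": [
        {"id": "212", "segment": [60.0, 69.0], "label": "take out the laptop CD drive"},
    ],
}
video = parse_video("LtRSn-ntcLY", sample)
print(video.task, video.task_id, [s.label for s in video.steps])
```

Typed records make the schema explicit in one place, so a mismatch between the file and the expected field types surfaces as an error during loading rather than later in an analysis script.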