object-states-action

Code for the paper Joint Discovery of Object States and Manipulation Actions, ICCV 2017

Created by Jean-Baptiste Alayrac at INRIA, Paris.

Introduction

The webpage for this project is available here. It contains links to the paper and other material about the work. This code reproduces the results presented in Table 1 of the paper for our method (i.e. row (f) for states and row (iv) for actions).

License

Our code is released under the MIT License (refer to the LICENSE file for details).

Cite

If you find this code useful in your research, please consider citing our paper:

@InProceedings{Alayrac17objectstates,
  author    = "Alayrac, Jean-Baptiste and Sivic, Josef and Laptev, Ivan and Lacoste-Julien, Simon",
  title     = "Joint Discovery of Object States and Manipulation Actions",
  booktitle = "International Conference on Computer Vision (ICCV)",
  year      = "2017"
}

Contents

  1. Requirements
  2. Running the code
  3. Data usage

Requirements

To run the code, you need MATLAB installed. The code was tested on Ubuntu 14.04 LTS with MATLAB R2016b.

Running the code

  1. Clone this repo and go to the generated folder:
git clone https://github.com/jalayrac/object-states-action.git
cd object-states-action
  2. Download and unpack the preprocessed features:
wget https://www.di.ens.fr/willow/research/objectstates/features_data.zip
unzip features_data.zip
  3. Open MATLAB from the repository root and run compile.m followed by launch.m (edit launch.m to select the action you want among 'put_wheel' (default), 'withdraw_wheel', 'open_oyster', 'pour_coffee', 'close_fridge', 'open_fridge' or 'place_plant'):
compile   % runs compile.m
launch    % runs launch.m

Data usage

You can download the metadata and the raw images (17 GB). Here are some instructions for parsing the metadata. The raw data is organized as follows:

action_name -> action_name_clipid -> %06d.jpg

NB: this data corresponds to the cropped clips (see Sec. 5.2, Experimental setup paragraph of the paper), so that you can directly compare to the experiments reported in Table 1 of the paper. In particular, the numbers of tracklets and video chunks in these files should match the numbers of tracklets and video chunks of the features released above (the features and data are aligned). If you need the non-cropped clips for some reason (they have a few more annotations), please send an email to the author.
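To load the raw frames of one clip in MATLAB, a minimal sketch could look like this (the root directory and the exact clip folder name below are assumptions about how you unpacked the archive, adapt them to your setup):

root_dir = 'raw_data';                                    % assumed unpack location
action   = 'put_wheel';
clip_dir = fullfile(root_dir, action, [action '_0001']);  % hypothetical clip id
frame_files = dir(fullfile(clip_dir, '*.jpg'));
frames = cell(1, numel(frame_files));
for i = 1:numel(frame_files)
    frames{i} = imread(fullfile(clip_dir, frame_files(i).name));
end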

The metadata consists of one mat file per action with the following fields:

Clip information (in this example we have 191 clips in total).
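A hedged sketch for loading the metadata of one action and checking the number of clips (the file name 'put_wheel.mat' and the top-level layout of the variables are assumptions, adapt them to the files you downloaded):

meta_data = load('put_wheel.mat')   % file name is an assumption
fieldnames(meta_data)               % e.g. clips, state_GT, FRAMES_state, vids_action, ...
numel(meta_data.clips)              % 191 clips in this example, assuming clips stores one entry per clip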

Object state information

In this example we have 4016 tracklets in total; each tracklet is labeled with one of the following states:

	0: False Positive Detections
	1: state_1
	2: state_2
	3: Ambiguous
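For instance, assuming meta_data.state_GT stores one of these labels per tracklet (which is what the mat2cell example below relies on), the label distribution can be computed as follows:

labels = 0:3                                     % FP, state_1, state_2, ambiguous
counts = histc(meta_data.state_GT(:), labels)    % one count per label, summing to 4016 here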

NB: the time does not necessarily start at 0, but this is fine. One can recover the time (in seconds) of the i-th tracklet by just doing:

mean(meta_data.FRAMES_state{i} / fps)

NB: if you want, for example, to recover the state ground truth regrouped by clip, one can simply do:

state_GT_per_clip = mat2cell(meta_data.state_GT', meta_data.clips)

This is why meta_data.clips can be useful.

Action information

Each clip is decomposed into small chunks of 0.4 s (10 frames at 25 fps). Because some videos have a higher or lower fps, the number of frames per chunk may vary.

In our example, we have 8777 chunks in total. Since different videos can have different fps, we provide the ids of the frames of each chunk.
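Mirroring the state example above, the time (in seconds) of the j-th chunk can be recovered from its frame ids; the field name FRAMES_action used below is an assumption about how these ids are stored:

mean(meta_data.FRAMES_action{j} / fps)   % FRAMES_action is a hypothetical field name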

NB: if you want to recover the equivalent of meta_data.clips but for actions, one can simply do:

[U, Ia, Ic] = unique(meta_data.vids_action)
clips_action = hist(Ic, 1:numel(U))   % number of chunks per clip

It can then be used as before but for actions.
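For example, assuming the per-chunk action ground truth is stored in a field such as meta_data.action_GT (a hypothetical name), it can be regrouped by clip in the same way:

action_GT_per_clip = mat2cell(meta_data.action_GT', clips_action)   % action_GT is a hypothetical field name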