# TDW Image Dataset
Generate a dataset of synthetic images using TDW. By default, datasets have 1,300,000 "train" images and 50,000 "val" images. A full dataset takes approximately 8 hours to generate on high-end hardware.
## Requirements
## Install
```bash
git clone https://github.com/alters-mit/tdw_image_dataset
cd tdw_image_dataset
pip3 install -e .
python3 download_build.py
```

(This will download the correct version of the build.)
## Upgrade
```bash
cd tdw_image_dataset
git pull
python3 download_build.py
```
## Changelog
## Usage
See `single_env.py`, `multi_env.py`, and `test.py`. You may want to run `test.py` first to make sure that you can generate image datasets.
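If you just want to see the moving parts before opening those scripts, a run boils down to constructing `ImageDataset` and starting it. The sketch below is illustrative only; the import path and the `output_directory`, `train`, `val`, and `scene_name` parameters are assumptions about the API, so defer to `single_env.py` for the real signatures.

```python
# A minimal, hypothetical dataset run; defer to single_env.py for the real API.
# The import path and all parameter names below are assumptions.
from tdw_image_dataset.image_dataset import ImageDataset

if __name__ == "__main__":
    # A deliberately tiny train/val split for a quick smoke test.
    dataset = ImageDataset(output_directory="image_dataset_test",
                           train=1300,
                           val=50)
    dataset.run(scene_name="tdw_room")
```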
## How It Works
The `ImageDataset` class searches the record databases for all model categories in the TDW library. It then adds objects to the scene and generates a target number of images per category, using all models in that category. Each model is added to the scene sequentially; there is always exactly one model in the scene.
To increase variability, each image has randomized camera and positional parameters, and may have additional random parameters, such as the angle of sunlight or the visual materials of the model. This randomness is constrained somewhat in order to guarantee a degree of compositional quality (namely, that the object is always at least partially in the frame).
### 1. Generate metadata
Every dataset has an associated `metadata.txt` file, which contains a serialized JSON object of all of the parameters used to initialize the dataset. This can be very useful if you are generating many datasets with slightly different parameters.
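Because the file is plain JSON, comparing the parameters of two runs is a few lines of code. A minimal sketch, assuming two dataset directories that each contain a `metadata.txt` (the directory names are placeholders):

```python
# Sketch: diff the initialization parameters of two datasets via metadata.txt.
import json
from pathlib import Path

def load_metadata(dataset_dir: str) -> dict:
    # metadata.txt holds one serialized JSON object of constructor parameters.
    return json.loads(Path(dataset_dir).joinpath("metadata.txt").read_text())

a = load_metadata("dataset_a")  # Placeholder paths.
b = load_metadata("dataset_b")
# Print every parameter that differs between the two runs.
for key in sorted(set(a) | set(b)):
    if a.get(key) != b.get(key):
        print(f"{key}: {a.get(key)} != {b.get(key)}")
```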
### 2. Initialize the scene
Each dataset uses exactly one scene (`multi_env.py` sidesteps this limitation by running six datasets sequentially). The scene's global parameters and post-processing parameters are initialized.

Each scene has one or more "environments", which are spatial boxes in which you can expect images to look reasonable. In TDW it is possible to instantiate objects and avatars beyond these limits, but they will be in a blank void. In `ImageDataset`, the avatar and object positions are always constrained to the scene's environments; in interior scenes these are rooms, and in exterior scenes there is usually only one environment. `initialize_scene` returns a list of these environments.
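Conceptually, each environment is an axis-aligned box, and constraining a position means sampling inside that box. The sketch below illustrates the idea; the `Environment` fields are hypothetical stand-ins, not TDW's actual data classes.

```python
# Sketch: sample a position inside an environment's box so that objects and
# avatars never end up in the blank void outside it.
import random
from dataclasses import dataclass

@dataclass
class Environment:
    center: tuple  # (x, y, z) center of the box. Hypothetical field.
    bounds: tuple  # (width, height, length) of the box. Hypothetical field.

def random_position(env: Environment) -> dict:
    x = env.center[0] + random.uniform(-env.bounds[0] / 2, env.bounds[0] / 2)
    z = env.center[2] + random.uniform(-env.bounds[2] / 2, env.bounds[2] / 2)
    return {"x": x, "y": env.center[1], "z": z}
```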
### 3. Fetch records
The controller fetches a list of all model categories ("wnids") in the model library.
### 4. Iterate through each wnid
The controller fetches a list of all records in the wnid. If a model has already been "processed" (that is, if all images for the model have already been generated), it is skipped.
### 5. Iterate through each model
#### 5a. Set the starting index
Images are always saved as `<filename>_<index>.jpg`. If the `no_overwrite` parameter in the constructor is set to `False`, the starting index is always `0000`. Otherwise, the starting index will be one past the last existing index (if any). This is mostly useful for cases like `multi_env.py`, in which you don't want sequential datasets to overwrite each other's images. If you're using only one scene, you probably want images to be overwritten, to avoid generating extras if you have to restart the controller.
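As a rough sketch, the starting index can be derived by scanning the output directory for the highest existing index. This helper is illustrative, not `ImageDataset`'s actual code:

```python
# Sketch: find the next free index for <filename>_<index>.jpg files.
import re
from pathlib import Path

def starting_index(directory: str, no_overwrite: bool) -> int:
    if not no_overwrite:
        return 0  # Start at 0000; existing images will be overwritten.
    indices = [int(m.group(1))
               for p in Path(directory).glob("*.jpg")
               if (m := re.search(r"_(\d+)\.jpg$", p.name))]
    # One past the last existing index, or 0000 if the directory is empty.
    return max(indices) + 1 if indices else 0
```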
#### 5b. Add the object and set the scene for grayscale capture
If you see the window become tiny, this is expected behavior!
To generate images, `ImageDataset` runs each model through two loops. The first loop caches camera and object positions, rotations, etc. These cached positions are then played back in the second loop to generate images. Image capture is divided this way because the first loop will "reject" a lot of images with poor composition; this rejection system doesn't require image data, and sending image data would slow down the entire controller.

Each object, once instantiated, is set to "unit scale", with its longest extent set to 1 meter. This way, `ImageDataset` can reliably frame every object using the same positional and rotational parameters.
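The scale factor is just the reciprocal of the longest extent. A one-function sketch, with the extents tuple standing in for the bounds stored in the model record:

```python
# Sketch: compute the scale factor that sets a model's longest extent to 1 meter.
def unit_scale(extents: tuple) -> float:
    # extents stands in for the (width, height, length) of the model's bounds.
    return 1.0 / max(extents)

# Example: a table 2 m wide is scaled by 0.5, so its longest side becomes 1 m.
assert unit_scale((2.0, 0.8, 1.2)) == 0.5
```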
#### 5c. Positional Loop
Gather valid `ImagePosition` data until the list of `ImagePosition` objects equals the target number of images to capture.
`ImageDataset` relies on `Occlusion` output data to determine whether an image has good composition. This data is derived from two `_mask` passes: one includes environment objects (such as walls and trees) and the other doesn't. Each rendered frame is reduced to a single pixel, and the returned value is the difference between the grayscale values of the two pixels. This isn't much information, but it's enough for this use case, and it's very fast. It also doesn't need a large window size to be useful; in fact, it runs faster if the window is smaller. So, to start the positional loop, the entire window is resized to 32x32 and render quality is set to minimal.
If an object is too occluded, the `ImagePosition` is rejected.
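Putting 5c together: the loop samples candidate poses, queries occlusion, and caches only the candidates that pass, until it has enough. Below is a simplified sketch with stand-in functions; the names and the threshold are illustrative, not the actual internals:

```python
# Simplified sketch of the positional loop. Both helpers are stand-ins:
# one for randomizing a pose, one for TDW's Occlusion output data.
import random

def sample_image_position() -> dict:
    # Stand-in: randomize the camera and object positions and rotations.
    return {"avatar": (random.uniform(-2, 2), 1.5, random.uniform(-2, 2))}

def get_occlusion(candidate: dict) -> float:
    # Stand-in: 0 = fully visible, 1 = fully occluded.
    return random.random()

def positional_loop(target_count: int, threshold: float = 0.5) -> list:
    positions = []
    while len(positions) < target_count:
        candidate = sample_image_position()
        if get_occlusion(candidate) <= threshold:
            positions.append(candidate)  # Good composition: cache it.
        # Otherwise the candidate is rejected and a new one is sampled.
    return positions
```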
#### 5d. Image Loop
Once `ImageDataset` has enough cached `ImagePosition` data, it can begin to actually generate images. Image quality is now set to maximum, and the screen size is set to the desired image capture size (by default, 256x256).
Every iteration, the object and avatar are positioned and rotated according to the cached `ImagePosition` data. Image data is received and written to disk. Image saving is handled via threading to prevent the controller from slowing down.
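The write-behind pattern looks roughly like this; it's a sketch of the general technique, not the actual `ImageDataset` code:

```python
# Sketch: save each image on a background thread so the controller never
# blocks on disk I/O between frames.
from pathlib import Path
from threading import Thread

def save_image(data: bytes, path: Path) -> None:
    path.write_bytes(data)

pending = []

def save_async(data: bytes, path: Path) -> None:
    thread = Thread(target=save_image, args=(data, path))
    thread.start()
    pending.append(thread)

# Before zipping the dataset, wait for any writes still in flight:
# for thread in pending: thread.join()
```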
Optional Additional Commands
- If the
materials
parameter of the constructor is set toTrue
: Per frame, all of the object's visual materials will be randomly set to materials from the material library. - If the
hdri
parameter of the constructor is set toTrue
: Periodically set a new HDRI skybox. Per frame, set a random rotation for the HDRI skybox.
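Both randomizations are opt-in via the constructor flags named above. A one-line sketch (other parameters omitted; the import path is an assumption):

```python
# Sketch: enable per-frame material randomization and HDRI skybox rotation.
from tdw_image_dataset.image_dataset import ImageDataset  # Assumed import path.

dataset = ImageDataset(materials=True, hdri=True)  # Plus the usual parameters.
```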
#### 5e. Cleanup
Destroy the model and unload its asset bundle from memory.
### 6. Create a .zip file
After generating the whole dataset, `ImageDataset` will zip the dataset directory and destroy the original files. If you don't want the controller to do this, set `do_zip` to `False` in the constructor.
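The end result is roughly equivalent to the standard-library pattern below (a sketch, not the actual implementation; `images` is a placeholder directory name):

```python
# Sketch: zip the dataset directory, then destroy the original files.
import shutil

shutil.make_archive("images", "zip", root_dir="images")  # Creates images.zip.
shutil.rmtree("images")
```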
## Known Limitations
- `ImageDataset` can't include physics simulations. If it allowed objects to "fall" or otherwise move before image capture, the positional loop wouldn't work at all (the object would immediately fall out of frame during the "optimal" pass). For physics datasets, see `tdw_physics`.
- `ImageDataset` only works if there's exactly one model in the scene. All of the image composition logic assumes that there is only one object to rotate, position, frame, etc. Additionally, images are semantically tagged under the assumption that there's only one object in the image.