Quick, Draw! prediction model

This is my repository for the Quick, Draw! prediction model project.
Last updated: 11/20/2017

Repo Instructions

python folder:

Procedure.ipynb

images

Introduction:

On 5/18/2017, Google released the Quick, Draw! dataset, which contains over 50 million drawings.
Quick, Draw! is an online Pictionary-style game in which:

  1. The user is asked to draw a picture of a given category in 20 seconds.

  2. While the user draws, Google's AI tries to predict what is being drawn.

  3. If the AI predicts the drawing correctly, the user wins!

  4. Steps 1-3 are repeated 6 times.
    (Of course, there is no prize for winning this game, but it is super addictive!)

With this dataset, I wanted to answer the following 2 questions:
1. Can machine learning models distinguish similar drawings?
2. Can machine learning models identify users' countries based on their drawings?

To answer these questions, I prepared 2 kinds of prediction model:

  1. XGBoost (gradient-boosted tree ensemble) model
  2. Convolutional Neural Network (CNN) model

Results

I built 4-way classifiers for both image recognition and country prediction with each method (4 models in total).

Model accuracy

                 Image recognition   Country prediction
CNN model        90.2%               62.7%
XGBoost model    79.1%               43.8%

Results of image recognition

Example1: Cat Drawing1
example1 prediction_for_example1

Example2: Cat Drawing2
example2 prediction_for_example2

For image recognition, both the CNN and XGBoost models reached high prediction accuracy.
The CNN model works on raw pixels, while the XGBoost model works on features that I calculated, so the two models analyze images in completely different ways.
As a result, they can make quite different predictions. For instance, check out Example2 above.

Results of Country prediction

Example1: Dog Drawing from Brazil
example3 prediction_for_example3

For country prediction, the models had lower accuracy than for image recognition.
The most important features of the XGBoost model indicate that a user's country can be identified based on:

  1. the amount of information (detail) in an image
  2. how fast or slow users drew their images
  3. the direction of the first few strokes
  4. the X:Y ratio of images

Data used:

The dataset that Google released contains images and several features related to each image.
Features include drawing_ID, category (what Quickdraw asked the user to draw), timestamp, whether the AI guessed correctly, the user's country, and the drawing.
A drawing is represented as a list of lists of lists:
it is a list of strokes, and each stroke is a list of X, Y, and time values (3 lists within a stroke).

Compared with a typical image, the stroke information contains 2 additional dimensions (time and stroke):

Typical image        Quickdraw data
3D (X, Y, color)     4D (X, Y, time, stroke)
a drawing            how the user drew a drawing
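
To make the nested layout concrete, here is a minimal sketch of one drawing in this format and how to walk it. The coordinate and time values are made up for illustration, and the helper name `summarize_drawing` is hypothetical (it is not part of this repository).

```python
# A made-up drawing with two strokes.
# Each stroke is [x_values, y_values, times_in_ms].
drawing = [
    [[10, 20, 30], [5, 15, 25], [0, 40, 80]],   # stroke 0: three points
    [[30, 40], [25, 20], [300, 350]],           # stroke 1: two points
]


def summarize_drawing(drawing):
    """Return (stroke count, total point count, final timestamp in ms)."""
    n_strokes = len(drawing)
    n_points = sum(len(xs) for xs, ys, ts in drawing)
    final_time = max(ts[-1] for xs, ys, ts in drawing)
    return n_strokes, n_points, final_time


print(summarize_drawing(drawing))
```

The time list is what a typical image lacks: it records when each point was drawn, so the order and speed of the strokes can be recovered.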

From this input dataset, I collected image data for cat, tiger, lion, and dog drawings for the image recognition part of my project.

For the country prediction part of my project, I selected 4 countries: United States, Brazil, Russia, and South Korea.

I used these 4 countries because they had a good number of images and do not share the same alphabet/language.
My initial guess was that the way people draw is closely related to how they write.


other info:

Image recognition:

Country prediction:

MODELS

Filters applied to both models

All drawings used in training:

  1. were recognized correctly by Google's AI
  2. contain 15 or fewer strokes
  3. have a final time of 20000 ms or less
  4. have an X:Y ratio where (range of Y) / (range of X) <= 1.5
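
The four filters above can be sketched as a single predicate. This is only an illustration, not the code used in this repository: the record keys `recognized` and `drawing`, the function names, and the handling of zero-width drawings are all assumptions.

```python
def bounding_ranges(drawing):
    """Return (x_range, y_range) over all points of all strokes."""
    xs = [x for stroke in drawing for x in stroke[0]]
    ys = [y for stroke in drawing for y in stroke[1]]
    return max(xs) - min(xs), max(ys) - min(ys)


def keep_drawing(record, max_strokes=15, max_time_ms=20000, max_ratio=1.5):
    """Return True if one record (a dict) passes all four filters."""
    drawing = record["drawing"]
    if not record["recognized"]:           # filter 1: recognized by the AI
        return False
    if len(drawing) > max_strokes:         # filter 2: 15 or fewer strokes
        return False
    final_time = max(stroke[2][-1] for stroke in drawing)
    if final_time > max_time_ms:           # filter 3: finished within 20000 ms
        return False
    x_range, y_range = bounding_ranges(drawing)
    if x_range == 0:
        return False  # assumption: drop degenerate drawings with zero width
    return y_range / x_range <= max_ratio  # filter 4: Y range / X range <= 1.5


wide = {"recognized": True,
        "drawing": [[[0, 100], [0, 50], [0, 900]]]}
tall = {"recognized": True,
        "drawing": [[[0, 10], [0, 100], [0, 900]]]}
print(keep_drawing(wide), keep_drawing(tall))
```

Here `wide` has a Y/X ratio of 0.5 and is kept, while `tall` has a ratio of 10 and is dropped.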

label:
image recognition:
[cat,tiger,lion,dog]

country prediction:
[US, BR, RU, KR]
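
Since the CNN ends in a 4-unit softmax layer, each label is presumably turned into a 4-element target vector in the order listed above. A minimal sketch of that encoding, with a hypothetical helper `one_hot` (the repository's actual encoding step is not shown here):

```python
IMAGE_LABELS = ["cat", "tiger", "lion", "dog"]
COUNTRY_LABELS = ["US", "BR", "RU", "KR"]


def one_hot(label, classes):
    """Return a one-hot list for `label`, using its position in `classes`."""
    vector = [0] * len(classes)
    vector[classes.index(label)] = 1
    return vector


print(one_hot("lion", IMAGE_LABELS))
print(one_hot("KR", COUNTRY_LABELS))
```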


1. XGBOOST

I ran code that creates 399 new features. Features include:

image recognition model:
(max_depth=1, n_estimators=5000, learning_rate=0.25)
Highest accuracy (6/27/2017): 79.1222222222 percent

country prediction model:
(max_depth=1, n_estimators=1000, learning_rate=0.2)
Highest accuracy (6/27/2017): 43.7979539642 percent


2. Convolutional Neural Network Model

My CNN code applies the filters above and reformats each image into a 42 pixel (Y) by 28 pixel (X) grid.
After this process, my CNN data has 1176 columns per image (42 x 28).
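
As a rough illustration of this step, the sketch below maps stroke points onto a 42 x 28 grid and flattens it into 1176 values. It is not the repository's resizing code: scaling each axis independently and marking only the recorded points (without drawing lines between them) are simplifying assumptions, and `rasterize` is a hypothetical name.

```python
HEIGHT, WIDTH = 42, 28  # rows (Y) by columns (X), as described above


def rasterize(drawing, height=HEIGHT, width=WIDTH):
    """Map stroke points onto a height x width grid and flatten it row by row."""
    xs = [x for stroke in drawing for x in stroke[0]]
    ys = [y for stroke in drawing for y in stroke[1]]
    x_min, y_min = min(xs), min(ys)
    # Avoid division by zero for drawings with no width or no height.
    x_span = max(max(xs) - x_min, 1)
    y_span = max(max(ys) - y_min, 1)

    grid = [[0] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        col = round((x - x_min) / x_span * (width - 1))
        row = round((y - y_min) / y_span * (height - 1))
        grid[row][col] = 1
    return [pixel for row in grid for pixel in row]


flat = rasterize([[[0, 100], [0, 200], [0, 50]]])
print(len(flat))
```

Each flattened row can then be reshaped to (42, 28, 1) before being passed to the CNN, matching the `input_shape` used in the Keras model.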

CNN structure

Keras parameters and code:

# Keras 2 API: Conv2D takes a kernel-size tuple, and fit() takes epochs (not nb_epoch)
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

model = Sequential()
# 64 filters of size 5x5 over 42 (Y) x 28 (X) single-channel images
model.add(Conv2D(64, (5, 5), activation='relu', input_shape=(42, 28, 1)))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(100, activation='relu'))
model.add(Dropout(0.20))
# One softmax output per class (4-way classifier)
model.add(Dense(4, activation='softmax'))

model.compile(loss='mean_squared_error', optimizer='adam', metrics=['accuracy'])

# X_train: (n_samples, 42, 28, 1); y_train: one-hot labels of shape (n_samples, 4)
model.fit(X_train, y_train, batch_size=128, epochs=30, verbose=1, validation_split=0.2)

If you have any suggestions or better CNN model parameters/code for the Google Quickdraw data, let me know!

image recognition model:
(batch_size = 128, epoch = 20)
Highest accuracy (6/27/2017): 90.21666666666667 percent

country prediction model:
(batch_size = 128, epoch = 30)
Highest accuracy (6/27/2017): 62.7050053121 percent


Findings:

The XGBoost models' feature importance attributes showed some interesting results about image recognition and country prediction.

Image recognition:

The model distinguished images based on how many datapoints exist in the first 3 strokes.
In other words, the model looked at the amount of detail in the first 3 strokes.
The 4 types of images were also distinguishable based on the starting point of the drawing and the X:Y ratio of the image.
The model also looked at the direction (slope and direction) of strokes. Somehow, the direction of stroke 6 was important when distinguishing cat, tiger, lion, and dog drawings.

XGBoost model's top 10 most important features for image recognition:

  1. Ymax
  2. datapoint_percentage_stroke1
  3. datapoint_percentage_stroke2
  4. X_0
  5. direction_stroke6
  6. datapoint_percentage_stroke0
  7. direction_stroke1
  8. direction_stroke2
  9. total_time_drawing
  10. Y_0
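
The feature names above come from the feature-engineering code, which is not reproduced on this page. As a rough illustration only, the sketch below computes a few of them under assumed definitions: `datapoint_percentage_strokeN` as the share of all datapoints that fall in stroke N, and `direction_strokeN` as the angle from the stroke's first point to its last point. The actual definitions used for the 399 features may differ, and `sketch_features` is a hypothetical name.

```python
import math


def sketch_features(drawing, n_strokes=3):
    """Compute a few illustrative features from a list of [xs, ys, ts] strokes."""
    total_points = sum(len(stroke[0]) for stroke in drawing)
    features = {
        "Ymax": max(y for stroke in drawing for y in stroke[1]),
        "X_0": drawing[0][0][0],   # x of the very first point
        "Y_0": drawing[0][1][0],   # y of the very first point
        "total_time_drawing": max(stroke[2][-1] for stroke in drawing),
    }
    for i, (xs, ys, ts) in enumerate(drawing[:n_strokes]):
        # Assumed definition: share of all datapoints that fall in stroke i.
        features[f"datapoint_percentage_stroke{i}"] = len(xs) / total_points
        # Assumed definition: angle (radians) from first to last point of stroke i.
        features[f"direction_stroke{i}"] = math.atan2(ys[-1] - ys[0], xs[-1] - xs[0])
    return features


example = [
    [[10, 20, 30], [5, 15, 25], [0, 40, 80]],
    [[30, 40], [25, 20], [300, 350]],
]
for name, value in sketch_features(example).items():
    print(name, value)
```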

Country prediction:

To distinguish users' countries, my XGBoost model looked at certain characteristics of the images:

  1. the amount of information (detail) in an image
  2. how fast or slow users drew their images
  3. the direction of the first few strokes
  4. the X:Y ratio of images

Number 3 raises an interesting point, since Quartz.com published an article on Quickdraw with a similar data analysis result.
Both the article and my results suggest that different cultures/countries tend to draw certain shapes/objects differently because of how they write.

XGBoost model's top 10 most important features for country prediction:

  1. total_number_of_datapoints
  2. time_stroke0
  3. direction_stroke2
  4. X_0
  5. time_1
  6. ave_datapoints_per_stroke
  7. direction_stroke0
  8. direction_stroke3
  9. final_time
  10. Ymax

All features on this list had a feature importance above 1%.

What's next?

Other:

project presentation video DSI capstone project showcase Galvanize Austin 6/22/2017

Resources:

  1. Quick,Draw! The Data

  2. How do you draw a circle? We analyzed 100,000 drawings to show how culture shapes our instincts