<h1 align="center">Vision-ml</h1>

See also Vision-ui, a series of algorithms for mobile UI testing.

An R-CNN (Region-based Convolutional Neural Network) machine learning model for handling pop-up windows in mobile apps.

Mobile UI Recognition

Vision-ml is a machine learning model that identifies the UI element that closes a pop-up window and returns its coordinate (x, y) on the screen.

A typical usage scenario would be:

[Image: typical usage scenario]

Requirements

Python 3.6.x

# Create a virtual environment before installing the requirements
pip install -r requirements.txt

Usage

You can use Vision with the pre-trained model in "model/trained_model_1.h5". The number in the file name is for version control; you can update it in the file named "all_config".

There are two ways of using Vision.

Predict an image with Python script

  1. Update the image path in "rcnn_predict.py"
model_predict("path/to/image.png", view=True)
  2. Run the script to get the result
python rcnn_predict.py

Predict an image with a web server

  1. Start the web server

You can also create the server from the Dockerfile.

python vision_server.py
  2. Post an image to the web server
curl http://localhost:9092/client/vision -F "file=@${IMAGE_PATH}.png"
  3. The response from the web server contains the coordinate of the UI element, along with a score of 0 or 1.0 (0 means not found, 1.0 means found).
{
  "code": 0,
  "data": {
    "position": [
      618,
      1763
    ],
    "score": 1.0
  }
}
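
A minimal client-side sketch for reading this response in Python. It assumes the payload has the shape shown above. `extract_position` and `adb_tap_command` are hypothetical helper names, not part of Vision-ml, and the tap command assumes an Android device driven through `adb`, which this README does not prescribe.

```python
import json
from typing import List, Optional, Tuple


def extract_position(payload: dict) -> Optional[Tuple[int, int]]:
    """Return (x, y) if the server reports a found element, else None."""
    data = payload.get("data") or {}
    score = data.get("score", 0)
    position = data.get("position")
    # Per the description above: score 0 means "not found", 1.0 means "found".
    if not score or not position or len(position) != 2:
        return None
    x, y = position
    return int(x), int(y)


def adb_tap_command(x: int, y: int) -> List[str]:
    """Assumption: close the pop-up by tapping the coordinate via adb."""
    return ["adb", "shell", "input", "tap", str(x), str(y)]


# Sample response copied from the example above.
sample = json.loads(
    '{"code": 0, "data": {"position": [618, 1763], "score": 1.0}}'
)
position = extract_position(sample)
if position is not None:
    # Pass this list to subprocess.run(...) to perform the tap on a device.
    print(adb_tap_command(*position))
```

Returning `None` when the score is 0 keeps the "not found" case explicit, so a test script can skip the tap instead of touching an arbitrary point on the screen.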

Train your own model

Button image named 1_1.png:

Background image named 0_3.png:

  1. There are some images in this repo for training.
0_0.png 0_1.png 0_2.png 0_3.png 0_4.png 0_5.png 0_6.png 1_0.png 1_1.png 1_2.png 1_3.png 1_4.png 1_5.png 1_6.png
  2. Augment your images by running this method in "rcnn_train.py"
Image().get_augmentation()
  3. Train the model on your images by running this method in "rcnn_train.py"
train_model()
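
The file names above suggest a `<label>_<index>.png` convention, with `1` marking a button image and `0` a background image. The sketch below shows one way to read that convention. It is an inference from the example names, and `parse_training_name` is a hypothetical helper rather than a function in "rcnn_train.py".

```python
from pathlib import Path
from typing import Tuple


def parse_training_name(filename: str) -> Tuple[int, int]:
    """Split '<label>_<index>.png' into (label, index)."""
    stem = Path(filename).stem  # "1_1.png" -> "1_1"
    label_text, index_text = stem.split("_", 1)
    label = int(label_text)
    if label not in (0, 1):
        raise ValueError(f"unexpected label in {filename!r}")
    return label, int(index_text)


for name in ["0_0.png", "0_3.png", "1_1.png", "1_6.png"]:
    label, index = parse_training_name(name)
    kind = "button" if label == 1 else "background"
    print(f"{name}: {kind} sample #{index}")
```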

Model layers and params

Model layers

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_1 (Conv2D)            (None, 48, 48, 32)        320       
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 46, 46, 32)        9248      
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 23, 23, 32)        0         
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 21, 21, 64)        18496     
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 10, 10, 64)        0         
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 8, 8, 64)          36928     
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 4, 4, 64)          0         
_________________________________________________________________
dropout_1 (Dropout)          (None, 4, 4, 64)          0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 1024)              0         
_________________________________________________________________
dense_1 (Dense)              (None, 128)               131200    
_________________________________________________________________
dense_2 (Dense)              (None, 2)                 258       
=================================================================

Total params: 196,450
Trainable params: 196,450
Non-trainable params: 0
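
The shapes and parameter counts in this summary can be checked by hand. The sketch below reproduces them under assumptions inferred from the table rather than read from the source code: a 50x50 single-channel input, 3x3 kernels with no padding and stride 1, and 2x2 max pooling. The helper names are illustrative only.

```python
def conv2d(shape, filters, kernel=3):
    """Unpadded stride-1 convolution: (output shape, parameter count)."""
    height, width, channels = shape
    # One kernel*kernel*channels weight block plus one bias per filter.
    params = (kernel * kernel * channels + 1) * filters
    return (height - kernel + 1, width - kernel + 1, filters), params


def max_pool(shape, size=2):
    """Non-overlapping pooling halves each spatial dimension (floor)."""
    height, width, channels = shape
    return (height // size, width // size, channels)


def dense(units_in, units_out):
    """Fully connected layer: (output units, parameter count)."""
    return units_out, (units_in + 1) * units_out


shape = (50, 50, 1)  # assumed input, inferred from conv2d_1's 320 params
total = 0
for layer in ["conv32", "conv32", "pool", "conv64", "pool", "conv64", "pool"]:
    if layer == "pool":
        shape = max_pool(shape)
    else:
        shape, params = conv2d(shape, int(layer[4:]))
        total += params

flat = shape[0] * shape[1] * shape[2]  # flatten_1: 4 * 4 * 64 = 1024
units, params = dense(flat, 128)       # dense_1
total += params
units, params = dense(units, 2)        # dense_2
total += params

print(shape, flat, total)  # (4, 4, 64) 1024 196450
```

Dropout and flatten layers add no parameters, so they do not appear in the total.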

Training params

The training parameters batch_size and epochs are set in all_config.py.

Performance

On a Core i7 CPU @ 2.2 GHz:

Reference

The R-CNN model is based on this paper.