Lightning Hydra Template Vertex AI

This repository provides the code needed to run the template as a Vertex AI Custom Job or Hyperparameter Tuning Job.

We have also published sample code for Vertex AI Pipelines with Hydra here.

🎉 This repository has been introduced as a useful repository by the original template repository.


The Japanese version of this README is available here.

<br>

💡 Reason for opening the repository to the public

PyTorch Lightning (a training framework) and Hydra (a hyperparameter management package) provide various benefits, such as enabling parallel training with only a few lines of code changed. An excellent training template built on these two packages is publicly available. For more information on PyTorch Lightning and Hydra, see the README of the template code.

Vertex AI is an integrated machine learning platform on Google Cloud Platform (GCP). With Vertex AI, the following can be executed easily,

where (★) marks the items implemented in this repository.

(*): We are happy to share our sample code for Hydra × Vertex AI Pipelines here.

However, Vertex AI and Hydra are incompatible out of the box because they pass command-line arguments in different ways. To run code written with Hydra on Vertex AI, a workaround is needed. This repository provides that workaround so that you can train on Vertex AI without difficulty.
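To illustrate the mismatch: Hydra expects overrides of the form `key=value` (e.g. `trainer.max_epochs=5`), while Vertex AI — notably its hyperparameter tuning service — passes trial values as `--key=value` flags. A minimal sketch of a shim that strips the leading dashes before handing the arguments to Hydra (the function name `to_hydra_overrides` is ours for illustration, not the repository's actual implementation):

```python
import sys

def to_hydra_overrides(argv):
    """Strip the leading '--' that Vertex AI prepends so each
    argument becomes a valid Hydra override (key=value)."""
    return [arg[2:] if arg.startswith("--") else arg for arg in argv]

if __name__ == "__main__":
    # e.g. Vertex AI passes: --model.lr=0.01 --data.batch_size=64
    print(to_hydra_overrides(sys.argv[1:]))
```

In practice such a shim would run before Hydra's `@hydra.main` entry point parses `sys.argv`.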

For more information on the problem and its solution, please see this blog. <br>

🚀 How to use this repository

step 1. Edit the template code and create your own training code (optional).

If you want to create your own model, you have to edit the template code and make sure the training runs to completion.

If you just want to check how it works with Vertex AI, you can run the template code without editing, and it will train an MNIST classifier.

<br>

step 2. Confirm that training runs in a Docker image.

Vertex AI trains with Docker images, so you should first confirm that training works inside one. You can do this by running the following in the root directory:

make train-in-docker

Options such as GPU usage can be adjusted in docker-compose.yaml.
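For reference, enabling a GPU in Compose usually looks something like the following. The service name and settings here are placeholders, not necessarily what this repository's docker-compose.yaml contains; the host needs the NVIDIA Container Toolkit installed.

```yaml
services:
  train:
    build: .
    # Reserve one NVIDIA GPU for the container
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```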

<br>

step 3. Prepare a GCP account.

If you do not have a GCP account, please create one from here. This repository uses Vertex AI and the Artifact Registry; please enable the respective APIs in GCP.

Next, create a Docker repository in the Artifact Registry to push Docker images to.

Then decide on a name for the image.
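If you prefer the command line over the GCP console, the setup above can be done roughly as follows. The repository name, region, and project are placeholders to adapt to your environment:

```shell
# Enable the APIs this repository relies on
gcloud services enable aiplatform.googleapis.com artifactregistry.googleapis.com

# Create a Docker repository in the Artifact Registry (names are examples)
gcloud artifacts repositories create my-repo \
    --repository-format=docker \
    --location=us-central1

# Allow Docker to authenticate pushes to that location
gcloud auth configure-docker us-central1-docker.pkg.dev

# The image name then follows the pattern:
# us-central1-docker.pkg.dev/PROJECT_ID/my-repo/IMAGE_NAME:TAG
```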

<br>

step 4-1. Run a custom job

make create-custom-job

in the root folder. The Docker image will be built and pushed, and a Vertex AI custom job will be started with the pushed image. You can check the training status under CUSTOM JOBS in the Vertex AI training section of the GCP console.
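Under the hood, launching a custom job of this kind typically boils down to a build, a push, and a `gcloud ai custom-jobs create` call. A rough sketch of the equivalent commands — all names below are placeholders, not the repository's actual Makefile contents:

```shell
IMAGE=us-central1-docker.pkg.dev/PROJECT_ID/my-repo/train:latest

docker build -t "$IMAGE" .
docker push "$IMAGE"

gcloud ai custom-jobs create \
    --region=us-central1 \
    --display-name=my-training-job \
    --worker-pool-spec=machine-type=n1-standard-4,replica-count=1,container-image-uri="$IMAGE"
```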

<br>

step 4-2. Run a hyperparameter tuning job

make create-hparams-tuning-job

in the root folder. The Docker image will be built and pushed, and a Vertex AI hyperparameter tuning job will be started with the pushed image.

You can check the training status under HYPERPARAMETER TUNING JOBS in the Vertex AI training section of the GCP console.
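For orientation, a Vertex AI hyperparameter tuning job is described by a study spec that names the metric to optimize and the parameters to search, plus a trial job spec that says how each trial runs. A rough sketch of such a config — the metric name, parameter, and image URI are illustrative, not this repository's actual values:

```yaml
studySpec:
  metrics:
    - metricId: val_acc        # must match the metric your training code reports
      goal: MAXIMIZE
  parameters:
    - parameterId: model.lr    # forwarded to the container as --model.lr=...
      doubleValueSpec:
        minValue: 0.0001
        maxValue: 0.1
      scaleType: UNIT_LOG_SCALE
trialJobSpec:
  workerPoolSpecs:
    - machineSpec:
        machineType: n1-standard-4
      replicaCount: 1
      containerSpec:
        imageUri: us-central1-docker.pkg.dev/PROJECT_ID/my-repo/train:latest
```

Note that each trial's parameter values arrive as `--key=value` flags, which is exactly the mismatch with Hydra described earlier.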

<br>

🔧 Changes

The following changes have been made in this repository relative to the original training template code.

📝 Appendix

JX PRESS Corporation created and uses its own training template code in order to enhance team development capability and development speed.

We created this repository by porting only the Vertex AI training code from JX's training template to Lightning-Hydra-Template.

For more information on JX's training template code, see How we at JX PRESS Corporation devise for team development of R&D that tends to become a genus and PyTorch Lightning explained by a heavy user. (These blog posts are currently written in Japanese; if you want to read them, please translate them into your language. We would like to publish English translations someday.) <br>

😍 Main contributors

The transfer to this repository was done by Yongtae; the development was conceived and proposed by Yongtae, and near129 led the code development.

<br>

🔍 What we want to improve