podder-task
This is the base repository for PoC (Proof of Concept) code: a boilerplate project for creating a Python task that runs on the podder-pipeline.
How to implement your code
Source code directory
$ tree . -L 2
.
├── Dockerfile
├── README.md
├── api
│ ├── __init__.py
│ ├── grpc_server.py
│ ├── protos
│ └── task_api.py
├── app
│ ├── __init__.py
│ └── task.py # main task implementation
├── log.yml
├── main.py
├── requirements
│ ├── requirements.develop.txt
│ └── requirements.txt # add required packages here
├── run_codegen.py
├── scripts
│ ├── entrypoint.sh
│ └── pre-commit.sh # execute before committing your code
├── shared
│ ├── data
│ └── tmp
└── tests
    ├── files
    │ └── inputs.json # sample inputs.json
    └── unit # add unit tests here
How to implement a task class
Add your code to app/task.py.
Implementation sample
Please check the task sample here: Sample
__init__: Initialize task instance
def __init__(self, context: Context) -> None:
    self.logger.debug("Initiate task...")
    super().__init__(context)
execute: Main process
def execute(self) -> None:
    self.logger.debug("START processing...")
    self.yourProcess(self.args.input_path)
    self.logger.debug("Completed.")
set_arguments: Arguments
def set_arguments(self, parser) -> None:
    parser.add_argument('--input_path', dest="input_path", help='set input path', default='.')
API
The podder-task-base Python module provides many APIs for development.
Logging
You can output logs with self.logger. The logger is just a wrapper around Python's logging module. For further usage, please check the documentation of the logging module.
self.logger.debug("debug")
self.logger.info("info")
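Because the logger is described as a wrapper around the standard logging module, the calls above should behave like ordinary logging calls. The following is a minimal sketch using only the standard library; the logger name `podder.task.sample`, the INFO threshold, and the format string are illustrative assumptions, not what podder-task-base actually configures.

```python
import io
import logging

# Stand-in for self.logger; name, level and format are assumptions.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))

logger = logging.getLogger("podder.task.sample")
logger.setLevel(logging.INFO)  # records below INFO are filtered out
logger.addHandler(handler)
logger.propagate = False       # keep the root logger out of this sketch

logger.debug("debug")  # dropped: DEBUG is below the INFO threshold
logger.info("info")    # emitted

captured = stream.getvalue()
print(captured, end="")
```

With the threshold set to INFO only the second call produces output; lowering it to `logging.DEBUG` would emit both records.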
Command Line Arguments
You can add your own command line arguments using self.context.config.set_argument within task.py.
After you execute with command line arguments, you can access the passed values through self.context.config.get.
For example, add --model-path as a command line argument.
# Set your command line argument
def set_arguments(self) -> None:
    self.context.config.set_argument('--model-path', dest="model_path", help='set model path')
# Execute main.py with argument "--model-path"
$ python main.py --model-path /path/to/model
# You can access the value passed to "--model-path"
def execute(self, inputs: List[Any]) -> List[Any]:
    model = self.context.config.get('model_path')
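The register-then-read flow above resembles the standard argparse pattern, where `dest` decides the key under which a value is stored. The sketch below illustrates that pattern with a hypothetical stand-in class, `SketchConfig`. It assumes the config object delegates to argparse, which is not confirmed here, and it is not the podder-task-base implementation.

```python
import argparse
from typing import Any, Dict, List, Optional


class SketchConfig:
    """Hypothetical stand-in for self.context.config, built on argparse."""

    def __init__(self) -> None:
        self._parser = argparse.ArgumentParser()
        self._values: Dict[str, Any] = {}

    def set_argument(self, *args: Any, **kwargs: Any) -> None:
        # Forward the argument definition to argparse unchanged.
        self._parser.add_argument(*args, **kwargs)

    def parse(self, argv: Optional[List[str]] = None) -> None:
        # parse_known_args tolerates arguments that were never registered.
        namespace, _unknown = self._parser.parse_known_args(argv)
        self._values = vars(namespace)

    def get(self, key: str, default: Any = None) -> Any:
        # Values are stored under the "dest" name, not the flag name.
        return self._values.get(key, default)


config = SketchConfig()
config.set_argument("--model-path", dest="model_path", help="set model path")
config.parse(["--model-path", "/path/to/model"])
model_path = config.get("model_path")
print(model_path)
```

Note that the value is looked up as `model_path` (the `dest` name), while the flag on the command line is `--model-path`.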
Shared directories
There are four shared directories: config, input, output, and tmp.
They are shared across the environment, and every container can access them.
- config: Where config files are located.
- input: Where input files are located.
- output: Where output files are located.
- tmp: Where temporary files are located. Podder Pipeline creates a directory under tmp/dag_id/job_id to keep each job's temporary files.
When you need to store temporary files, please put them in the tmp directory.
You can get the path to the tmp directory with self.context.file.get_tmp_path(file_name).
self.context.file.get_tmp_path('sample.csv')
# => /path/to/shared/tmp/sample.csv
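To make the path layout concrete, here is a sketch of how such a path could be composed. `sketch_tmp_path` is a hypothetical helper, not part of podder-task-base, and how the real get_tmp_path incorporates dag_id and job_id is not stated here, so those parts are optional in the sketch.

```python
from pathlib import PurePosixPath


def sketch_tmp_path(shared_root: str, file_name: str,
                    dag_id: str = "", job_id: str = "") -> str:
    """Hypothetical helper: build <shared_root>/tmp[/dag_id/job_id]/<file_name>."""
    # Skip the per-job components when they are not provided.
    job_parts = [part for part in (dag_id, job_id) if part]
    return str(PurePosixPath(shared_root, "tmp", *job_parts, file_name))


flat = sketch_tmp_path("/path/to/shared", "sample.csv")
per_job = sketch_tmp_path("/path/to/shared", "sample.csv",
                          dag_id="dag_id", job_id="job_id")
print(flat)
print(per_job)
```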
How to run Podder Task
Run on Docker
We strongly recommend running Podder Task with Docker.
- Build docker image
$ docker build -t podder-task .
- Execute on the docker container
$ docker run -it --env-file .env.example podder-task bash
# You can run your code
$ python main.py --inputs tests/files/inputs.json
- Run with a one-liner
You can also start the container and run your code in a single command.
$ docker run -it --env-file .env.example podder-task python main.py --inputs tests/files/inputs.json
Run on local
For macOS and Linux users
# clone podder-task
$ git clone git@github.com:podder-ai/podder-task.git
$ cd podder-task
# enable python3
$ python3 -m venv env
$ source env/bin/activate
# install required libraries
$ pip install -r requirements/requirements.txt
# run sample code
$ python main.py --inputs /path/to/input/a /path/to/input/b
For Windows users with PowerShell
If you use PowerShell, the activation script is subject to the system's execution policies. By default on Windows 7, the execution policy is set to Restricted, meaning that no scripts, including the virtualenv activation script, are allowed to run.
In order to use the script, you can relax the execution policy to Unrestricted, meaning that all scripts on the system can be executed. As an administrator, run:
C:\> Set-ExecutionPolicy Unrestricted -Scope CurrentUser -Force -Verbose
# clone podder-task
C:\> git clone git@github.com:podder-ai/podder-task.git
C:\> cd podder-task
# enable python3
C:\> python3 -m venv C:\path\to\myenv
# Windows cmd.exe
C:\> C:\path\to\myenv\Scripts\activate.bat
# PowerShell PS
C:\> C:\path\to\myenv\Scripts\Activate.ps1
# install required libraries
C:\> pip install -r requirements\requirements.txt
# run sample code
C:\> python main.py --inputs /path/to/input/a /path/to/input/b
Configuration
Copy the sample file to create a .env file, then add your environment variables to it.
$ cp .env.sample .env
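However the variables reach the process (for example through Docker's --env-file option used above, or by exporting them in your shell), Python code can read them from the process environment. A minimal sketch follows; the variable names `SAMPLE_SETTING` and `SAMPLE_UNSET_SETTING` and the helper `read_setting` are hypothetical.

```python
import os


def read_setting(name: str, default: str = "") -> str:
    """Return an environment variable, or the default when it is unset."""
    return os.environ.get(name, default)


# Simulates a line such as SAMPLE_SETTING=enabled in the .env file.
os.environ["SAMPLE_SETTING"] = "enabled"
os.environ.pop("SAMPLE_UNSET_SETTING", None)

value = read_setting("SAMPLE_SETTING")
fallback = read_setting("SAMPLE_UNSET_SETTING", "fallback")
print(value, fallback)
```

Supplying a default keeps the task usable when a variable is missing from the file.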
Linter, Formatter and Unit Test
Please run the linters, formatters and unit tests before committing your source code.
How To Execute
You can execute them with the following commands. Make sure that you are in the root directory of your project (e.g. podder-task/).
$ pip install -r ./requirements/requirements.develop.txt
$ sh ./scripts/pre-commit.sh
Supported Libraries
Linter
- flake8
Formatter
- autopep8
- yapf
- autoflake
- isort
Unit Test
- pytest
Rules of Development
Please follow the official documentation of each library.
How To Execute Unit Test
$ cd podder-task
$ docker build . -t podder-task
$ docker run --env-file .env.example -t podder-task pytest
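A test placed under tests/unit might look like the following sketch. `normalize_paths` is a hypothetical helper standing in for logic you would normally import from app/task.py. By default pytest collects functions whose names start with `test_` from files named `test_*.py` or `*_test.py`.

```python
from typing import List


def normalize_paths(paths: List[str]) -> List[str]:
    """Hypothetical helper: strip whitespace and drop empty entries."""
    return [path.strip() for path in paths if path.strip()]


def test_normalize_paths_drops_blank_entries() -> None:
    assert normalize_paths([" /a ", "", "/b"]) == ["/a", "/b"]


def test_normalize_paths_keeps_order() -> None:
    assert normalize_paths(["/b", "/a"]) == ["/b", "/a"]
```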
Implementation note
Your task implementation will eventually be integrated into Podder-Pipeline and deployed using Docker/Kubernetes. To make that easier, please follow the implementation rules below.
- Only add your code to app/task.py.
- Put your data set or model files in data.
- Your task implementation will be compiled with Cython during integration. Please don't use __file__ in your code.
- Create a virtual environment for your code. Please check Creation of virtual environments.
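For the rule about __file__, one possible alternative is to resolve paths from the working directory. The sketch below assumes the process is started from the project root and that files live in a directory named data; `resolve_data_file` and `model.bin` are hypothetical names used only for illustration.

```python
from pathlib import Path
from typing import Optional


def resolve_data_file(file_name: str, base_dir: Optional[Path] = None) -> Path:
    """Hypothetical helper: locate a file under data/ without using __file__."""
    # Fall back to the current working directory (assumed to be the project root).
    root = base_dir if base_dir is not None else Path.cwd()
    return root / "data" / file_name


model_file = resolve_data_file("model.bin", base_dir=Path("/project"))
default_file = resolve_data_file("model.bin")
print(model_file)
```

Passing `base_dir` explicitly also makes the lookup easy to exercise in unit tests.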
Please open an issue or a pull request if you have any requests!