Home

Awesome

goq: a queuing and job management system fit for the cloud. Written in Go (golang).

News: v2.0.5 is our latest release, working on Windows. Looking for the old version? use the v1.0.0-branch tag.

Goq (Go-queue, or nothing to gawk at!) is a replacement for job management systems such as Sun GridEngine. (Yeah! No Reverse DNS hell on setup!)

Goq's source is small, compact, and easily extended and enhanced. The main file is goq.go. The main dispatch logic is in the Start() routine, which is less than 300 lines long. This compactness is a tribute to the design of Go, in particular to its uniquely expressive channel/select system.

Goq Features:

status

Excellent. Working and useful. OSX and Linux/amd64 builds are actively exercised.

compiling the source : 'go get' will fail the first time; you must run 'make' after 'go get'.

# add to your ~/.bashrc
export PATH=$GOPATH/bin:$PATH  # might already done.

Then save the ~/.bashrc changes, and source them with

$ source ~/.bashrc # have changes take effect in the current shell

Goq was built using BDD, so the test suite has good coverage. If 'go test -v' reports any failures, please file an issue.

deploy

a) server: On your master node, set the env variable GOQ_HOME to your home directory (must be a directory where Goq can store job output in a subdir). Then do:

$ cd $GOQ_HOME
$ goq init     # only needed once.
$ nohup goq serve &   # start the central server

b) job submission: 'goq sub mycommand myarg1 myarg2 ...' will submit a job. You can be in any directory; Goq will try to write output to ./o back (by default; controlled by the GOQ_ODIR setting) in that directory. Failing that (if the filesystem on the server is laid out differently from that of the sub host), it will write to $GOQ_HOME. For example:

$ cd somewhere/where/the/job/wants/to/start
# start by doing 'goq sub' on the same machine 
# that 'goq serve' was launched on. Just to learn the system.
$ goq sub ./myjobscript  

c) workers: Start workers on compute nodes by copying the .goq directory to them, setting GOQ_HOME in the env/your .bashrc. Then launch one worker per cpu with: 'nohup goq work forever &'. For example (assuming linux where /proc exists):

$ ssh computenode
$ for i in $(seq 1 $(cat /proc/cpuinfo |grep processor|wc -l)); do 
  /usr/bin/nohup goq work forever & done

The 'runGoqWorker' script in the Goq repo shows how to automate the ssh and start-workers sequence. Even easier: start them automatically on boot (e.g. in /etc/rc.local) of your favorite cloud image, and workers will be ready and waiting for you when you bring up that image. Do not run Goq as root. Your regular user login suffices, and is safer.

d) on your cloud nodes' firewalls, open tcp:1024-65535 so that goq nodes can communicate. All node's clocks should be synced to the same ntp time-source (or at least to within 10 seconds of each other). Goq will interpret old or repeated-and-identical messages as replay-attacks (or network wierdness) and will ignore them.

using the system: goq command reference

There are three fundamental commands to goq, corresponding to the three roles in the queuing system. For completeness, they are:

Additional useful commands

configuration reference

Configuration is controlled by these environment variables. Only the GOQ_HOME variable is mandatory. The rest have reasonable defaults. Once you have run 'goq init' the settings are kept in $GOQ_HOME/.goq/serverloc, and should be adjusted there.

sample local-only session

Usually you would start a bunch of remote workers. But goq works just fine with local workers, and this is an excellent way to get familiar with the system before deploying to your cluster.

jaten@i7:~$ export GOQ_HOME=/home/jaten
jaten@i7:~$ goq init
[pid 3659] goq init: key created in '/home/jaten/.goq'.
jaten@i7:~$ goq serve &
[1] 3671
**** [jobserver pid 3671] listening for jobs on 'tcp://10.0.0.6:1776', output to 'o'. GOQ_HOME is '/home/jaten'.
jaten@i7:~$ goq stat
[pid 3686] stats for job server 'tcp://10.0.0.6:1776':
runQlen=0
waitingJobs=0
waitingWorkers=0
jservPid=3671
finishedJobsCount=0
droppedBadSigCount=0
nextJobId=1
jaten@i7:~$ goq sub go/goq/bin/sleep20.sh # sleep for 20 seconds, long enough that we can inspect the stats
**** [jobserver pid 3671] got job 1 submission. Will run 'go/goq/bin/sleep20.sh'.
[pid 3704] submitted job 1 to server at 'tcp://10.0.0.6:1776'.
jaten@i7:~$ goq stat
[pid 3715] stats for job server 'tcp://10.0.0.6:1776':
runQlen=0
waitingJobs=1
waitingWorkers=0
jservPid=3671
finishedJobsCount=0
droppedBadSigCount=0
nextJobId=2
wait 000000   WaitingJob[jid 1] = 'go/goq/bin/sleep20.sh []'   submitted by 'tcp://10.0.0.6:46011'.   
jaten@i7:~$ goq work &  # typically on a remote cpu, local here for simplicity of demonstration. Try 'runGoqWorker hostname' for starting remote nodes.
[2] 3726
jaten@i7:~$ **** [jobserver pid 3671] dispatching job 1 to worker 'tcp://10.0.0.6:37894'.
[pid 3671] dispatched job 1 to worker 'tcp://10.0.0.6:37894'
---- [worker pid 3726; tcp://10.0.0.6:37894] starting job 1: 'go/goq/bin/sleep20.sh' in dir '/home/jaten'

jaten@i7:~$ goq stat
[pid 3744] stats for job server 'tcp://10.0.0.6:1776':
runQlen=1
waitingJobs=0
waitingWorkers=0
jservPid=3671
finishedJobsCount=0
droppedBadSigCount=0
nextJobId=2
runq 000000   RunningJob[jid 1] = 'go/goq/bin/sleep20.sh []'   on worker 'tcp://10.0.0.6:37894'.   
jaten@i7:~$ # wait for awhile
jaten@i7:~$ ---- [worker pid 3726; tcp://10.0.0.6:37894] done with job 1: 'go/goq/bin/sleep20.sh'
**** [jobserver pid 3671] worker finished job 1, removing from the RunQ
[pid 3671] jobserver wrote output for job 1 to file 'o/out.00001'

jaten@i7:~$ ---- [worker pid 3726; tcp://10.0.0.6:37894] worker could not fetch job: recv timed out after 1000 msec: resource temporarily unavailable.


[2]+  Done                    goq work
jaten@i7:~$ goq shutdown
[pid 3767] sent shutdown request to jobserver at 'tcp://10.0.0.6:1776'.
[jobserver pid 3671] jobserver exits in response to shutdown request.
jaten@i7:~$ 

author: Jason E. Aten, Ph.D. j.e.aten@gmail.com.