simple-amt

simple-amt is a microframework for working with Amazon's Mechanical Turk (AMT). It was designed with the following three principles in mind:

Quick start guide

Follow these steps to set up simple-amt and run a simple HIT on AMT.

Check out the codebase and set up a virtualenv

git clone https://github.com/jcjohnson/simple-amt.git
cd simple-amt
virtualenv .env
source .env/bin/activate
pip install -r requirements.txt

Configure your Amazon account

To use AMT, you'll need an Amazon AWS account. To interact with Amazon, simple-amt needs an access key and the corresponding secret key for your Amazon account. You can find these on the security credentials page of your AWS account. Once you have them, place them in a file called config.json for simple-amt:

cp config.json.example config.json
# edit config.json; fill out the "aws_access_key" and "aws_secret_key" fields.

WARNING: Your AWS keys provide full access to your AWS account, so be careful about where you store your config.json file!
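Before launching anything, it can help to confirm that config.json is actually filled in. The sketch below is not part of simple-amt; it assumes only the two field names mentioned above ("aws_access_key" and "aws_secret_key") and reports any that are missing or empty.

```python
import json
from pathlib import Path

# Field names taken from the config.json instructions above.
REQUIRED_KEYS = ("aws_access_key", "aws_secret_key")


def load_config(path="config.json"):
    """Load config.json and check that both AWS key fields are filled in."""
    config = json.loads(Path(path).read_text())
    # Treat absent keys and empty strings the same way.
    missing = [key for key in REQUIRED_KEYS if not config.get(key)]
    if missing:
        raise ValueError(f"{path} is missing values for: {', '.join(missing)}")
    return config
```

Calling load_config() from the simple-amt directory either returns the parsed settings or raises a ValueError naming the empty fields.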

Launch some HITs

We've included a sample HIT that asks workers to write sentences to describe images. To launch a couple of these HITs on the AMT sandbox, run the following:

python launch_hits.py \
  --html_template=hit_templates/image_sentence.html \
  --hit_properties_file=hit_properties/image_sentence.json \
  --input_json_file=examples/image_sentence/example_input.txt \
  --hit_ids_file=examples/image_sentence/hit_ids.txt

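This is the most complicated command that you will need to run; let's break it down:

  1. --html_template: the HTML template that defines the UI of the HIT.
  2. --hit_properties_file: a JSON file with properties of the HIT, such as the qualifications workers must have.
  3. --input_json_file: the input data for the HITs.
  4. --hit_ids_file: the file where the IDs of the launched HITs are written; the other commands below read this file to find your HITs.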

Note: If you are setting up AMT for the first time, you may see an error message scroll by repeatedly, asking you to "Please log in to https://requestersandbox.mturk.com/ and complete registration." If so, register at that URL first and then run the command again.

Do your HITs

Your HITs should now be live on the Mechanical Turk sandbox. Open the sandbox and sort by "HIT creation date (newest first)". You should see a HIT with the title "Write sentences to describe images" on the first page or two of results. Complete one of the HITs.

The HIT can sometimes take several seconds to appear. You can double-check that the HIT is up and available by going to the requester sandbox page and clicking manage -> manage hits individually.

Also note that you may not satisfy the qualifications of your own HIT. In that case, edit the file hit_properties/image_sentence.json and delete the lines corresponding to HitsApproved and PercentApproved. Remember to delete the HITs you have already launched (see below), and then re-launch the HIT (see above).

Check HIT progress

You can check the status of your in-progress HITs by running the following command:

python show_hit_progress.py --hit_ids_file=examples/image_sentence/hit_ids.txt

Get HIT results

You can fetch the results of your completed HITs by running the following command:

python get_results.py \
  --hit_ids_file=examples/image_sentence/hit_ids.txt \
  > examples/image_sentence/results.txt

The results of all completed HITs are now stored in the file examples/image_sentence/results.txt. Each line of the file contains a JSON blob with the results from a single assignment.
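Because each line is a separate JSON blob, the file can be read one line at a time. A minimal sketch, assuming nothing about the fields inside each blob:

```python
import json


def read_results(path):
    """Yield one dict per assignment from a results file (one JSON blob per line)."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines, e.g. a trailing newline
                yield json.loads(line)
```

For example, results = list(read_results("examples/image_sentence/results.txt")) gives a list with one entry per completed assignment.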

If you collect results before all of your HITs have been completed and need to run get_results.py again, you can speed it up by passing in the results you have already collected with --output_file. Redirect the new output to a different file: the shell truncates the target of > before the script starts, so redirecting back into results.txt would erase the results you are passing in.

python get_results.py \
  --hit_ids_file=examples/image_sentence/hit_ids.txt \
  --output_file=examples/image_sentence/results.txt \
  > examples/image_sentence/results_updated.txt

Approve work

If you are satisfied with the results that you have gotten, you can approve all completed assignments from your HIT batch by running the following command:

python approve_hits.py --hit_ids_file=examples/image_sentence/hit_ids.txt

To approve individual assignments instead, save their assignment IDs in a file such as assignment_ids.txt and run the following command:

python approve_assignments.py --assignment_ids_file=examples/image_sentence/assignment_ids.txt
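One way to build assignment_ids.txt is to filter the parsed results with your own quality check. This is a hypothetical sketch: the field name "assignment_id" and the one-ID-per-line file format are assumptions, so compare them against your own results.txt and the format approve_assignments.py expects.

```python
def write_assignment_ids(results, keep, path, id_key="assignment_id"):
    """Write the IDs of the assignments for which keep(result) is true, one per line.

    results is a list of dicts, e.g. parsed from results.txt. The field name
    id_key is an assumption; check your own results and adjust it.
    """
    ids = [result[id_key] for result in results if keep(result)]
    with open(path, "w") as f:
        for assignment_id in ids:
            f.write(assignment_id + "\n")
    return ids
```

The same function with the opposite predicate produces a file for reject_assignments.py.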

Delete HITs

Once your HITs are completed and you have saved the results, you can delete the HITs from Amazon's database with the following command:

python delete_hits.py --hit_ids_file=examples/image_sentence/hit_ids.txt

WARNING: After running this command, your HITs will no longer be visible to workers, and you will no longer be able to retrieve HIT results from Amazon. Make sure that you have saved the HIT results before running this command.

Get All HITs

If you want to get the results for all the HITs you have launched on AMT, regardless of their HIT IDs, you can run the following command. It prints a JSON array in which every element is the result of one HIT; the command below saves it to all_results.txt.

python get_all_hits.py \
  > examples/image_sentence/all_results.txt
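Unlike results.txt, this file holds a single JSON array rather than one blob per line, so it is loaded in one call. A minimal sketch that also checks the top-level type:

```python
import json


def load_all_hits(path):
    """Load the JSON array saved from get_all_hits.py (one element per HIT result)."""
    with open(path) as f:
        hits = json.load(f)
    if not isinstance(hits, list):
        raise ValueError(f"expected a JSON array in {path}")
    return hits
```

For example, len(load_all_hits("examples/image_sentence/all_results.txt")) counts the HIT results retrieved.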

Rejecting Work

If you are unhappy with the work and want to reject it, you can run the following command. Please note that rejecting work harms workers' ratings on the site and can affect their ability to find other work.

python reject_hits.py --hit_ids_file=examples/image_sentence/hit_ids.txt

Or you can also reject individual assignments:

python reject_assignments.py --assignment_ids_file=examples/image_sentence/assignment_ids.txt

You can also delete an individual HIT by its ID from the command line:

python delete_hit.py --hit_id THE_HIT_ID_YOU_WANT_TO_DISABLE

Blocking Workers

In extreme cases, when you want to prevent a malicious worker from affecting your work, you can block or unblock them using their worker IDs. Save the IDs of the workers you want to block in a file (e.g. worker_ids.txt) and run the following command to block them:

python block_workers.py --worker_ids_file=examples/image_sentence/worker_ids.txt

or to unblock workers:

python unblock_workers.py --worker_ids_file=examples/image_sentence/worker_ids.txt
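If you flag bad assignments programmatically, the worker IDs can be collected from the same parsed results. A hypothetical sketch: the "worker_id" field name and the one-ID-per-line format are assumptions to check against your own results and block_workers.py.

```python
def write_worker_ids(results, is_flagged, path, worker_key="worker_id"):
    """Write the unique IDs of workers with at least one flagged assignment.

    IDs are de-duplicated, sorted, and written one per line. worker_key is
    an assumed field name; check it against your own results.
    """
    worker_ids = sorted({r[worker_key] for r in results if is_flagged(r)})
    with open(path, "w") as f:
        for worker_id in worker_ids:
            f.write(worker_id + "\n")
    return worker_ids
```

Deduplicating matters here because one malicious worker often submits several assignments in the same batch.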

Running on the production site

To run your HITs on the production AMT site, add the --prod flag to each of the above commands.
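If you drive these scripts from your own Python code, a single switch can control sandbox versus production. A minimal sketch (not part of simple-amt) that turns keyword arguments into the --name=value flags used above and appends --prod when asked:

```python
import subprocess
import sys


def build_command(script, prod=False, **flags):
    """Build the argv for a simple-amt script such as launch_hits.py.

    Each keyword argument becomes a --name=value flag; prod=True appends --prod.
    """
    argv = [sys.executable, script]
    argv += [f"--{name}={value}" for name, value in flags.items()]
    if prod:
        argv.append("--prod")
    return argv


def run(script, prod=False, **flags):
    """Run the script and raise CalledProcessError if it exits with an error."""
    return subprocess.run(build_command(script, prod, **flags), check=True)
```

For example, run("show_hit_progress.py", hit_ids_file="examples/image_sentence/hit_ids.txt") checks sandbox progress; passing prod=True targets the production site.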

WARNING: Running HITs on the sandbox is free, but running HITs on the production site is not. To launch HITs, your Mechanical Turk account must have sufficient funds to pay for all of them; Amazon holds these funds in escrow once you launch the HITs and pays them to workers when you approve assignments.
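To estimate how much money a batch will tie up, multiply the reward by the total number of assignments and add Amazon's commission. This is a rough sketch: the reward and the fee_rate (Amazon's commission as a fraction) are inputs you must look up yourself, and fee details such as minimums are not modeled.

```python
from decimal import ROUND_HALF_UP, Decimal


def estimate_cost(num_hits, assignments_per_hit, reward, fee_rate):
    """Rough estimate of the funds needed for a batch, in USD.

    reward is the per-assignment payment and fee_rate is the commission as a
    fraction (e.g. "0.20"); both are assumptions you supply, not values from
    simple-amt. Decimal avoids binary floating-point rounding on money.
    """
    reward = Decimal(str(reward))
    fee_rate = Decimal(str(fee_rate))
    worker_pay = reward * num_hits * assignments_per_hit
    total = worker_pay * (1 + fee_rate)
    return total.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```

With 10 HITs, 3 assignments each, a $0.50 reward, and an assumed 20% fee, the estimate is 10 × 3 × 0.50 × 1.2 = $18.00.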

Creating your own HITs

To create your own HITs, you'll need to do the following:

  1. Create an HTML template for the HIT UI
  2. Create a HIT properties file
  3. Prepare an input file

We'll walk through each of these steps in more detail.

Build HIT UI

Building the UI is typically the most time-consuming step in creating a new type of HIT. You will have to do most of the work yourself, but simple-amt can still help. As a running example, we will use the UI defined in hit_templates/simple.html. This is a very basic HIT that asks workers to write an example of a category, like a type of dog or a flavor of ice cream.

If you look at hit_templates/simple.html, you'll notice that it looks like regular HTML except for the line

{% include "simpleamt.html" %}

This includes the file hit_templates/simpleamt.html, which does two things:

  1. Sets up DOM elements where HIT input and output will be stored; the only one of these you need to know about is the submit button, which has the ID #submit-btn.
  2. Sets up a global JavaScript object called simpleamt that defines functions for working with Mechanical Turk on the frontend.
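To see what an include tag does to your page, the sketch below expands {% include "..." %} tags by pasting in the named file. It is a hypothetical, stdlib-only stand-in, not the template engine render_template.py uses: it handles only this one tag, and it assumes included files live in hit_templates.

```python
import re
from pathlib import Path

# Matches tags of the form {% include "simpleamt.html" %}.
INCLUDE_RE = re.compile(r'\{%\s*include\s+"([^"]+)"\s*%\}')


def expand_includes(template_path, search_dir="hit_templates", depth=0):
    """Recursively replace include tags with the contents of the named file."""
    if depth > 10:
        raise RecursionError("include nesting too deep (possible cycle)")
    text = Path(template_path).read_text()

    def replace(match):
        included = Path(search_dir) / match.group(1)
        return expand_includes(included, search_dir, depth + 1)

    return INCLUDE_RE.sub(replace, text)
```

Running expand_includes("hit_templates/simple.html") shows the page with the contents of simpleamt.html, including the #submit-btn element, inlined where the tag was.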

The JavaScript simpleamt object provides the following functions:

To see a minimal example of these functions in action, look at the file hit_templates/simple.html.

While developing a HIT template, you will need to render the template to produce a valid HTML page that you can view in a browser. You can do this using the render_template.py script. Use it like this:

python render_template.py --html_template=hit_templates/simple.html --rendered_html=rendered_templates/simple.html

In this example, the rendered template is saved to rendered_templates/simple.html; to save it somewhere else, pass a different destination path to --rendered_html. Whenever you change your HIT template, you will need to re-render it to see your changes.

To actually view the rendered template in a web browser, you will need to run a local HTTP server so that protocol-relative URLs resolve properly. Python makes this very easy; just run

python -m http.server 8080

from the simple-amt directory, then point your web browser at http://localhost:8080/rendered_templates/simple.html.
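The same server can be started from Python, which is handy in a script that re-renders and previews in one go. A minimal sketch using only the standard library (Python 3.7 or later for the directory argument); the serve_directory name is illustrative, not part of simple-amt.

```python
import functools
import http.server
import threading
from pathlib import Path


def serve_directory(directory, port=8080):
    """Serve files from directory on localhost in a background thread.

    Returns the server object; call server.shutdown() to stop it.
    """
    if not Path(directory).is_dir():
        raise NotADirectoryError(directory)
    # Bind the handler to the given directory instead of the current one.
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=str(directory)
    )
    server = http.server.ThreadingHTTPServer(("localhost", port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

With serve_directory("rendered_templates"), the page is available at http://localhost:8080/simple.html, because the served directory becomes the URL root.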

Create HIT properties file

To launch HITs, you need both an HTML template defining the UI for the HIT and a JSON file storing properties of the HIT. An example JSON file is hit_properties/simple.json. A HIT properties JSON file has the following fields (some are required and some are optional):