Skeleton Test Suite Generator

A utility for the automated generation of digital objects based on the digital signatures documented in the PRONOM database maintained by The National Archives, UK.

Introduction

The skeleton-test-suite-generator addresses a gap in the community: the need for a corpus of digital objects for the validation and evaluation of format identification tools and techniques.

The output of the skeleton suite should be used to complement a methodology whereby skeleton files are also generated manually by signature developers.

The research paper this work led to can be found here: IJDC.08.01.2013.

Container skeletons

The container skeleton suite requires some different technologies to run. It is hosted in a separate repository.

Builder

Richard Lehane's builder builds skeleton suites with each new PRONOM release and includes both the standard binary skeletons and the container suite. It is a must-have for all file format signature developers.

Technical

The tool takes a signature specified for a digital object in PRONOM and constructs a digital object that will match its footprint. For example, given the signature:

CAFED00D{4}CAFEBABE(0D|0D0A)

The hex sequences comprising digital objects that will match this signature in DROID will look like the following:

CA FE D0 0D 00 00 00 00 CA FE BA BE 0D

Or:

CA FE D0 0D 00 00 00 00 CA FE BA BE 0D 0A
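The expansion above can be sketched in a few lines of Python. This is a minimal illustration, not the code in skeletongenerator.py: the function name `expand_signature` and its `fill` and `choice` parameters are hypothetical, and only the syntax used in the example (hex runs, fixed gaps such as {4}, and alternatives such as (0D|0D0A)) is covered.

```python
import re

# Tokens for the small subset of signature syntax used in the example above:
# hex byte runs, fixed-length gaps such as {4}, and alternatives such as (0D|0D0A).
TOKEN = re.compile(
    r"(?P<hex>(?:[0-9A-Fa-f]{2})+)|\{(?P<gap>\d+)\}|\((?P<alt>[0-9A-Fa-f|]+)\)"
)


def expand_signature(signature, fill=0x00, choice=0):
    """Return bytes that match `signature` (hypothetical helper).

    `fill` is the byte used for {n} gaps; `choice` selects which
    alternative of an (a|b) group is written.
    """
    out = bytearray()
    pos = 0
    while pos < len(signature):
        match = TOKEN.match(signature, pos)
        if match is None:
            raise ValueError(f"unsupported syntax at offset {pos}: {signature[pos:]!r}")
        if match.group("hex"):
            out += bytes.fromhex(match.group("hex"))
        elif match.group("gap"):
            out += bytes([fill]) * int(match.group("gap"))
        else:
            options = match.group("alt").split("|")
            out += bytes.fromhex(options[choice % len(options)])
        pos = match.end()
    return bytes(out)


if __name__ == "__main__":
    sig = "CAFED00D{4}CAFEBABE(0D|0D0A)"
    print(expand_signature(sig).hex(" ").upper())
    print(expand_signature(sig, choice=1).hex(" ").upper())
```

Run as a script, this prints the two hex sequences shown above. Anything outside the supported subset raises a ValueError rather than producing a file that silently fails to match.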

The scripts take an export of the PRONOM database in XML, extract the internal signature information belonging to each format record and generate the digital objects - creating the 'skeleton test suite'.
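The extraction step can be pictured with the sketch below. It is an assumption-laden illustration rather than the tool's own code: the sample record is hypothetical and heavily trimmed, the element names (`InternalSignature`, `ByteSequence`, `PositionType`, `Offset`, `ByteSequenceValue`) are assumed for the example, and `extract_sequences` is a made-up helper name.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down record. Real PRONOM exports carry many more
# fields, and the element names used here are assumptions for illustration.
SAMPLE_RECORD = """\
<PRONOM-Report>
  <FileFormat>
    <FormatName>Example Format</FormatName>
    <InternalSignature>
      <SignatureName>Example signature</SignatureName>
      <ByteSequence>
        <PositionType>Absolute from BOF</PositionType>
        <Offset>0</Offset>
        <ByteSequenceValue>CAFED00D{4}CAFEBABE(0D|0D0A)</ByteSequenceValue>
      </ByteSequence>
    </InternalSignature>
  </FileFormat>
</PRONOM-Report>
"""


def local_name(tag):
    """Strip any '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def extract_sequences(xml_text):
    """Yield (position, offset, value) for every ByteSequence element found."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        if local_name(element.tag) != "ByteSequence":
            continue
        fields = {local_name(child.tag): (child.text or "").strip() for child in element}
        yield fields.get("PositionType"), fields.get("Offset"), fields.get("ByteSequenceValue")


if __name__ == "__main__":
    for position, offset, value in extract_sequences(SAMPLE_RECORD):
        print(position, offset, value)
```

Matching on the local tag name keeps the sketch working whether or not the export declares an XML namespace.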

The objects can be used for:

Other benefits include a small footprint: zipped, the suite is just over 150 KB in size; unzipped, it is approximately 390 KB.

The suite does not suffer from issues relating to IPR and copyright. The suite and generator tool are licensed under CC BY-SA (see below).

The tool is still a prototype and does not yet handle every sequence in PRONOM. Signatures with multiple BOF sequences, for example, will not generate correctly. While such signatures could be corrected by the team working on PRONOM, they are legitimate sequences that the tool should handle.

HOWTO

python skeletongenerator.py

Easy as. The scripts require the 'pronom-export' folder generated by the scripts in the pronom-xml-export repository: https://github.com/exponential-decay/pronom-xml-export

The input and output locations can be configured by modifying the accompanying cfg file skeletonsuite.cfg.

By default, files are generated using NULL bytes to 'fill' the gaps dictated by a signature. This can be changed in the cfg file by setting the fill to the character value of the requested fill byte, or to a value <0 or >255 for random bytes.
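The fill rule can be sketched as follows. This is only an illustration of the behaviour described above, not the tool's implementation; `fill_bytes` and its parameters are hypothetical names, and how the value is read from skeletonsuite.cfg is not shown.

```python
import random


def fill_bytes(length, fill_value=0):
    """Return `length` filler bytes (hypothetical helper).

    A value in 0..255 is repeated as-is, with 0 giving NULL bytes.
    Any value outside that range (<0 or >255) requests random bytes.
    """
    if 0 <= fill_value <= 255:
        return bytes([fill_value]) * length
    return bytes(random.randrange(256) for _ in range(length))


if __name__ == "__main__":
    print(fill_bytes(4))          # NULL fill, the default
    print(fill_bytes(4, 0x20))    # fill with spaces
    print(fill_bytes(4, -1))      # random fill
```

Random fill can be useful for checking that an identification tool is matching on the signature bytes themselves and not on the padding between them.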

Version information can be displayed by running:

python skeletongenerator.py --version

Testing reports

I completed two reports on the Skeleton Test Suite back in 2012/2013. They document testing of the files on DROID and explore reasons why some files do or do not work. The reports and links to the test-suites used for testing can be found on the repo wiki.

TODO

For the community TODO

License

Copyright (c) 2012 Ross Spencer

This software is provided 'as-is', without any express or implied warranty. In no event will the authors be held liable for any damages arising from the use of this software.

Permission is granted to anyone to use this software for any purpose, including commercial applications, and to alter it and redistribute it freely, subject to the following restrictions:

  1. The origin of this software must not be misrepresented; you must not claim that you wrote the original software. If you use this software in a product, an acknowledgment in the product documentation would be appreciated but is not required.

  2. Altered source versions must be plainly marked as such, and must not be misrepresented as being the original software.

  3. This notice may not be removed or altered from any source distribution.

PRONOM data

PRONOM data, which is not owned by this repository, is licensed under the Open Government Licence (OGL).

Open Government License