PRONOM Archive and Skeleton Test Suite

Epilogue

The PRONOM archive, and the archive of the skeleton test suite, are no longer maintained. The experiment was to create a record of the raw data that went into PRONOM (the file format registry) for future research purposes.

Rather than just storing the data and leaving it, handles were minted in Zenodo to provide an anchor for researchers to point back to. This made the workflow more difficult to do by hand, and it has been hard to keep up with the pace of PRONOM releases.

Also, the reality is that I don't think digital preservation researchers actually care about this data, and I don't anticipate PRONOM creating a historical interface that would allow it to be viewed and so fill a gap in the system's historical audit trail.

Most research projects looking at the change in file format signatures over the years have focused on the DROID signature file. This makes sense because the DROID signature file actually triggers file format identification and so it can be measured against something, i.e. something being identified vs. not being identified.

Anecdotally, I believe the DROID export has historically encoded PRONOM data with slight variations to the recorded signature. Richard Lehane can shed more light on these minor divergences through his work on Siegfried.

Generating a PRONOM archive in the future

Fortunately, the weight of maintaining a PRONOM record was not entirely on this repository's shoulders. Richard has also been maintaining something of a PRONOM record through his builder utility.

Builder's record goes back to PRONOM v92 and the raw PRONOM data can be accessed via the tool's release pages: here.

To that end, between this repository going back to v70 and Builder starting at v92, should any researcher want to come back to the historical record of PRONOM, they can build a pretty decent picture going back to 2013.

Skeleton test suite generation

Generating a skeleton test suite will still be possible, and that code will continue to be maintained because of its benefit to signature and tool development. You can read more about the skeleton suite generator work below.

Introduction

Herein lies a tool for the automated generation of digital objects based on the digital signatures documented in the PRONOM database maintained by The National Archives, UK. PRONOM data is licensed under the Open Government Licence (OGL).

The skeleton-test-suite-generator fills a gap: the community requires a corpus of digital objects for the validation and evaluation of format identification tools and techniques. The tool should be used to complement a methodology in which skeleton files are also generated manually by signature developers.
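As a rough illustration of the idea, and not the generator's actual code, a skeleton file can be thought of as nothing more than a signature's byte sequence written at its expected offset, with filler bytes everywhere else. The sketch below assumes a simplified signature made of plain hex pairs, with `??` treated as a single wildcard byte; real PRONOM signatures use a richer syntax that is not handled here. The function names, the fill byte, and the output file name are all hypothetical, and the bytes `25 50 44 46` (ASCII `%PDF`) are used purely as an example sequence.

```python
import tempfile
from pathlib import Path


def build_skeleton(hex_sequence: str, offset: int = 0, fill: int = 0x00) -> bytes:
    """Return bytes holding `hex_sequence` at `offset`, padded with `fill`.

    Assumption: the sequence is plain hex pairs, where '??' means
    "any single byte". Nothing else from the PRONOM syntax is supported.
    """
    # Drop any whitespace so "25 50 44 46" and "25504446" behave the same.
    cleaned = "".join(hex_sequence.split())
    if len(cleaned) % 2 != 0:
        raise ValueError("hex sequence must contain whole bytes")

    body = bytearray()
    for i in range(0, len(cleaned), 2):
        pair = cleaned[i:i + 2]
        # A wildcard byte can be anything, so the filler value is used.
        body.append(fill if pair == "??" else int(pair, 16))

    # Pad the start of the file so the sequence lands at the given offset.
    return bytes([fill]) * offset + bytes(body)


def write_skeleton(directory: Path, name: str, hex_sequence: str, offset: int = 0) -> Path:
    """Write a skeleton file into `directory` and return its path."""
    path = directory / name
    path.write_bytes(build_skeleton(hex_sequence, offset))
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        out = write_skeleton(Path(tmp), "example-signature.bin", "25504446")
        print(out.name, out.read_bytes())
```

A file built this way contains no real content, only the bytes a signature looks for, which is why it is useful for checking whether an identification tool fires on a given signature and why manually crafted files remain a useful complement.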

The research paper this work led to can be found on the IJDC website.

DOIs

Each PRONOM release now has a DOI provided by the Skeleton Suite. This will help academics reference specific versions of PRONOM and, more importantly, will help preserve this record.

Source Code

Code to generate the container and standard skeleton suites can be found:

Testing reports

I completed two reports on the Skeleton Test Suite back in 2012/2013. They document testing of the files with DROID and explore reasons why some files do or do not work. The reports and links to the test suites used for testing can be found on the repository wiki, here: Skeleton-reports.

Blogs

More information can be found on my blog: More information.

Other blogs and uses of the skeleton suite

License

Copyright (c) 2015 Ross Spencer

This software is provided 'as-is', without any express or implied warranty. In no event will the authors be held liable for any damages arising from the use of this software.

Permission is granted to anyone to use this software for any purpose, including commercial applications, and to alter it and redistribute it freely, subject to the following restrictions:

The origin of this software must not be misrepresented; you must not claim that you wrote the original software. If you use this software in a product, an acknowledgment in the product documentation would be appreciated but is not required.

Altered source versions must be plainly marked as such, and must not be misrepresented as being the original software.

This notice may not be removed or altered from any source distribution.