siegfried¶
Another signature-based file format identification tool is siegfried. The current installation instructions are for 64-bit systems running Ubuntu/Debian OS. A full list of options and installation instructions for multiple platforms can be found on the official page.
wget -qO - https://bintray.com/user/downloadSubjectPublicKey?username=bintray | sudo apt-key add -
echo "deb http://dl.bintray.com/siegfried/debian wheezy main" | sudo tee -a /etc/apt/sources.list
sudo apt update && sudo apt install siegfried
sf <name of the file>
sf 6285633006_R03C01_Red.idat
---
siegfried : 1.8.0
scandate : 2020-08-06T20:14:22Z
signature : default.sig
created : 2020-01-21T23:30:42+01:00
identifiers :
- name : 'pronom'
details : 'DROID_SignatureFile_V96.xml; container-signature-20200121.xml'
---
filename : '6285633006_R03C01_Red.idat'
filesize : 8095283
modified : 2020-08-05T20:51:17Z
errors :
matches :
- ns : 'pronom'
id : 'UNKNOWN'
format :
version :
mime :
basis :
warning : 'no match'
The default results are in the National Archives UK's PRONOM file format signature which is displayed in YAML format. There are built in flags such as -csv
or -json
to change the output format.
Modification and customization of the underlying signature database is done using roy
tool. It is installed with homebrew and Ubuntu packages. More information in documentation for roy
. To build a MIME-info signature file, we can use the included signature files from Apache Tika (tika-mimetypes.xml) and freedesktop.org (freedesktop.org.xml) and use the -mi
flag. The roy build
creates a new signature file while roy add
adds a new identifier to an existing signature file. The changes will be reflected to the included default.sig
file.
MIME-info signature file
Apache Tika and freedesktop.org are examples of software projects that contain toolkits for content detection and metadata extraction for many commonly used file types all stored in their respective MIME-info signature files. roy
allows the ability to use any valid MIME-info file as a source when building signatures.
# Build a MIME-info database using tika identifiers
sudo roy build -mi tika-mimetypes.xml
# Add freedesktop.org MIME signature list
sudo roy add -mi freedesktop.org.xml
# Get MIME type
sf 6285633006_R03C01_Red.idat
---
siegfried : 1.8.0
scandate : 2020-08-06T18:24:38Z
signature : default.sig
created : 2020-08-06T16:48:25Z
identifiers :
- name : 'tika'
details : 'tika-mimetypes.xml (1.23, 2019-12-02)'
- name : 'freedesktop.org'
details : 'freedesktop.org.xml (1.15, 2019-10-30)'
---
filename : '6285633006_R03C01_Red.idat'
filesize : 8095283
modified : 2020-08-04T04:23:05Z
errors :
matches :
- ns : 'tika'
id : 'UNKNOWN'
format :
mime : 'UNKNOWN'
basis :
warning : 'no match'
- ns : 'freedesktop.org'
id : 'UNKNOWN'
format :
mime : 'UNKNOWN'
basis :
warning : 'no match'
The output has updated identifiers
to indicate the underlying database and the mime
fields to reflect MIME type database. Instead of overwriting the default.sig
file, it is best practice to create a different signature file with different identifier using the -name
flag. We can also create a single signature file with multiple identifiers.
Aliases
You can use aliases tika and freedektop in the above commands instead of the full .xml
file name. For example:
sudo roy build -mi tika
It is recommended to create a default MIME-info database prior to adding any custom MIME type information to the signature files.
# Builds a default-mime.sig file using tika identifier
sudo roy build -mi tika -name tika default-mime.sig
# Adds the freedesktop signature file to default-mime.sig
sudo roy add -mi freedesktop -name freedesktop default-mime.sig
# Check the MIME type using default-mime.sig
sf -sig default-mime.sig 6285633006_R03C01_Red.idat
default.sig
Building database without specifying the name of the database replaces the default.sig
file which contains the file format signature in PRONOM format. To restore the default.sig
file, rebuild the database:
roy build
To add custom MIME types, we can update either of the two MIME-info files. The xml
format remains consistent with previous examples for mimetype
and xdg-mime
and can be added to either tika-mimeinfo.xml
or freedesktop.org.xml
.
# Open the xml file
sudoedit /usr/share/siegfried/freedesktop.org.xml
Add the entry below for .idat
to the end of the file before the </mime-info>
divider. If you are working in a nano
text editor, you can hit Ctrl+_ and enter line number 7321 to add the .idat
entry. Preserve the indentation.
<mime-type type="application/vnd.binary">
<comment>Illumina proprietary IDAT format</comment>
<glob pattern="*.idat"/>
</mime-type>
We can now build a MIME-info database with the updated files. In our example, the .idat
entry was added to the freedesktop.org.xml
file. Since it would be useful to keep original identifiers, a custom signature file is built.
# Builds a custom.sig file using tika identifier
sudo roy build -mi tika -name tika custom.sig
# Adds the modified freedesktop signature file to custom.sig
sudo roy add -mi freedesktop -name freedesktop custom.sig
# Check the file entry using custom.sig
sf -sig custom.sig 6285633006_R03C01_Red.idat
---
siegfried : 1.8.0
scandate : 2020-08-06T20:57:10Z
signature : custom.sig
created : 2020-08-06T20:49:20Z
identifiers :
- name : 'tika'
details : 'tika-mimetypes.xml (1.23, 2019-12-02)'
- name : 'freedesktop'
details : 'freedesktop.org.xml (1.15, 2019-10-30)'
---
filename : '6285633006_R03C01_Red.idat'
filesize : 8095283
modified : 2020-08-04T04:23:05Z
errors :
matches :
- ns : 'tika'
id : 'UNKNOWN'
format :
mime : 'UNKNOWN'
basis :
warning : 'no match'
- ns : 'freedesktop'
id : 'application/vnd.binary'
format : 'Illumina proprietary IDAT format'
mime : 'application/vnd.binary'
basis : 'extension match idat'
warning : 'match on filename only'
Revert to default
To revert to default MIME type choose the default MIME database.
# Check the MIME type using default-mime.sig
sf -sig default-mime.sig 6285633006_R03C01_Red.idat