When exporting, importing or publishing data whose files link to each other, it may be necessary to rewrite the links between the files. For example, consider two local files where A.xml contains a reference to the other file by its filename. After import, that reference should instead use the referenced file's TextGrid URI (e.g., textgrid:4721.0), since the original filename is no longer known and TextGrid URIs are now the means of reference. Similarly, after publication those URIs should be rewritten to PIDs.
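The basic idea can be sketched as a lookup from old references to new ones. This is a minimal, hypothetical illustration, not the Import and Export tool's code: the function name, the filename referenced.xml and the placeholder PID are assumptions made for the example.

```python
from typing import Dict

# Hypothetical sketch, not the TextGrid implementation: a link rewrite is
# modelled as a lookup from an old reference to its new form.


def rewrite_reference(value: str, mapping: Dict[str, str]) -> str:
    """Return the new form of a reference, or the value unchanged if unknown."""
    return mapping.get(value, value)


# Assumed result of an import: local filename -> TextGrid URI.
import_map = {"referenced.xml": "textgrid:4721.0"}
# Assumed result of a publication: TextGrid URI -> PID (placeholder value).
publish_map = {"textgrid:4721.0": "hdl:EXAMPLE/0001"}

link = "referenced.xml"                       # as written in A.xml
link = rewrite_reference(link, import_map)    # after import
print(link)                                   # textgrid:4721.0
link = rewrite_reference(link, publish_map)   # after publication
print(link)                                   # hdl:EXAMPLE/0001
```

Unknown references are passed through unchanged, so links to files outside the imported set are left alone.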
Where URIs are rewritten depends on the content type of the respective file. E.g., in TEI files we should rewrite (among others) //ref/@target, while in XHTML we should rewrite, e.g., //a/@href.
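To make the content-type dependence concrete, here is a hypothetical sketch using Python's xml.etree.ElementTree. The LOCATIONS table, the format keys "tei" and "xhtml" and the helper name are assumptions for illustration; only the two locations named above are listed, whereas the real specs cover more.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

TEI_NS = "http://www.tei-c.org/ns/1.0"
XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical table: format -> (element path, attribute) pairs to rewrite.
LOCATIONS: Dict[str, List[Tuple[str, str]]] = {
    "tei": [(f".//{{{TEI_NS}}}ref", "target")],
    "xhtml": [(f".//{{{XHTML_NS}}}a", "href")],
}


def rewrite_links(xml_text: str, fmt: str, mapping: Dict[str, str]) -> ET.Element:
    """Parse xml_text and rewrite the attributes listed for fmt."""
    root = ET.fromstring(xml_text)
    for path, attr in LOCATIONS[fmt]:
        for elem in root.iterfind(path):
            value = elem.get(attr)
            if value is not None:
                elem.set(attr, mapping.get(value, value))
    return root


mapping = {"referenced.xml": "textgrid:4721.0"}
tei = f'<TEI xmlns="{TEI_NS}"><text><body><p><ref target="referenced.xml"/></p></body></text></TEI>'
root = rewrite_links(tei, "tei", mapping)
print(root.find(f".//{{{TEI_NS}}}ref").get("target"))  # textgrid:4721.0
```

The same document would be left untouched under "xhtml", since that format only looks at a/@href in the XHTML namespace.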
Choosing a rewrite method
By default, the Import and Export tool selects an appropriate rewrite method for each document's detected content type. You can change this for individual items: click the corresponding table cell in the import or export tool to open a combo box from which you can choose one of the built-in rewriting specifications.
You can also specify the URI of a rewriting spec by typing it into the cell, e.g., internal:tei#tei for the built-in TEI transformation, or, say, textgrid:9876#myformat for the spec with the ID myformat in the object textgrid:9876.
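Judging from the two examples, a spec URI consists of an object URI and a spec ID separated by #. The following hypothetical helper only illustrates that reading of the syntax; the function name is an assumption and the tool's own parsing may differ.

```python
from typing import Tuple


def split_spec_uri(spec_uri: str) -> Tuple[str, str]:
    """Split '<object URI>#<spec ID>' into its two parts (assumed syntax)."""
    obj, sep, spec_id = spec_uri.partition("#")
    if not sep or not obj or not spec_id:
        raise ValueError(f"expected '<object URI>#<id>', got {spec_uri!r}")
    return obj, spec_id


print(split_spec_uri("internal:tei#tei"))         # ('internal:tei', 'tei')
print(split_spec_uri("textgrid:9876#myformat"))   # ('textgrid:9876', 'myformat')
```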
Rolling your own rewrite method
To define your own rewrite method, write an XML file that conforms to the import specification schema. We'll use the specification for TEI documents as an example, since it demonstrates all available features.
This first defines the importSpec and declares the required namespaces. We then start an xmlConfiguration (i.e., the spec for a single format). This requires an id (here tei), and you should also provide a description that can be shown in the user interface.
Now we describe the elements and attributes that should be rewritten:
tei:name is associated with method='none', which means its content isn't rewritten. However, it has an attribute named target that can contain URIs which we do want to rewrite. The token method means that the attribute can contain a whitespace-separated list of URIs, each of which is rewritten separately. The available methods are:
- none: no rewriting
- token: a whitespace-separated list of values, each rewritten on its own
- full: the whole attribute value is a single value
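The difference between the three methods can be sketched as follows. This is a hypothetical illustration, not the tool's implementation; the helper names and the example URIs are assumptions.

```python
from typing import Callable, Dict


def apply_method(value: str, method: str, rewrite: Callable[[str], str]) -> str:
    """Rewrite an attribute value according to a none/token/full method."""
    if method == "none":
        return value                      # leave the value untouched
    if method == "full":
        return rewrite(value)             # the whole value is one URI
    if method == "token":
        # Each whitespace-separated token is rewritten on its own.
        # This sketch joins tokens with single spaces, so runs of
        # whitespace in the original value are normalised.
        return " ".join(rewrite(token) for token in value.split())
    raise ValueError(f"unknown method: {method!r}")


mapping: Dict[str, str] = {"a.xml": "textgrid:1.0", "b.xml": "textgrid:2.0"}


def lookup(uri: str) -> str:
    return mapping.get(uri, uri)


print(apply_method("a.xml b.xml", "token", lookup))  # textgrid:1.0 textgrid:2.0
print(apply_method("a.xml b.xml", "full", lookup))   # a.xml b.xml
print(apply_method("a.xml", "none", lookup))         # a.xml
```

Note how full treats "a.xml b.xml" as one value that matches nothing, which is why attributes holding URI lists need token.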
For the idno element, we only want rewriting on import (or publication); on export, existing values should be kept as-is. Additionally, we only want rewriting when the idno element has a type attribute that matches the regular expression textgrid|handle, i.e., we only want to rewrite TextGrid URIs and Handles.
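The two conditions on idno can be expressed as a small predicate. This hypothetical sketch assumes the direction names "import", "publish" and "export" and assumes the pattern must match the whole type value; whether the tool uses a full or partial match is not stated here.

```python
import re
from typing import Optional

# Pattern from the TEI spec; treating it as a full match is an assumption.
TYPE_PATTERN = re.compile(r"textgrid|handle")


def should_rewrite_idno(direction: str, type_attr: Optional[str]) -> bool:
    """Decide whether an idno value should be rewritten (hypothetical helper)."""
    if direction not in ("import", "publish"):   # keep values as-is on export
        return False
    return type_attr is not None and TYPE_PATTERN.fullmatch(type_attr) is not None


print(should_rewrite_idno("import", "handle"))   # True
print(should_rewrite_idno("export", "handle"))   # False
print(should_rewrite_idno("import", "isbn"))     # False
```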
Sometimes you'll want to handle attributes on any element, without listing the elements explicitly. In the TEI spec, for example, we'd like to rewrite the facs attribute (among others) on any element.
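Conceptually, an "any element" rule means walking the whole tree and checking every element for the attribute. The sketch below shows this with ElementTree; the helper name and the example URI are hypothetical, and for simplicity it treats each value as a single URI (like the full method).

```python
import xml.etree.ElementTree as ET
from typing import Dict


def rewrite_attr_everywhere(root: ET.Element, attr: str, mapping: Dict[str, str]) -> int:
    """Rewrite attr on every element (root included); return how many changed."""
    changed = 0
    for elem in root.iter():          # iter() walks the whole tree, root first
        value = elem.get(attr)
        if value is not None and value in mapping:
            elem.set(attr, mapping[value])
            changed += 1
    return changed


doc = ET.fromstring(
    '<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>'
    '<pb facs="page1.jpg"/><p facs="page1.jpg">text</p>'
    '</body></text></TEI>'
)
print(rewrite_attr_everywhere(doc, "facs", {"page1.jpg": "textgrid:5555.0"}))  # 2
```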
Here's the rest of the TEI spec:
- There's no support for
- We don't support patterns or XPath expressions for element or attribute values.