Extending and Overriding the Word to DITA Transform

How to customize the XSLT processing to handle special cases

Any transformation driven by a declarative mapping is limited in what it can do through the mapping alone, so the Word-to-DITA process is designed to be extended and customized to handle special cases.

The Word-to-DITA transformation is a three-phase transformation:
  • Phase one processes the input DOCX document.xml file (and other files as needed) to generate an intermediate "simple word processing" document. The "simple" document reflects the paragraph and character run data from the original Word document, annotated with the details from the style-to-tag mapping. Starting with version 1.0.0R28, the Word-to-simpleWP processing can be extended to auto-style runs through the mode "local:getRunStyleId" (defined in the module office-open-utils.xsl). Out of the box, this mode automatically styles runs that have manual formatting. You can implement your own auto-styling by adding templates in the "local:getRunStyleId" mode.
  • Phase two processes the intermediate "simple" document to generate the initial result DITA XML (simple2dita.xsl).
  • Phase three is the "final fixup" phase, which processes the initial result DITA XML to generate the final DITA XML (final-fixup.xsl). By default the final fixup is just a passthrough stage, except that <ph> elements with @outputclass values such as "b-i" and "i-u" are turned into the equivalent nested inline elements. You can extend the final-fixup phase as needed to handle cases that the style-to-tag map cannot, such as adding required wrapper elements or moving content around.
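As an illustration of adding your own auto-styling rule in phase one, the sketch below gives runs that carry Word highlighting (a w:highlight element in the run properties) and no explicit character style a hypothetical "Highlighted" style ID. It rests on several assumptions: that templates in the "local:getRunStyleId" mode are applied to w:r elements and return the style ID as a string, that the local prefix is bound to the URI shown (copy the actual URI from office-open-utils.xsl), that the import path matches your installation, and that "Highlighted" is a style you have mapped in your style-to-tag map.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
  xmlns:local="urn:local-functions"
  exclude-result-prefixes="w local">
  <!-- Assumption: the local prefix URI must match the one used in
       office-open-utils.xsl, or this template will never be selected. -->

  <!-- Assumed path: adjust to where the word2dita plugin is installed. -->
  <xsl:import href="../../org.dita4publishers.word2dita/xsl/docx2dita.xsl"/>

  <!-- Assumption: the mode is applied to w:r elements and expects a style ID.
       Runs highlighted in any colour other than "none", and with no explicit
       w:rStyle, get the hypothetical "Highlighted" character style. -->
  <xsl:template mode="local:getRunStyleId"
    match="w:r[not(w:rPr/w:rStyle)]
              [w:rPr/w:highlight[not(@w:val = 'none')]]">
    <xsl:sequence select="'Highlighted'"/>
  </xsl:template>

  <!-- Runs this template does not match fall through to the imported base
       templates, which keep the default auto-styling behavior. -->
</xsl:stylesheet>
```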

In addition, the transformation provides default rules for generating output filenames and for generating element IDs, both of which can be overridden.

You extend and override the base transformation by creating a top-level XSLT document that imports the docx2dita.xsl transformation from the word2dita Toolkit plugin and implements any override or extension templates you need. You can deploy this override in a separate Open Toolkit plugin that depends on the base word2dita plugin and defines its own transformation type. Alternatively, you can set the w2d.word2dita.xslt Ant parameter to the full path and filename of your custom transformation.
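A minimal top-level override stylesheet might look like the following sketch. It imports docx2dita.xsl and adds one final-fixup template that wraps code blocks carrying a hypothetical "listing" outputclass in a fig element. The import path, the "listing" outputclass, and the reliance on the base final-fixup mode copying attributes and other nodes through unchanged are assumptions to check against your installation.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumed location: this file lives in plugins/<your-plugin>/xsl/ and the
       word2dita plugin is installed alongside it. Adjust the path as needed. -->
  <xsl:import href="../../org.dita4publishers.word2dita/xsl/docx2dita.xsl"/>

  <!-- Final-fixup extension: wrap code blocks produced from a hypothetical
       "listing" paragraph style in a fig element. The match uses the element
       name because no @class attributes exist at this stage. -->
  <xsl:template match="codeblock[@outputclass = 'listing']" mode="final-fixup">
    <fig>
      <xsl:copy>
        <!-- Assumes the base final-fixup mode copies attributes and child
             nodes through unchanged (its default passthrough behavior). -->
        <xsl:apply-templates select="@*, node()" mode="#current"/>
      </xsl:copy>
    </fig>
  </xsl:template>

</xsl:stylesheet>
```

Because templates in the importing stylesheet take precedence over imported ones, every element this template does not match is still handled by the base final-fixup rules.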

The extension points provided are:
  • The "final fixup" phase, which you can extend by implementing templates in the "final-fixup" mode. These templates operate on the DITA XML produced by the second phase and must produce valid DITA markup as their output. Note that the XML handled by the final-fixup mode has no associated schema or DTD at that point, so there are no @class attributes to key on. This means your templates must match on element type names rather than on @class attribute values.
  • The "generate-id" and "topic-name" modes can be overridden to provide custom ID generation logic. The mappings for individual paragraph and character styles can specify a named "ID generator" value that is then passed as a parameter to the generate-id mode. Custom ID generator code can use that parameter to select the appropriate ID generation logic.
  • The "topic-url" mode can be overridden to implement custom rules for constructing result topic filenames (base implementation is in the file modeTopicUrl.xsl).
  • The "map-url" mode can be overridden to implement custom rules for constructing result map filenames (base implementation is in the file modeMapUrl.xsl).
  • The "local:getRunStyleId" mode can be extended to adjust how the styles for runs are determined (base implementation is in the file office-open-utils.xsl).
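The ID-generator dispatch described for the generate-id mode could be sketched as an override template that checks the generator name and otherwise defers to the base implementation with xsl:next-match. The parameter name idGenerator, the generator value "figure", the match pattern, and the assumption that the mode returns the ID as a string are all hypothetical; check the base generate-id templates for the actual parameter name and context node before adapting this.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumed path: adjust to your installation. -->
  <xsl:import href="../../org.dita4publishers.word2dita/xsl/docx2dita.xsl"/>

  <!-- Hypothetical override: "idGenerator" and "figure" are illustrative
       names, not the plugin's documented API. -->
  <xsl:template match="*" mode="generate-id">
    <xsl:param name="idGenerator" select="''"/>
    <xsl:choose>
      <xsl:when test="$idGenerator = 'figure'">
        <!-- generate-id() is the XSLT built-in function: unique per node
             within one transformation run, and always a valid NCName. -->
        <xsl:sequence select="concat('fig-', generate-id(.))"/>
      </xsl:when>
      <xsl:otherwise>
        <!-- Any other generator: fall back to the imported base template,
             passing the parameter along explicitly. -->
        <xsl:next-match>
          <xsl:with-param name="idGenerator" select="$idGenerator"/>
        </xsl:next-match>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>
```

Using xsl:next-match rather than copying the base logic keeps the default ID generation in one place, so the override only has to handle the generator names it actually introduces.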