Sample Custom Word-to-DITA XSLT Stylesheet

A working example of a custom Word-to-DITA XSLT stylesheet

To customize and extend the base Word-to-DITA transformation, you need a new top-level XSLT document that imports the docx2dita.xsl transformation and then adds whatever new templates your customization requires. The following example extends the "final-fixup" mode to capture literal numbers at the start of topic titles.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  exclude-result-prefixes="xs"
  version="2.0">
  
  <xsl:import href="../../org.dita4publishers.word2dita/xsl/docx2dita.xsl"/>
  
  <xsl:template match="concept/title" mode="final-fixup">
    <!-- Look for numbers of the form "1-1" at the start of titles and wrap them in a
         d4pSimpleEnumeration element. The number must be in the first text node,
         not in a subelement.
    -->
    <xsl:variable name="childNodes" select="./node()" as="node()*"/>
    <xsl:message> + [DEBUG] childNodes=<xsl:sequence select="$childNodes"/></xsl:message>
    <xsl:choose>
      <xsl:when test="$childNodes[1]/self::text() and matches($childNodes[1], '^[0-9]+')">
        <xsl:copy>
          <xsl:apply-templates select="@*" mode="#current"/>
          <xsl:analyze-string select="$childNodes[1]" regex="^(([0-9]+(-[0-9]+)*)[ ]+)(.*)">
            <xsl:matching-substring>
              <d4pSimpleEnumeration><xsl:sequence select="regex-group(1)"/></d4pSimpleEnumeration>
              <xsl:sequence select="regex-group(4)"/>
            </xsl:matching-substring>
            <xsl:non-matching-substring>
              <xsl:sequence select="."/>
            </xsl:non-matching-substring>
          </xsl:analyze-string>
          <xsl:apply-templates select="$childNodes[position() > 1]" mode="#current"/>
        </xsl:copy>        
      </xsl:when>
      <xsl:otherwise>
        <xsl:apply-imports/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
  
</xsl:stylesheet>
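
As a rough illustration of what the template above does, consider a hypothetical concept title whose first text node begins with a number followed by a space. The title text is invented for this sketch, and the title is assumed to have no attributes. Whitespace added by serialization is ignored.

<!-- Hypothetical input: a title as it reaches the "final-fixup" mode -->
<title>1-1 Getting Started</title>

<!-- Expected result: regex-group(1) is the number plus its trailing space,
     regex-group(4) is the rest of the text node -->
<title><d4pSimpleEnumeration>1-1 </d4pSimpleEnumeration>Getting Started</title>

Note that the trailing space ends up inside the d4pSimpleEnumeration element because the first regular-expression group includes it. A title that does not start with a digit is passed to the imported templates through xsl:apply-imports. A title that starts with a digit but does not fit the full pattern (for example "2nd Edition") is copied by this template with its first text node unchanged.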

The transform as shown is intended to be packaged in a separate Open Toolkit plugin so that it is in a known location relative to the base transform provided by the Word-to-DITA plugin. In this example the transform is in the directory xsl/ under the plugin's main directory (mirroring the organization of the Word-to-DITA plugin itself).

The plugin descriptor looks like this:

<!-- 
  Plugin descriptor for an example Word-to-DITA extension transform
  
  Use this as a sample for your own plugin.
  
  -->
<plugin id="org.example.d4p.word2ditaextension">
  <require plugin="org.dita4publishers.word2dita"/> 

  <!-- This plugin just provides the transform in a reliable location
       relative to the base transform. It doesn't define its own
       transformation type.
    -->
</plugin>

The directory structure of the plugin is:

org.example.d4p.word2ditaextension/
  plugin.xml
  xsl/
    sample-word-to-dita-customization.xsl

To make the plugin available, copy the org.example.d4p.word2ditaextension directory to the plugins directory of your Open Toolkit. Because this plugin doesn't define a new transformation type or directly extend any other plugin, you don't have to run the integrator.xml script after deploying it.

To use the customization, specify the XSLT file as the value of the w2d.word2dita.xslt Ant parameter, either on the command line or in an Ant build script that applies the Word-to-DITA process to a specific file.
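
The following Ant build script is a minimal sketch of the build-script approach, not a definitive setup. Only the w2d.word2dita.xslt parameter and the plugin and file names come from this topic. The Toolkit location, the input document path, the transtype value "word2dita", and the use of args.input for the Word document are assumptions that you should check against your Toolkit and the Word-to-DITA plugin documentation.

<?xml version="1.0" encoding="UTF-8"?>
<project name="word2dita-custom" default="convert" basedir=".">

  <!-- Hypothetical locations: adjust both for your environment -->
  <property name="dita.home" location="/path/to/DITA-OT"/>
  <property name="word.doc" location="${basedir}/docs/my-document.docx"/>

  <target name="convert"
    description="Run the Word-to-DITA process with the custom XSLT">
    <!-- Call the Toolkit's main build file with the Word-to-DITA parameters -->
    <ant antfile="${dita.home}/build.xml" dir="${dita.home}">
      <!-- Assumed transformation type name for the Word-to-DITA process -->
      <property name="transtype" value="word2dita"/>
      <!-- Assumed parameter for the input Word document -->
      <property name="args.input" value="${word.doc}"/>
      <!-- Points the process at the customization instead of the base transform -->
      <property name="w2d.word2dita.xslt"
        value="${dita.home}/plugins/org.example.d4p.word2ditaextension/xsl/sample-word-to-dita-customization.xsl"/>
    </ant>
  </target>

</project>

Passing the same property on the command line with Ant's -D option (for example -Dw2d.word2dita.xslt=...) should have the same effect. A build script has the advantage of keeping the path to the customization in one place.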

A working version of this plugin is included in the DITA for Publishers Open Toolkit plugin package as the plugin "org.example.d4p.word2ditaextension".