Chapter 12. The Word-to-DITA Transformation Framework

The Word-to-DITA (Word2DITA) transformation framework enables the reliable generation of maps and topics from styled Word documents.

The Word-to-DITA transformation framework provides a general facility for converting styled Microsoft Word documents into DITA maps and topics. It is intended primarily to support ongoing authoring of DITA content using Microsoft Word, where all authoring is done in Word and DITA maps and topics are generated from the Word documents on demand. It does not provide a way to go from DITA back to Word (although that would be technically possible, it would not be trivial).

The Word-to-DITA transformation is not intended to support general data conversion from arbitrary Word documents to DITA. It requires at least some amount of consistent styling. However, it may still produce useful starting points from lightly styled documents. Try it and see. Because the output of the transform can be quite complete, it may be most effective to do data cleanup in Word (that is, applying appropriate styles) and then use the transform to generate more or less ready-to-use maps and topics, rather than generating DITA content that needs significant rework to be usable.

The Word-to-DITA process can do any of the following:
  • Generate a single DITA topic document from a single Word document
  • Generate a single DITA map document and one or more topic documents from a single Word document
  • Generate a tree of maps and one or more topic documents from a single Word document

The transformation takes Word documents that use a specific set of named styles and produces DITA documents of any type. It is defined mostly or entirely through a declarative style-to-tag map that specifies how each Word paragraph and character style maps to specific DITA structures.

The declarative style-to-tag map makes it quick and relatively easy to set up and maintain conversions. As long as the style-to-tag mapping is sufficient, no XSLT programming is required.
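To make the idea of a declarative style-to-tag map concrete, here is a minimal Python sketch. It is not the framework's implementation: the framework itself is XSLT and defines its mapping in its own style-to-tag map document, whose format is not reproduced here. The style names `Title`, `BodyText`, and `BulletItem`, the dictionary layout, and the function `paragraphs_to_topic()` are all hypothetical. The sketch assumes the Word document has already been reduced to a list of (style name, text) pairs, with the title paragraph first.

```python
import xml.etree.ElementTree as ET

# Hypothetical style-to-tag map: Word paragraph style name -> DITA element.
# "container" names a wrapper element (such as <ul>) that consecutive
# paragraphs of the same style should share; None means "directly in <body>".
# Illustrative only; not the framework's real mapping format.
STYLE_TO_TAG = {
    "Title": {"tag": "title", "container": None},
    "BodyText": {"tag": "p", "container": None},
    "BulletItem": {"tag": "li", "container": "ul"},
}


def paragraphs_to_topic(paragraphs, topic_id="topic-1"):
    """Build a DITA <topic> from (style_name, text) pairs using STYLE_TO_TAG.

    Assumes the title paragraph comes first, so <title> precedes <body>.
    """
    topic = ET.Element("topic", id=topic_id)
    body = None
    open_container = None  # wrapper (e.g. <ul>) currently being filled, if any
    for style, text in paragraphs:
        rule = STYLE_TO_TAG.get(style)
        if rule is None:
            raise ValueError(f"No mapping for Word style {style!r}")
        if rule["tag"] == "title":
            ET.SubElement(topic, "title").text = text
            continue
        if body is None:
            body = ET.SubElement(topic, "body")
        container_tag = rule["container"]
        if container_tag is None:
            open_container = None  # a non-list paragraph ends any open list
            parent = body
        else:
            # Start a new wrapper unless one of the same kind is already open.
            if open_container is None or open_container.tag != container_tag:
                open_container = ET.SubElement(body, container_tag)
            parent = open_container
        ET.SubElement(parent, rule["tag"]).text = text
    return topic


if __name__ == "__main__":
    doc = [
        ("Title", "Installing the widget"),
        ("BodyText", "Before you begin:"),
        ("BulletItem", "Unpack the box."),
        ("BulletItem", "Find a screwdriver."),
        ("BodyText", "Then continue."),
    ]
    print(ET.tostring(paragraphs_to_topic(doc), encoding="unicode"))
```

The conversion loop never changes when a new style is added; only the table grows. That is the property that lets a declarative mapping handle new styles without additional programming, and it is also why structures the table cannot describe (such as deeply nested content) call for code.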

If the style-to-tag map is not sufficient, you can extend the base Word-to-DITA transform using XSLT. Some reasons to extend include:
  • Handling complex structures that cannot be expressed with declarative mapping (usually deeply nested structures)
  • Implementing custom rules for assigning element IDs
  • Implementing custom rules for constructing map and topic filenames
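Custom ID and filename rules are the kind of logic that does not fit a declarative table. In the framework such rules are implemented as XSLT extensions; the Python sketch below only illustrates what one such rule might compute. The slug convention, the `.dita` suffix with numeric de-duplication, and the function names `slugify()`, `topic_filename()`, and `element_id()` are assumptions for illustration, not the framework's defaults.

```python
import re
import unicodedata


def slugify(text: str) -> str:
    """Lower-case ASCII slug: accents dropped, runs of other characters become '-'."""
    ascii_text = (
        unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    )
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")
    return slug or "untitled"


def topic_filename(title: str, seen: set) -> str:
    """Hypothetical rule: derive a unique .dita filename from a topic title.

    `seen` records names already issued, so duplicate titles get -2, -3, ...
    """
    base = slugify(title)
    name = f"{base}.dita"
    n = 2
    while name in seen:
        name = f"{base}-{n}.dita"
        n += 1
    seen.add(name)
    return name


def element_id(prefix: str, title: str) -> str:
    """Hypothetical rule: build an element ID from a title.

    XML IDs may not start with a digit or '-', so a prefix that starts
    with a letter (e.g. "topic") is always prepended.
    """
    return f"{prefix}_{slugify(title)}"
```

For example, two topics both titled "Installing the Widget" would receive `installing-the-widget.dita` and `installing-the-widget-2.dita`, and a title such as "2024 Release Notes" yields the ID `topic_2024-release-notes`.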

The Word-to-DITA transform can be used either through the DITA Open Toolkit or as a standalone XSLT transformation (for example, to embed it in a CMS-managed tool chain). While it is packaged as an Open Toolkit plugin, it has no dependencies on any Toolkit components; it simply uses the Toolkit's processing framework as a convenient way to apply the transform.