Common Style-to-Tag Mapping Cases

Describes how to map styles to tags for common cases.

This section describes how to set up the mapping for different common cases.

But first, some general tips:

For paragraphs that map to topic components, you should specify the @topicZone attribute, because it is used to group incoming paragraphs for further processing. If @topicZone is not specified, it defaults to "body". This is usually what you want, but it is wrong for paragraphs that belong in the map or in the topic prolog.
Paragraphs that contribute to the root topic or map metadata can go anywhere in the Word document because they go into a separate topic zone and therefore get plucked out of the input and put in the right place regardless of where they occur.
The @level attribute determines the hierarchical relationship among elements within the same output context, such as within the topic body.
You can identify a style either by its display name, using @styleName, or by its style ID, using @styleId. The style name is the name shown in Word's style-related user interfaces. The style ID is the value used in the underlying markup for paragraphs and character runs. If a mapping specifies both a style name and a style ID, the style name takes precedence. In general, the style name is more reliable because Word may change the style ID, for example, when you use a locale-specific version of Word.
The style ID for a Word style is not the same as its display name, but it is usually very similar. In most cases the style ID is the display name with all spaces and special characters removed; for example, the style with the display name "Heading 1" has the style ID "Heading1". If you happen to define two style names that differ only in their use of spaces, e.g., "Heading 1" and "Heading1", then the second style defined will get a style ID like "Heading10" so that the style ID is unique within the Word document. If you're not sure what the style ID is, open the word/document.xml file in the DOCX Zip package and look for <w:pStyle> elements (or <w:rStyle> elements for character styles): the value of the @w:val attribute is the style ID.
The @tagName attribute always specifies the name of the primary (most deeply nested) result element. The primary result may be surrounded by other elements as specified with other attributes, such as @containerType. The element named by @tagName is the element that will contain the text of the paragraph or character run being mapped, except in a few special cases, such as mapping to section elements.
The @outputclass attribute is helpful for applying further XSLT processing (particularly in finding appropriate elements) and for applying additional styling in an output (like in HTML where the @outputclass attribute becomes part of the generated class attribute that can be used with CSS).
A paragraph that generates a topic may also generate a map or submap and in that case will also generate a topicref to the topic (and a mapref to the submap if a submap is generated). This lets you generate trees of maps in addition to trees of topics. You can choose to generate a hierarchy of topics as separate topic documents with nested topicrefs or as a single XML document with a root topic and nested topics.
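
For illustration, here is a mapping that identifies a paragraph style by its style ID rather than by its display name, and that also adds an @outputclass value. The style "Body Text Indent", its ID "BodyTextIndent", and the outputclass value "indented" are hypothetical. The ID assumes that Word derived it by removing the spaces, as described above:

<style
  styleId="BodyTextIndent"
  tagName="p"
  outputclass="indented"
  topicZone="body"
  level="1"
/>

With this mapping, each paragraph in that style should come out as <p outputclass="indented">. In HTML output, a CSS rule keyed on the generated class could then style those paragraphs.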

Finally, the design of the current style-to-tag mapping document evolved somewhat organically based on client requirements as they came up. There are aspects of it that are not necessarily logically consistent. I have started the process of designing a new style-to-tag mapping document type but it's on the back burner. But I would definitely welcome any suggestions for how to make the mapping markup clearer or easier to use.

Simple Paragraphs and Character Runs

For paragraphs that map directly to elements within the topic body, with no required container elements, the general mapping specification is:

<style
  styleName="Normal"
  tagName="p"
  topicZone="body"
  level="1"
/>

The attribute values, in particular those of @styleName and @tagName, are what you would change to match a specific Word paragraph style to a specific DITA output element.

This example maps a paragraph with the name "Normal" to the DITA element <p>.

Likewise, for character runs, you specify the style name and tag name:

<style
  styleName="Heading 1 Char"
  tagName="ph"
  topicZone="body"
  level="1"
/>

Note that as of version 0.9.16 there is no support for nesting character styles in the output. If you need something like "bold italic", which you would typically mark up as <b><i> in DITA, you will need to define a single phrase-level element such as <bi>. The DITA for Publishers formatting domain includes elements for common combinations of bold, italic, and small caps, as well as other elements you might need. You can also use the transform's "final fixup" mode to convert single elements like <bi> into nested elements if you want.
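
As a rough sketch of the final-fixup approach, an override stylesheet could rewrite each <bi> into nested <b> and <i> elements. This assumes that the fixup mode is named "final-fixup" and that your customization can add templates to it. Both the mode name and the <bi> element are illustrative here, so check them against your version of the transform:

<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Hypothetical: replace a combined <bi> phrase with nested <b><i> -->
  <xsl:template match="bi" mode="final-fixup">
    <b>
      <i>
        <!-- Process the phrase's content in the same mode; attributes on <bi> are dropped -->
        <xsl:apply-templates select="node()" mode="#current"/>
      </i>
    </b>
  </xsl:template>
</xsl:stylesheet>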

Heading Paragraphs That Map to Topics

To map heading paragraphs to topics you specify the topic type and the markup details for the generated topic. The heading paragraph becomes the topic title and the other topic elements are generated automatically.

For example, to map the paragraph "Heading 1" to a concept topic you would use this mapping:

<style styleName="Heading 1"
  structureType="topicTitle"
  tagName="title"
  topicType="concept"
  bodyType="conbody"
  level="1"
  topicDoc="yes"
  format="concept"
  topicrefType="chapter"
/>

The structure type "topicTitle" indicates that this paragraph acts as a topic title, which means it will generate a new topic, either in a separate document or as a subtopic of a parent topic. In this case the value of the @level attribute ("1") indicates that this is a top-level topic.

The @tagName attribute defines the tagname to use for the topic's title element, "title" in this case (it would only be different from "title" if you were generating a specialized topic with a specialized topic title element).

The @topicType attribute specifies the tagname for the root element of the topic to be generated, "concept" in this case.

The @bodyType attribute specifies the tagname for the topic body element, "conbody" in this case.

The @level attribute indicates that this is a top-level topic and it would be subordinate only to a paragraph that specifies level zero, which is normally reserved for the paragraph that generates the root map or topic and should normally be the first unskipped paragraph in the Word document.

The @topicDoc attribute indicates that this topic should be put in a new document, which in turn implies the generation of a topicref to the generated document. The default is "no" so you must specify @topicDoc with a value of "yes" if you want the topic chunked out to a new file.

If you specify "yes" for @topicDoc, then you must also specify @format, which names an <output> element defined in the style-to-tag mapping document. The format value determines the public and system IDs to use for the generated topic document. You must also specify the @topicrefType attribute if the process is generating a map in addition to topics.

If you specify "yes" for @topicDoc and the topic is not the root topic (meaning the primary result topic) then the style mapping should also generate a map so that there is a place to put a topicref to the generated topic document.

The @topicrefType attribute specifies the tagname to use for the topicref that will refer to the generated topic. In this case the topicref tagname is "chapter".

Mapping lower-level headings uses the same pattern, specifying the appropriate level, e.g., level "2" for paragraph style Heading 2 and so on.
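
For example, a "Heading 2" mapping that produces concept subtopics nested inside the level-1 topic could look like the following sketch. Because @topicDoc is omitted, it takes its default of "no". Each Heading 2 topic should therefore stay in the same document as its parent rather than being chunked to its own file:

<style styleName="Heading 2"
  structureType="topicTitle"
  tagName="title"
  topicType="concept"
  bodyType="conbody"
  level="2"
/>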

Mapping List Paragraphs (Generating Container Elements)

Lists are typical of elements that must be generated within container elements, e.g., <li> within <ol> or <ul>.

To map list item paragraphs to DITA lists you map the paragraph to the appropriate list item element, e.g., <li> and then specify the name of the container element to wrap all adjacent list items in, specified using the @containerType attribute, like so:

<style
  styleName="List Bullet"
  tagName="li"
  containerType="ul"
  level="1"
  topicZone="body"
/>

Here the tag name is "li", indicating that the paragraph content will be output in an <li> element. The @containerType attribute names the container element for the list item, <ul> in this case. The value "1" for the @level attribute means that this is first-level content within the topic zone (body). Any elements to be nested inside the list item would need to specify level "2" (for example, paragraphs for 2nd-level list items).
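
A second-level bullet list could then be mapped along these lines. "List Bullet 2" is Word's usual name for the second-level bullet style, but treat the exact nesting behavior as something to verify with your own documents:

<style
  styleName="List Bullet 2"
  tagName="li"
  containerType="ul"
  level="2"
  topicZone="body"
/>

Given a List Bullet paragraph followed by List Bullet 2 paragraphs, the level-2 items would be expected to land in a <ul> nested inside the preceding level-1 <li>:

<ul>
  <li>First-level item
    <ul>
      <li>Second-level item</li>
    </ul>
  </li>
</ul>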

The implication of @containerType is that all adjacent Word paragraphs with the same container type will be output into a single instance of the specified container. This is how a sequence of List Bullet paragraphs is output as a single DITA <ul> element.

You can use this @tagName and @containerType pattern for any paragraphs that need to be output inside a common container, except paragraphs that create DITA <section> elements (sections have special mapping support).
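
For example, to collect adjacent note paragraphs into a single <note> element, you might map a hypothetical "Note Text" paragraph style to <p> with a container type of "note":

<style
  styleName="Note Text"
  tagName="p"
  containerType="note"
  level="1"
  topicZone="body"
/>

Three consecutive Note Text paragraphs should then produce one <note> element containing three <p> elements.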

Mapping Definition Lists

Definition lists are a challenge because no single Word structure maps directly to them, and DITA requires two levels of containment: one for the definition list as a whole and one for each term/definition pair. To model definition lists you must use pairs of paragraphs: one for the definition term and one for the definition description.

Definition term paragraphs use the @structureType value "dt" and definition description paragraphs use the @structureType value "dd". For a pair of paragraph styles that represent a term/definition pair, both specify the same value for the @dlEntryType attribute, e.g., "dlentry". Finally, both specify the same value for the @containerType attribute, which names the overall definition list element, e.g., "dl".

Thus, given two paragraph styles "Def Term" and "Def Desc" you would define these mappings to generate a normal DITA definition list:

<style
  styleName="Def Term"
  structureType="dt"
  tagName="dt"
  dlEntryType="dlentry"
  containerType="dl"
  topicZone="body"
/>
<style
  styleName="Def Desc"
  structureType="dd"
  tagName="dd"
  dlEntryType="dlentry"
  containerType="dl"
  topicZone="body"
/>
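
With these mappings, a sequence of alternating Def Term and Def Desc paragraphs should produce markup of the following shape, with one <dlentry> per term/description pair:

<dl>
  <dlentry>
    <dt>First term</dt>
    <dd>Description of the first term.</dd>
  </dlentry>
  <dlentry>
    <dt>Second term</dt>
    <dd>Description of the second term.</dd>
  </dlentry>
</dl>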

Note that the resulting markup doesn't have to be a specialization of <dl>. It just has to follow the same structural pattern: two levels of containment, with the lowest-level elements in common containers. You could use this mapping pattern to generate simple tables, for example.
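
As a sketch of the simple-table case, consider two hypothetical paragraph styles: "Table Row Start" for the first cell of each row and "Table Cell" for the remaining cells. These could map to a DITA <simpletable>, with <strow> as the entry type. This assumes that "dd" paragraphs after the first one are added to the same entry, which you should confirm with a test document:

<style
  styleName="Table Row Start"
  structureType="dt"
  tagName="stentry"
  dlEntryType="strow"
  containerType="simpletable"
  topicZone="body"
/>
<style
  styleName="Table Cell"
  structureType="dd"
  tagName="stentry"
  dlEntryType="strow"
  containerType="simpletable"
  topicZone="body"
/>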

Mapping Procedure Steps

Steps in procedures are similar to definition lists in that there are two layers of markup that wrap the paragraphs for a step: the <steps> element that contains each <step> and the <cmd> element that is the required first subelement of <step>.

Given a paragraph that is the first or only paragraph of a step, you can map it by treating it like a definition list, using a @structureType of "dt":

<style styleName="List 1"
  structureType="dt"
  dlEntryType="step"
  containerType="steps"
  tagName="cmd"
  level="1"
/>

Here the paragraph style "List 1" is mapped to <cmd> within <step> within <steps>.

If you have follow-on paragraphs for the step they should go in an <info> element. For this case you would use normal mapping and specify level 2:

<style
  styleName="BodyText Step"
  containerType="info"
  tagName="p"
  level="2"
  structureType="block"
  topicZone="body"
/>

If you have list items that need to go in the <info> element, use a definition list mapping like so:

<style
  styleName="Bullet 2 Step"
  containerType="info"
  dlEntryType="ul"
  tagName="li"
  level="2"
  structureType="dt"
  topicZone="body"
/>

Because the container type for both BodyText Step and Bullet 2 Step is "info" in this example, both paragraph types will contribute to the same <info> element in the output if they occur together.
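
Taken together, a List 1 paragraph followed by a BodyText Step paragraph and a Bullet 2 Step paragraph would be expected to produce markup along the following lines. This is a sketch of the intended shape, not captured output:

<steps>
  <step>
    <cmd>Open the configuration file.</cmd>
    <info>
      <p>The file is in the installation directory.</p>
      <ul>
        <li>Back up the file before editing it.</li>
      </ul>
    </info>
  </step>
</steps>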

Mapping Tables

Word tables are automatically converted to DITA tables. Paragraph and character styles within table cells will be mapped as defined in the mapping but you don't have direct control over how the table elements map to DITA markup (but you can always post-process the initial DITA into whatever you want).

The transform captures as much detail from the Word table as can be expressed using DITA tables, including the precise table geometry, row and column spans, row and cell borders, and horizontal and vertical cell alignment.

Skipping Paragraphs

It's often useful or necessary to have paragraphs in the Word document that shouldn't be reflected in the DITA output. For this case you can use a @structureType of "skip". Paragraphs with a structure type of "skip" are ignored and have no effect on output. In particular, they do not affect the determination of what paragraphs are adjacent for @containerType processing.
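
For example, to drop a hypothetical "Editor Note" paragraph style from the output entirely, you would need only the style name and the structure type:

<style
  styleName="Editor Note"
  structureType="skip"
/>

Skipped paragraphs don't break adjacency. As a result, two List Bullet paragraphs separated by an Editor Note paragraph should still end up in a single <ul>.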

In addition to explicitly skipped paragraphs, paragraphs that are not otherwise mapped and that have empty content (that is, they normalize to a single blank or to the empty string) are automatically skipped. This saves you from having to worry about users creating empty paragraphs to get vertical spacing in Word documents.

Mapping to Sections

DITA sections are challenging because they have three ways to represent the title: a literal title child element, an explicit @spectitle attribute, or a @spectitle set in the document type and not intended to be set by authors. In addition, some topic types require the use of sections within the topic body. This all leads to a bit of complexity.

The simplest case is where a paragraph acts as the title of a section. The result is a section in which the paragraph is the section title and the subsequent paragraphs are within the section. For this you specify a @structureType of "section" and a @tagName of "title" (or whatever the section title element should be, but usually "title"):

<style 
  styleName="My Section Title"
  structureType="section" 
  tagName="title"
  sectionType="section"
  topicZone="body"
  useAsTitle="yes"  
/>
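
Assuming a topic whose body element is <conbody>, a My Section Title paragraph followed by two ordinary paragraphs would be expected to produce something like:

<conbody>
  <section>
    <title>Installation requirements</title>
    <p>First paragraph of the section.</p>
    <p>Second paragraph of the section.</p>
  </section>
</conbody>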

Note that you don't have to worry about the @level attribute with sections because DITA sections cannot nest and must be direct children of the topic body. So the conversion processing will always do the right thing for sections.

Note, however, that if you have any sections within your topic body then all the paragraphs that follow the first section-creating paragraph will be in a section because there's no way to indicate that a given paragraph should not be in a section. However, this shouldn't be a problem in practice because it would be a very rare markup design that expected there to be a random mix of section elements and non-section elements within the topic body.

Any paragraphs that precede the first section-creating paragraph within a topic body will be direct children of the topic body unless you specify the @initialSectionType attribute for the paragraph that generates the containing topic.

For example, the Learning and Training learningContent topic type requires the use of section-type elements within the topic body. To handle this case you can specify the @initialSectionType attribute to indicate that the initial paragraphs of the topic body should be wrapped in a section.

For example, given a paragraph style "Lesson Title" that maps to a learningContent topic, you would define a mapping like this:

<style 
  styleName="Lesson Title"
  structureType="topicTitle"
  initialSectionType="lcIntro"
  topicType="learningContent"
  bodyType="learningContentbody"
  level="1"
/>

This will result in markup like this:

<learningContent id="topicid">
  <title>Learning Content</title>
  <shortdesc>Put a shortdesc of one or two sentences here.</shortdesc>
  <learningContentbody>
    <lcIntro>
       ... (paragraphs go here)      
    </lcIntro>
  </learningContentbody>
</learningContent>

All paragraphs up to the first paragraph that maps to a section would go within the <lcIntro> element in this example.

In some cases a paragraph should generate a section, provide the section's title or spectitle, and also become the first paragraph of the section. For example, consider a paragraph like this:

Usage: Controls the construction of the section title.

This paragraph could provide the spectitle "Usage" and the initial paragraph "Controls the construction of the section title."

You can do this by indicating that the paragraph is not the section title and that the spectitle is the text up to the first colon:

<style styleName="My Section"
  structureType="section"
  tagName="p"
  topicZone="body"
  spectitle="#toColon"
/>

Here the keyword value "#toColon" for the @spectitle attribute indicates that the spectitle value should be taken from the paragraph content up to, but not including, the first colon. (Other values could be implemented but as of version 0.9.6 the only implemented value is "#toColon".) The value "p" for the @tagName attribute indicates that the paragraph will be the first paragraph of the section rather than the title.
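
Applied to the "Usage:" paragraph above, this mapping should yield a section along the following lines. The sketch assumes the default section element, <section>, because the mapping doesn't specify @sectionType:

<section spectitle="Usage">
  <p>Controls the construction of the section title.</p>
</section>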

You can also specify a literal value for @spectitle, which simply becomes the value of the @spectitle attribute in the generated DITA XML.
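
For instance, suppose paragraphs in a hypothetical "Usage Section" style should always get the spectitle "Usage". That style could be mapped like this:

<style styleName="Usage Section"
  structureType="section"
  tagName="p"
  topicZone="body"
  spectitle="Usage"
/>

Unlike "#toColon", a literal value presumably leaves the paragraph text unchanged, so the whole paragraph becomes the first <p> of the section.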

Mapping to Maps and Map Components (Topicrefs)

The simplest mapping is one in which a single Word document maps to a single result topic document. However, you can also generate systems of maps in addition to topics from a single input Word file. This gets a little complex because there's a lot going on.

When you are generating a map and topics you must map the first non-skipped paragraph in the Word document to a map element, which will generate the root map of the output structure. It can also map to a topic (and by implication, a topicref to that topic) but it need not.

For example, if your first paragraph has the style "Publication Title" and you want to generate a bookmap map but not a topic you would use this mapping:

<style styleName="Publication Title"
  level="0"
  structureType="mapTitle"
  tagName="title"
  mapFormat="bookmap"
  mapType="bookmap"
  prologType="topicmeta"
/>

The @structureType value of "mapTitle" triggers the general map generation. The other attributes define the details of the map markup: @mapFormat, @mapType, and @prologType. The value "0" (zero) for the @level attribute indicates that this is the root output structure. You should only have one level-zero paragraph-to-map or paragraph-to-topic mapping.

If you map a paragraph to a structure type of "mapTitle" at a level other than zero then you will create a submap that is referenced from the parent map (that is, the map that is one level up in the mapping hierarchy). In this case you must specify the @maprefType attribute in order to get a topicref generated in the parent map.
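
As a sketch, a hypothetical "Part Title" style that generates a level-1 submap referenced from the root map might be mapped as follows. The @maprefType value "mapref" and the @mapFormat value "map" are assumptions. Like @format, @mapFormat presumably has to name an <output> element that actually exists in your mapping document:

<style styleName="Part Title"
  level="1"
  structureType="mapTitle"
  tagName="title"
  mapFormat="map"
  mapType="map"
  prologType="topicmeta"
  maprefType="mapref"
/>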

When a paragraph maps to both a map and a topic, you specify a structure type of "mapTitle" and specify "topicTitle" as the @secondStructureType value. You can then specify the other attributes used to generate a result topic as you would for a structure type of "topicTitle":

<style
  styleName="Topic Title"
  bodyType="learningContentbody"
  chunk="to-content"
  mapFormat="learningMap"
  format="learningContent"
  initialSectionType="section"
  level="1"
  mapType="map"
  prologType="prolog"
  secondStructureType="topicTitle"
  structureType="mapTitle"
  tagName="title"
  topicDoc="yes"
  topicType="learningContent"
  topicrefType="learningContentRef"
/>

This example maps a paragraph with the style "Topic Title" to a generic map (@mapType of "map") and a learningContent topic referenced from the map. The value "1" for the @level attribute means this map will be a submap of the root map.

There are attributes for specifying the element types for each different kind of element that might result from generating both a map and topic.

When a new document should be referenced from the current map, simply specify the @topicrefType attribute and a structure type of "topicTitle".
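
For example, beneath the level-1 submap generated by the Topic Title mapping above, a hypothetical "Topic Subhead" style could chunk each level-2 topic to its own file and add a <topicref> to it in the current map. The @format value "topic" assumes an <output> element with that name in your mapping document:

<style styleName="Topic Subhead"
  structureType="topicTitle"
  tagName="title"
  topicType="topic"
  bodyType="body"
  level="2"
  topicDoc="yes"
  format="topic"
  topicrefType="topicref"
/>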

Mapping Metadata

To map paragraphs to <data> elements within the root topic prolog, you must specify:

@topicZone as "prolog"
@baseClass as "- topic/data " (note the trailing blank)
@containingTopic as "root"
@level as "0"

To map to unspecialized <data> you would specify @tagName as "data". You can put the paragraph content either into the content of the <data> element or into the @value attribute. To put it in content, set @putValueIn to "content", otherwise set it to "value".
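
Putting these together, consider a hypothetical "Document Owner" paragraph style whose text should become the content of a <data> element in the root topic's prolog. It could be mapped like this:

<style styleName="Document Owner"
  topicZone="prolog"
  baseClass="- topic/data "
  containingTopic="root"
  level="0"
  tagName="data"
  putValueIn="content"
/>

A paragraph reading "Jane Doe" should then produce <data>Jane Doe</data> in the prolog. With putValueIn="value" you would instead expect <data value="Jane Doe"/>.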

Because the DITA content model for <prolog> is a bit fiddly, with a required sequence of element types, you may need to apply post-processing in the "final-fixup" mode to make sure the resulting prolog content is valid.

Mapping Sidebars

Sidebars are a particularly problematic element because they often have no fixed location in the overall hierarchy. That is, they are often allowed (by publishers' business rules) to occur at many different levels (perhaps inside Heading 1, Heading 2, Heading 3, and Heading 4 sections). This used to require a specific Word style for each hierarchical level at which the sidebar could appear (e.g., Sidebar Title in Heading 1, Sidebar Title in Heading 2). That increased both the work and the potential for error. Maintainers of the style-to-tag mapping had to make sure they had appropriate definitions for each level a sidebar was allowed at. Authors had to use the right sidebar style for whatever level the sidebar was to appear at. These obstacles have been overcome starting in 0.9.19 RC 11.


With 0.9.19 RC 11, there is further processing that recalculates the level for individual instances of topic-generating styles. Because the recalculation respects the relationships between levels as established in the style-to-tag mapping, authors can place sidebars where they want and mapping maintainers only need a single sidebar title style definition (assuming they set the relationships correctly). The easiest (perhaps best) way to set the level for sidebars is to give them a really deep level, such as 50. In the recalculation, each instance of the sidebar title will be assigned a level that nests the sidebar within the parent topic.

For example, to map the paragraph "Sidebar Title" to a sidebar topic you would use this mapping:

<style styleName="Sidebar Title"
  structureType="topicTitle"
  tagName="title"
  topicType="sidebar"
  bodyType="body"
  prologType="prolog"
  level="50"
/>

For information on other attributes, see the section Heading Paragraphs That Map to Topics.