Common Style-to-Tag Mapping Cases
Describes how to map styles to tags for common cases
This section describes how to set up the mapping for different common cases.
- For paragraphs that map to topic components, you should specify the @topicZone attribute, because it is used to group incoming paragraphs for further processing. If @topicZone is not specified, a default of "body" is used. This is usually what you want, but it is wrong for paragraphs that belong in the map or topic prolog.
- Paragraphs that contribute to the root topic or map metadata can go anywhere in the Word document because they go into a separate topic zone and therefore get plucked out of the input and put in the right place regardless of where they occur.
- The @level attribute determines the hierarchical relationship among elements within the same output context, such as within the topic body.
- You can specify either the style display name, using @styleName, or the style ID, using @styleId. The style name is the display name that is used in the various Word style-related user interfaces. The style ID is the value specified in the underlying markup for paragraphs and character ranges. If you specify both style name and style ID on a mapping, the style name takes precedence. In general, the style name is more reliable because Word may change the style ID, for example, when using locale-specific versions of Word.
- The "style ID" for Word styles is not the same as the display name of the style, but it is usually very similar. In most cases the style ID is the display name with all spaces and special characters removed, e.g., the style with the display name "Heading 1" has the style ID "Heading1". If you happen to define two style names that differ only in their use of spaces, e.g., "Heading 1" and "Heading1", then the style ID of the second style defined will be something like "Heading10" so that the style ID is unique within the Word document. If you're not sure what the style ID is, you can open the document.xml file from the DOCX Zip package and look for <w:pStyle> elements: the value of the @w:val attribute is the style ID.
- The @tagName attribute always specifies the name of the primary (most deeply nested) result element. The primary result may be surrounded by other elements as specified with other attributes, such as @containerType. The element named by @tagName is the element that will contain the text of the paragraph or character run being mapped, except in a few special cases, such as mapping to section elements.
- The @outputclass attribute is helpful for applying further XSLT processing (particularly for finding the appropriate elements) and for applying additional styling in an output format (for example, in HTML the @outputclass value becomes part of the generated class attribute, which can be used with CSS).
- A paragraph that generates a topic may also generate a map or submap and in that case will also generate a topicref to the topic (and a mapref to the submap if a submap is generated). This lets you generate trees of maps in addition to trees of topics. You can choose to generate a hierarchy of topics as separate topic documents with nested topicrefs or as a single XML document with a root topic and nested topics.
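To illustrate the style ID bullet above, here is a minimal sketch of how a styled paragraph typically appears inside document.xml. The paragraph text is invented for illustration, and the namespace declaration is placed on the paragraph only to keep the fragment self-contained; in a real file it sits on the document's root element.

```xml
<!-- Illustrative fragment of document.xml from a DOCX package -->
<w:p xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:pPr>
    <!-- w:val holds the style ID ("Heading1"), not the display name ("Heading 1") -->
    <w:pStyle w:val="Heading1"/>
  </w:pPr>
  <w:r>
    <w:t>Illustrative heading text</w:t>
  </w:r>
</w:p>
```

The value you would put in @styleId for this paragraph is "Heading1"; the value for @styleName would be "Heading 1".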
Finally, the design of the current style-to-tag mapping document evolved somewhat organically based on client requirements as they came up. There are aspects of it that are not necessarily logically consistent. I have started the process of designing a new style-to-tag mapping document type but it's on the back burner. But I would definitely welcome any suggestions for how to make the mapping markup clearer or easier to use.
Simple Paragraphs and Character Runs
<paragraphStyle styleName="Normal" tagName="p" topicZone="body" level="1"/>
Here the attribute values are the parts you would change to match a specific Word paragraph style to a specific DITA output element.
This example maps a paragraph with the style name "Normal" to the DITA element <p> in the topic body. Character styles are mapped the same way, using <characterStyle>:
<characterStyle styleName="Heading 1 Char" tagName="ph" topicZone="body" level="1" />
You can specify @containerType on <characterStyle> in order to generate two levels of containment, e.g., <i> nested within <b>. You can also use the transform's "final fixup" mode to convert single elements like <bi> into nested elements if you want. Note that the DITA for Publishers formatting domain includes elements for common combinations of bold, italic, and smallcaps, as well as other elements you might need.
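As a rough sketch of the paragraph and character mappings above working together, a "Normal" paragraph containing one run styled "Heading 1 Char" would be expected to produce markup along these lines. The text content is invented for illustration.

```xml
<!-- Expected shape of the result: the paragraph maps to <p>,
     the styled character run maps to <ph> inside it -->
<p>This paragraph contains <ph>a styled character run</ph> and plain text.</p>
```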
Heading Paragraphs That Map to Topics
To map heading paragraphs to topics you specify the topic type and the markup details for the generated topic. The heading paragraph becomes the topic title and the other topic elements are generated automatically.
<paragraphStyle styleName="Heading 1" structureType="topicTitle" tagName="title" level="1">
  <topicrefProperties topicrefType="chapter"/>
  <topicProperties topicType="concept" bodyType="conbody" topicDoc="yes" format="concept"/>
</paragraphStyle>
The structure type "topicTitle" indicates that this paragraph acts as a topic title, which means it will generate a new topic, either in a separate document or as a subtopic of a parent topic. In this case the value of the
@level attribute ("1") indicates that this is a top-level topic.
The @tagName attribute defines the tagname to use for the topic's title element, "title" in this case (it would differ from "title" only if you were generating a specialized topic with a specialized topic title element).
The @topicType attribute specifies the tagname for the root element of the topic to be generated, "concept" in this case.
The @bodyType attribute specifies the tagname for the topic body element, "conbody" in this case.
The @level attribute indicates that this is a top-level topic, subordinate only to a paragraph that specifies level zero. Level zero is normally reserved for the paragraph that generates the root map or topic, which should normally be the first unskipped paragraph in the Word document.
The @topicDoc attribute indicates that this topic should be put in a new document, which in turn implies the generation of a topicref to the generated document. The default is "no", so you must specify @topicDoc with a value of "yes" if you want the topic chunked out to a new file.
If you specify "yes" for @topicDoc then you must specify @format, which names an <output> element defined in the style-to-tag mapping document. The format value determines the public and system IDs to use for the generated topic document. You must also specify the @topicrefType attribute if the process is generating a map in addition to topics.
If you specify "yes" for
@topicDoc and the topic is not the root topic (meaning the primary result topic) then the style mapping should also generate a map so that there is a place to put a topicref to the generated topic document.
The @topicrefType attribute specifies the tagname to use for the topicref that will refer to the generated topic. In this case the topicref tagname is "chapter".
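To make the attribute descriptions above concrete, here is a sketch of the kind of topic the "Heading 1" mapping would be expected to generate. The id value and the text are placeholders; how the transform actually generates ids and file names is not covered in this section.

```xml
<!-- Sketch of the generated topic: topicType="concept", bodyType="conbody" -->
<concept id="placeholder-id">
  <!-- tagName="title": the heading paragraph becomes the topic title -->
  <title>Text of the Heading 1 paragraph</title>
  <conbody>
    <!-- paragraphs in the "body" topic zone that follow the heading -->
    <p>First body paragraph.</p>
  </conbody>
</concept>
```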
Mapping lower-level headings uses the same pattern, specifying the appropriate level, e.g., level "2" for paragraph style Heading 2 and so on.
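A minimal sketch of such a lower-level mapping, using only attributes shown in the Heading 1 example. It assumes that second-level topics should nest inside the parent topic's document, so @topicDoc is left at its default of "no" and no @format or topicref properties are given; adjust these if you want each subtopic in its own file.

```xml
<!-- Assumed mapping for second-level headings: same pattern, level="2" -->
<paragraphStyle styleName="Heading 2" structureType="topicTitle" tagName="title" level="2">
  <topicProperties topicType="concept" bodyType="conbody"/>
</paragraphStyle>
```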
Mapping List Paragraphs (Generating Container Elements)
Lists are typical of elements that must be generated within the context of container elements, e.g., <li> within <ul>. To map a list paragraph, set the tag name to <li> and then specify the name of the container element that wraps all adjacent list items, using the @containerType attribute, like so:
<paragraphStyle styleName="List Bullet" tagName="li" containerType="ul" level="1" topicZone="body" />
Here the tag name is "li", indicating that the paragraph content will be output in a
<li> element. The
@containerType attribute names the tagname of the container element for the list item,
<ul> in this case. The value "1" for the @level attribute means that this is a first-level element within the topic zone (body). Any elements to be nested inside the list item would need to specify level "2" (for example, paragraphs for second-level list items).
The implication of @containerType is that all adjacent Word paragraphs with the same container type will be output into a single instance of the specified container. This is how you can get a sequence of "List Bullet" paragraphs to output as a single DITA <ul> element.
This pattern of
@containerType can be used for any paragraphs that need to be output inside a common container that are not creating DITA
<section> elements (which have special mapping support).
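As an illustration of the container behavior described above, three adjacent "List Bullet" paragraphs mapped with the example mapping would be expected to produce a single list. The item text is invented.

```xml
<!-- Three adjacent paragraphs with containerType="ul" share one container -->
<ul>
  <li>First bullet paragraph</li>
  <li>Second bullet paragraph</li>
  <li>Third bullet paragraph</li>
</ul>
```

If a paragraph with a different container type (or none) came between the second and third items, the third item would start a new <ul>, because the items would no longer be adjacent.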
Mapping Definition Lists
Definition lists are a challenge because there is no single Word structure that maps directly to definition lists and DITA requires two levels of containment: one for the definition list as a whole and one for each term/definition pair. To model definition lists you must use pairs of paragraphs: one for the definition term and one for the definition description.
Definition term paragraphs use the
@structureType value "dt" and definition description paragraphs use the
@structureType value "dd". Both paragraph styles in a term/definition pair specify the same value for the @dlEntryType attribute, e.g., "dlentry". Finally, they both specify the same value for the @containerType attribute, which names the overall definition list element, e.g., "dl".
<paragraphStyle styleName="Def Term" structureType="dt" tagName="dt" dlEntryType="dlentry" containerType="dl" topicZone="body"/>
<paragraphStyle styleName="Def Desc" structureType="dd" tagName="dd" dlEntryType="dlentry" containerType="dl" topicZone="body"/>
Note that the resulting markup doesn't have to be a specialization of <dl>; it just has to have the same structural pattern of two levels of containment with the lowest-level elements in common containers. You could use this mapping pattern to generate simple tables, for example.
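For reference, this is the two-level structure the "Def Term" and "Def Desc" mappings are meant to produce from two term/description paragraph pairs. The terms and descriptions are invented for illustration.

```xml
<!-- containerType="dl" wraps everything; dlEntryType="dlentry" wraps each pair -->
<dl>
  <dlentry>
    <dt>First term</dt>
    <dd>Description of the first term.</dd>
  </dlentry>
  <dlentry>
    <dt>Second term</dt>
    <dd>Description of the second term.</dd>
  </dlentry>
</dl>
```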
Mapping Procedure Steps
Steps in procedures are similar to definition lists in that there are two layers of markup that wrap the paragraphs for a step: the <steps> element that contains each <step> and the <cmd> element that is the required first subelement of <step>. You can therefore reuse the definition list pattern:
<paragraphStyle styleName="List 1" structureType="dt" dlEntryType="step" containerType="steps" tagName="cmd" level="1" />
Here the paragraph style "List 1" is mapped to <cmd> within <step> within <steps>. Other paragraphs within a step go inside the step's <info> element. For this case you would use normal mapping and specify level 2:
<paragraphStyle styleName="BodyText Step" containerType="info" tagName="p" level="2" structureType="block" topicZone="body" />
If you need list items inside the <info> element then you use a definition list mapping like so:
<paragraphStyle styleName="Bullet 2 Step" containerType="info" dlEntryType="ul" tagName="li" level="2" structureType="dt" topicZone="body" />
Because the container type for both BodyText Step and Bullet 2 Step is "info" in this example, both paragraph types will end up contributing to the same info element in the output if they occur together.
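Putting the three step-related mappings above together, a "List 1" paragraph followed by a "BodyText Step" paragraph and two "Bullet 2 Step" paragraphs would be expected to yield a structure like the following. This is a sketch of the intended result, not captured transform output, and the text is invented.

```xml
<steps>
  <step>
    <!-- "List 1": tagName="cmd", dlEntryType="step", containerType="steps" -->
    <cmd>Open the configuration file.</cmd>
    <!-- "BodyText Step" and "Bullet 2 Step" share containerType="info" -->
    <info>
      <p>The file is in the installation directory.</p>
      <ul>
        <li>First nested bullet</li>
        <li>Second nested bullet</li>
      </ul>
    </info>
  </step>
</steps>
```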
Word tables are automatically converted to DITA tables. Paragraph and character styles within table cells will be mapped as defined in the mapping but you don't have direct control over how the table elements map to DITA markup (but you can always post-process the initial DITA into whatever you want).
The transform captures as much detail from the Word table as can be expressed using DITA tables, including the precise table geometry, row and column spans, row and cell borders, and horizontal and vertical cell alignment.
It's often useful or necessary to have paragraphs in the Word document that shouldn't be reflected in the DITA output. For this case you can use a @structureType of "skip". Paragraphs with a structure type of "skip" are ignored and have no effect on output. In particular, they do not affect the determination of which paragraphs are adjacent for the purposes of @containerType grouping.
In addition to explicitly skipped paragraphs, paragraphs that are not otherwise mapped and that have empty content (that is, they normalize to a single blank or to the empty string) are automatically skipped. This saves you from having to worry about users creating empty paragraphs to get vertical spacing in Word documents.
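A minimal sketch of an explicit skip mapping. The style name "Editorial Note" is hypothetical, and this section does not say whether other attributes (such as @tagName) are required alongside "skip", so treat the attribute set as an assumption to check against your version of the transform.

```xml
<!-- Hypothetical style: paragraphs in this style are dropped from the output -->
<paragraphStyle styleName="Editorial Note" structureType="skip"/>
```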
Mapping to Sections
DITA sections are challenging because they have three ways to represent the title: a literal title child element, an explicit @spectitle attribute, or a default @spectitle value set in the document type and not intended to be set by authors. In addition, some topic types require the use of sections within the topic body. This all leads to a bit of complexity.
To map a paragraph to a section title, specify a @structureType of "section" and a @tagName of "title" (or whatever the section title element should be, but usually "title"):
<paragraphStyle styleName="My Section Title" structureType="section" tagName="title" sectionType="section" topicZone="body" useAsTitle="yes" />
Note that you don't have to worry about the
@level attribute with sections because DITA sections cannot nest and must be direct children of the topic body. So the conversion processing will always do the right thing for sections.
Note, however, that if you have any sections within your topic body then all the paragraphs that follow the first section-creating paragraph will be in a section because there's no way to indicate that a given paragraph should not be in a section. However, this shouldn't be a problem in practice because it would be a very rare markup design that expected there to be a random mix of section elements and non-section elements within the topic body.
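As a sketch of the behavior just described, a "My Section Title" paragraph followed by two ordinary body paragraphs would be expected to produce a section like this. The paragraph text is invented.

```xml
<section>
  <!-- tagName="title": the section-creating paragraph becomes the title -->
  <title>Text of the My Section Title paragraph</title>
  <!-- every following paragraph lands in the section until the next section starts -->
  <p>First paragraph after the section title.</p>
  <p>Second paragraph after the section title.</p>
</section>
```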
Any paragraphs that precede the first section-creating paragraph within a topic body will be direct children of the topic body unless you specify the
@initialSectionType attribute for the paragraph that generates the containing topic.
For example, the Learning and Training learningContent topic type requires the use of section-type elements within the topic body. To handle this case you can specify the
@initialSectionType attribute to indicate that the initial paragraphs of the topic body should be wrapped in a section.
<paragraphStyle styleName="Lesson Title" structureType="topicTitle" level="1">
  <topicProperties initialSectionType="lcIntro" topicType="learningContent" bodyType="learningContentbody"/>
</paragraphStyle>
This mapping produces a topic like the following:
<learningContent id="topicid">
  <title>Learning Content</title>
  <shortdesc>Put a shortdesc of one or two sentences here.</shortdesc>
  <learningContentbody>
    <lcIntro>
      ... (paragraphs go here)
    </lcIntro>
  </learningContentbody>
</learningContent>
All paragraphs up to the first paragraph that maps to section would go within the
<lcIntro> element in this example.
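A section's spec title can also be taken from the text of the paragraph itself. For example, the paragraph "Usage: Controls the construction of the section title." could provide the spec title "Usage" and the initial paragraph "Controls the construction of the section title." The mapping for this case is: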
<paragraphStyle styleName="My Section" structureType="section" tagName="p" topicZone="body" spectitle="#toColon" />
Here the keyword value "#toColon" for the
@spectitle attribute indicates that the spectitle value should be taken from the paragraph content up to, but not including, the first colon. (Other values could be implemented but as of version 0.9.6 the only implemented value is "#toColon".) The value "p" for the
@tagName attribute indicates that the paragraph will be the first paragraph of the section rather than the title.
You can also specify a literal value for
@spectitle, which simply becomes the value of the
@spectitle attribute in the generated DITA XML.
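Based on the description above, the "Usage:" paragraph mapped with "#toColon" would be expected to produce something like the following. Whether the colon and the whitespace after it are trimmed exactly this way is an assumption.

```xml
<!-- spectitle is the paragraph text before the first colon;
     tagName="p" makes the remaining text the section's first paragraph -->
<section spectitle="Usage">
  <p>Controls the construction of the section title.</p>
</section>
```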
Mapping to Maps and Map Components (Topicrefs)
The simplest mapping is one in which a single Word document maps to a single result topic document. You can also generate systems of maps in addition to topics from a single input Word file, but this gets a little complex because there's a lot going on.
When you are generating a map and topics you must map the first non-skipped paragraph in the Word document to a map element, which will generate the root map of the output structure. It can also map to a topic (and by implication, a topicref to that topic) but it need not.
<paragraphStyle styleName="Title" level="0" structureType="mapTitle" tagName="title">
  <mapProperties tagName="title" format="bookmap" prologType="topicmeta"/>
</paragraphStyle>
The @structureType value of "mapTitle" triggers the general map generation. The other attributes define the details of the map markup: @tagName, @format, and @prologType. The value "0" (zero) for the @level attribute indicates that this is the root output structure. You should have only one level-zero paragraph-to-map or paragraph-to-topic mapping.
If you map a paragraph to a structure type of "mapTitle" at a level other than zero then you will create a submap that is referenced from the parent map (that is, the map that is one level up in the mapping hierarchy). In this case you must specify the
@maprefType attribute in order to get a topicref generated in the parent map.
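To generate both a map and a topic from the same paragraph, specify the second structure type using the @secondStructureType attribute. You can then specify the other attributes used to generate a result topic as you would for a structure type of "topicTitle":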
<paragraphStyle styleName="Topic Title" level="1" secondStructureType="topicTitle" structureType="mapTitle" tagName="title">
  <mapProperties format="learningContent" prologType="prolog" tagName="title"/>
  <topicrefProperties chunk="to-content" topicrefType="learningContentRef"/>
  <topicProperties bodyType="learningContentbody" initialSectionType="section" topicDoc="yes" topicType="learningContent"/>
</paragraphStyle>
This example maps the paragraph style "Topic Title" to a generic map (@mapType of "map") and a learningContent topic referenced from the map. The value of "1" for the @level attribute means this map will be a submap of the root map.
There are attributes for specifying the element types for each different kind of element that might result from generating both a map and topic.
When a new document should be referenced from the current map simply specify the
@topicrefType attribute and a structure type of "topicTitle".
To map paragraphs to <data> elements within the root topic prolog, you must specify:
- @baseClass as "- topic/data " (note the trailing blank)
To map to unspecialized
<data> you would specify
@tagName as "data". You can put the paragraph content either into the content of the
<data> element or into the
@value attribute. To put it in content, set
@putValueIn to "content"; otherwise set it to "value".
Because the DITA content model for
<prolog> is a bit fiddly, with a required sequence of element types, you may need to apply post-processing in the "final-fixup" mode to make sure the resulting prolog content is valid.
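To show the difference between the two @putValueIn settings described above, here is a sketch of the two possible results for a paragraph whose text is "Acme Corporation". The text is invented, and any other attributes the transform may add to <data> are not shown.

```xml
<prolog>
  <!-- putValueIn="content": the paragraph text becomes element content -->
  <data>Acme Corporation</data>
  <!-- putValueIn="value": the paragraph text becomes the value attribute -->
  <data value="Acme Corporation"/>
</prolog>
```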