Common Style-to-Tag Mapping Cases
Describes how to map styles to tags for common cases
This section describes how to set up the mapping for different common cases.
- For paragraphs that map to topic components, you must specify the @topicZone attribute, because this attribute is used to group incoming paragraphs for further processing. If the @topicZone attribute is not specified, a default of "body" is used. This is usually what you want, but it would be wrong if the paragraph should have gone in the map or topic prolog.
- Paragraphs that contribute to the root topic or map metadata can go anywhere in the Word document because they go into a separate topic zone and are therefore plucked out of the input and put in the right place regardless of where they occur.
- The @level attribute determines the hierarchical relationship among elements within the same output context, such as within the topic body.
- You can specify either the style display name, using @styleName, or the style ID, using @styleId. The style name is the display name that is used in the various Word style-related user interfaces. The style ID is the value specified in the underlying markup for paragraphs and character ranges. If you specify both style name and style ID on a mapping, the style name takes precedence. In general, the style name is more reliable because Word may change the style ID, for example, when using locale-specific versions of Word.
- The "style ID" for Word styles is not the same as the display name of the style, but it is usually very similar. In most cases the style ID is the display name with all spaces and special characters removed, e.g., the style with the display name "Heading 1" has the style ID "Heading1". If you happen to define two style names that differ only in their use of spaces, e.g., "Heading 1" and "Heading1", then the style ID of the second style defined will be something like "Heading10" so that the style ID is unique within the Word document. If you're not sure what the style ID is, you can open the document.xml file from the DOCX Zip package and look for <w:pStyle> elements; the value of the @w:val attribute is the style ID.
- The @tagName attribute always specifies the name of the primary (most deeply nested) result element. The primary result may be surrounded by other elements as specified with other attributes, such as @containerType. The element named by @tagName is the element that will contain the text of the paragraph or character run being mapped, except in a few special cases, such as mapping to section elements.
- The @outputclass attribute is helpful for applying further XSLT processing (particularly for finding the appropriate elements) and for applying additional styling in the output (for example, in HTML, where the @outputclass value becomes part of the generated class attribute that can be used with CSS).
- A paragraph that generates a topic may also generate a map or submap, in which case it will also generate a topicref to the topic (and a mapref to the submap if a submap is generated). This lets you generate trees of maps in addition to trees of topics. You can choose to generate a hierarchy of topics as separate topic documents with nested topicrefs or as a single XML document with a root topic and nested topics.
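To illustrate the relationship between the style ID and a mapping, the sketch below shows roughly what a "Heading 1" paragraph looks like in the document.xml file, followed by a mapping that matches on the style ID rather than the display name. The paragraph text is invented, and the mapping attributes other than @styleId are simply borrowed from the heading pattern described later in this section.

```xml
<!-- Illustrative fragment of document.xml: the w:val value is the style ID -->
<w:p>
  <w:pPr>
    <w:pStyle w:val="Heading1"/>
  </w:pPr>
  <w:r>
    <w:t>Getting Started</w:t>
  </w:r>
</w:p>

<!-- Sketch of a mapping keyed on the style ID instead of the display name -->
<style styleId="Heading1" structureType="topicTitle" tagName="title" topicType="concept" bodyType="conbody" level="1" />
```

Because the style name takes precedence and is generally more reliable, matching on @styleId is probably best kept for cases where the display name is not usable.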
Finally, the design of the current style-to-tag mapping document evolved somewhat organically based on client requirements as they came up. There are aspects of it that are not necessarily logically consistent. I have started the process of designing a new style-to-tag mapping document type but it's on the back burner. But I would definitely welcome any suggestions for how to make the mapping markup clearer or easier to use.
Simple Paragraphs and Character Runs
<style styleName="Normal" tagName="p" topicZone="body" level="1" />
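The attribute values, such as the style name and the tag name, are the parts you would change to match a specific Word paragraph style to a specific DITA output element.
This example maps paragraphs with the style name "Normal" to the DITA element <p>.
Character styles are mapped the same way. This example maps the character style "Heading 1 Char" to the DITA element <ph>: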
<style styleName="Heading 1 Char" tagName="ph" topicZone="body" level="1" />
Note that as of version 0.9.16 there is no support for nesting character styles in the output, so if you need something like "bold italics", which you would typically mark up as <b><i> in DITA, you will need to define a single phrase-level element like <bi>. The DITA for Publishers formatting domain includes elements for common combinations of bold, italic, and small caps, as well as other elements you might need. You can also use the transform's "final fixup" mode to convert single elements like <bi> into nested elements if you want.
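As a sketch of the single-element approach, a combined character style could be mapped as follows. The style name "Bold Italic" is hypothetical, and "bi" is only the placeholder element name used above; substitute whatever phrase-level element your document type actually provides.

```xml
<!-- "Bold Italic" is a hypothetical Word character style; "bi" is a placeholder element name -->
<style styleName="Bold Italic" tagName="bi" topicZone="body" level="1" />
```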
Heading Paragraphs That Map to Topics
To map heading paragraphs to topics you specify the topic type and the markup details for the generated topic. The heading paragraph becomes the topic title and the other topic elements are generated automatically.
<style styleName="Heading 1" structureType="topicTitle" tagName="title" topicType="concept" bodyType="conbody" level="1" topicDoc="yes" format="concept" topicrefType="chapter" />
The structure type "topicTitle" indicates that this paragraph acts as a topic title, which means it will generate a new topic, either in a separate document or as a subtopic of a parent topic. In this case the value of the @level attribute ("1") indicates that this is a top-level topic.
The @tagName attribute defines the tagname to use for the topic's title element, "title" in this case (it would only be different from "title" if you were generating a specialized topic with a specialized topic title element).
The @topicType attribute specifies the tagname for the root element of the topic to be generated, "concept" in this case.
The @bodyType attribute specifies the tagname for the topic body element, "conbody" in this case.
The @level attribute indicates that this is a top-level topic. It would be subordinate only to a paragraph that specifies level zero, which is normally reserved for the paragraph that generates the root map or topic and should normally be the first unskipped paragraph in the Word document.
The @topicDoc attribute indicates that this topic should be put in a new document, which in turn implies the generation of a topicref to the generated document. The default is "no", so you must specify @topicDoc with a value of "yes" if you want the topic chunked out to a new file.
If you specify "yes" for @topicDoc then you must specify @format, which names an <output> element defined in the style-to-tag mapping document. The format value determines the public and system IDs to use for the generated topic document. You must also specify the @topicrefType attribute if the process is generating a map in addition to topics.
If you specify "yes" for @topicDoc and the topic is not the root topic (meaning the primary result topic) then the style mapping should also generate a map so that there is a place to put a topicref to the generated topic document.
The @topicrefType attribute specifies the tagname to use for the topicref that will refer to the generated topic. In this case the topicref tagname is "chapter".
Mapping lower-level headings uses the same pattern, specifying the appropriate level, e.g., level "2" for paragraph style Heading 2 and so on.
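Following that pattern, a second-level heading mapping might look like the sketch below. It assumes the Word style is named "Heading 2" and leaves out @topicDoc so that the default of "no" applies, which should nest the topic within its parent topic's document rather than writing it to a new file.

```xml
<!-- Sketch: second-level heading as a nested subtopic (topicDoc defaults to "no") -->
<style styleName="Heading 2" structureType="topicTitle" tagName="title" topicType="concept" bodyType="conbody" level="2" />
```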
Mapping List Paragraphs (Generating Container Elements)
Lists are typical of elements that must be generated within the context of container elements, e.g., <li> within <ol> or <ul>.
To map list paragraphs, set the tag name to the list item element, <li>, and then specify the name of the container element to wrap all adjacent list items in, using the @containerType attribute, like so:
<style styleName="List Bullet" tagName="li" containerType="ul" level="1" topicZone="body" />
Here the tag name is "li", indicating that the paragraph content will be output in a <li> element. The @containerType attribute names the container element for the list item, <ul> in this case. The value "1" for the @level attribute means that this is the first level of structure within the topic zone (body). Any elements to be nested inside the list item would need to specify level "2" (for example, paragraphs for 2nd-level list items).
The implication of @containerType is that all adjacent Word paragraphs with the same container type will be output into a single instance of the specified container. This is how you can get a sequence of List Bullet paragraphs to output as a single DITA <ul> element.
This pattern of @tagName and @containerType can be used for any paragraphs that need to be output inside a common container and that are not creating DITA <section> elements (which have special mapping support).
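The same pattern should extend to numbered lists and to nested list levels. The sketch below assumes Word's built-in "List Number" and "List Bullet 2" paragraph styles; adjust the style names to match your own template.

```xml
<!-- First-level numbered list items, gathered into a single <ol> -->
<style styleName="List Number" tagName="li" containerType="ol" level="1" topicZone="body" />
<!-- Second-level bullets: level "2" is intended to nest them inside the preceding first-level item -->
<style styleName="List Bullet 2" tagName="li" containerType="ul" level="2" topicZone="body" />
```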
Mapping Definition Lists
Definition lists are a challenge because there is no single Word structure that maps directly to definition lists and DITA requires two-levels of containment: one for the definition list as a whole and one for each term/definition pair. To model definition lists you must use pairs of paragraphs: one for the definition term and one for the definition description.
Definition term paragraphs use the @structureType value "dt" and definition description paragraphs use the @structureType value "dd". The two paragraph styles that represent a term/definition pair both specify the same value for the @dlEntryType attribute, e.g., "dlentry". Finally, they both specify the same value for the @containerType attribute, which specifies the overall definition list element, e.g., "dl".
<style styleName="Def Term" structureType="dt" tagName="dt" dlEntryType="dlentry" containerType="dl" topicZone="body" />
<style styleName="Def Desc" structureType="dd" tagName="dd" dlEntryType="dlentry" containerType="dl" topicZone="body" />
Note that the resulting markup doesn't have to be a specialization of <dl>; it just has to have the same structural pattern of two levels of containment with the lowest-level elements in common containers. You could use this mapping pattern to generate simple tables, for example.
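To make the two levels of containment concrete, two "Def Term"/"Def Desc" paragraph pairs mapped with the definition list pattern would be expected to produce markup shaped like the following. The term and description text is invented for illustration, and the transform's exact output may differ in detail.

```xml
<dl>
  <dlentry>
    <dt>Style name</dt>
    <dd>The display name shown in the Word user interface.</dd>
  </dlentry>
  <dlentry>
    <dt>Style ID</dt>
    <dd>The value recorded in the underlying document markup.</dd>
  </dlentry>
</dl>
```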
Mapping Procedure Steps
Steps in procedures are similar to definition lists in that there are two layers of markup that wrap the paragraphs for a step: the <steps> element that contains each <step> and the <cmd> element that is the required first subelement of <step>.
To generate this structure, use the definition list mapping pattern with a @structureType of "dt":
<style styleName="List 1" structureType="dt" dlEntryType="step" containerType="steps" tagName="cmd" level="1" />
Here the paragraph style "List 1" is mapped to <cmd> within <step> within <steps>.
Additional paragraphs within a step go in the <info> element. For this case you would use normal mapping and specify level 2:
<style styleName="BodyText Step" containerType="info" tagName="p" level="2" structureType="block" topicZone="body" />
If you need list items within the <info> element then you use a definition list mapping like so:
<style styleName="Bullet 2 Step" containerType="info" dlEntryType="ul" tagName="li" level="2" structureType="dt" topicZone="body" />
Because the container type for both BodyText Step and Bullet 2 Step is "info" in this example both paragraph types will end up contributing to the same info element in the output if they occur together.
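Taken together, the three step-related mappings in this section would be expected to yield a structure along these lines for one command paragraph followed by a body paragraph and two bullets. The text content is invented, and this is only a sketch of the expected shape rather than verified transform output.

```xml
<steps>
  <step>
    <!-- from a "List 1" paragraph -->
    <cmd>Open the document.</cmd>
    <info>
      <!-- from a "BodyText Step" paragraph -->
      <p>The document opens in a new window.</p>
      <!-- from two adjacent "Bullet 2 Step" paragraphs -->
      <ul>
        <li>Use the File menu.</li>
        <li>Or double-click the file.</li>
      </ul>
    </info>
  </step>
</steps>
```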
Mapping Tables
Word tables are automatically converted to DITA tables. Paragraph and character styles within table cells will be mapped as defined in the mapping but you don't have direct control over how the table elements map to DITA markup (but you can always post-process the initial DITA into whatever you want).
The transform captures as much detail from the Word table as can be expressed using DITA tables, including the precise table geometry, row and column spans, row and cell borders, and horizontal and vertical cell alignment.
Skipping Paragraphs
It's often useful or necessary to have paragraphs in the Word document that shouldn't be reflected in the DITA output. For this case you can use a @structureType of "skip". Paragraphs with a structure type of "skip" are ignored and have no effect on output. In particular, they do not affect the determination of what paragraphs are adjacent for @containerType processing.
In addition to explicitly skipped paragraphs, paragraphs that are not otherwise mapped and that have empty content (that is, they normalize to a single blank or to the empty string) are automatically skipped. This saves you having to worry about users creating empty paragraphs to get vertical spacing in Word documents.
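A minimal sketch of an explicit skip mapping follows. The style name "Editor Note" is hypothetical, and whether a skipped style needs any attributes beyond the style name and structure type is an assumption not confirmed by this section.

```xml
<!-- Hypothetical style for editorial comments that should never reach the DITA output -->
<style styleName="Editor Note" structureType="skip" />
```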
Mapping to Sections
DITA sections are challenging because they have three ways to represent the title: a literal title child element, an explicit @spectitle attribute, or a @spectitle set in the document type and not intended to be set by authors. In addition, some topic types require the use of sections within the topic body. This all leads to a bit of complexity.
To map a paragraph to a section title, use a @structureType of "section" and a @tagName of "title" (or whatever the section title element should be, but usually "title"):
<style styleName="My Section Title" structureType="section" tagName="title" sectionType="section" topicZone="body" useAsTitle="yes" />
Note that you don't have to worry about the @level attribute with sections because DITA sections cannot nest and must be direct children of the topic body, so the conversion processing will always do the right thing for sections.
If you have any sections within your topic body, however, then all the paragraphs that follow the first section-creating paragraph will be in a section, because there's no way to indicate that a given paragraph should not be in a section. This shouldn't be a problem in practice because it would be a very rare markup design that expected a random mix of section elements and non-section elements within the topic body.
Any paragraphs that precede the first section-creating paragraph within a topic body will be direct children of the topic body unless you specify the @initialSectionType attribute for the paragraph that generates the containing topic.
For example, the Learning and Training learningContent topic type requires the use of section-type elements within the topic body. To handle this case you can specify the @initialSectionType attribute to indicate that the initial paragraphs of the topic body should be wrapped in a section.
<style styleName="Lesson Title" structureType="topicTitle" initialSectionType="lcIntro" topicType="learningContent" bodyType="learningContentbody" level="1" />
This mapping would produce a result like the following:
<learningContent id="topicid">
  <title>Learning Content</title>
  <shortdesc>Put a shortdesc of one or two sentences here.</shortdesc>
  <learningContentbody>
    <lcIntro>
      ... (paragraphs go here)
    </lcIntro>
  </learningContentbody>
</learningContent>
All paragraphs up to the first paragraph that maps to a section would go within the <lcIntro> element in this example.
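The section title can also be taken from the start of the section's first paragraph and used as the @spectitle value. For example, the paragraph
Usage: Controls the construction of the section title.
could provide the spec title of "Usage" and the initial paragraph of "Controls the construction of the section title." using a mapping like this: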
<style styleName="My Section" structureType="section" tagName="p" topicZone="body" spectitle="#toColon" />
Here the keyword value "#toColon" for the @spectitle attribute indicates that the spectitle value should be taken from the paragraph content up to, but not including, the first colon. (Other values could be implemented but as of version 0.9.6 the only implemented value is "#toColon".) The value "p" for the @tagName attribute indicates that the paragraph will be the first paragraph of the section rather than the title.
You can also specify a literal value for @spectitle, which simply becomes the value of the @spectitle attribute in the generated DITA XML.
Mapping to Maps and Map Components (Topicrefs)
The simplest mapping is one in which a single Word document maps to a single result topic document. However, you can also generate systems of maps in addition to topics from a single input Word file. This gets a little complex because there's a lot going on.
When you are generating a map and topics you must map the first non-skipped paragraph in the Word document to a map element, which will generate the root map of the output structure. It can also map to a topic (and by implication, a topicref to that topic) but it need not.
<style styleName="Title" level="0" structureType="mapTitle" tagName="title" mapFormat="bookmap" mapType="bookmap" prologType="topicmeta" />
The @structureType value of "mapTitle" triggers the general map generation. The other attributes define the details of the map markup: @mapType and @prologType. The value "0" (zero) for the @level attribute indicates that this is the root output structure. You should only have one level-zero paragraph-to-map or paragraph-to-topic mapping.
If you map a paragraph to a structure type of "mapTitle" at a level other than zero then you will create a submap that is referenced from the parent map (that is, the map that is one level up in the mapping hierarchy). In this case you must specify the @maprefType attribute in order to get a topicref generated in the parent map.
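A submap mapping along those lines might look like the following sketch. The style name "Part Title" and the @mapFormat value "map" are hypothetical (the latter assumes a matching output definition exists in your mapping document), and "mapref" is used here only as a plausible value for @maprefType.

```xml
<!-- Hypothetical: a level-1 paragraph that starts a submap referenced from the root map -->
<style styleName="Part Title" level="1" structureType="mapTitle" tagName="title" mapFormat="map" mapType="map" prologType="topicmeta" maprefType="mapref" />
```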
To generate both a map and a topic from the same paragraph, specify the structure type of the second result using the @secondStructureType attribute. You can then specify the other attributes used to generate a result topic as you would for a structure type of "mapTitle":
<style styleName="Topic Title" bodyType="learningContentbody" chunk="to-content" mapFormat="learningMap" format="learningContent" initialSectionType="section" level="1" mapType="map" prologType="prolog" secondStructureType="topicTitle" structureType="mapTitle" tagName="title" topicDoc="yes" topicType="learningContent" topicrefType="learningContentRef" />
This example maps a paragraph named "Topic Title" to a generic map (@mapType of "map") and a learningContent topic referenced from the map. The value of "1" for the @level attribute means this map will be a submap to the root map.
There are attributes for specifying the element types for each different kind of element that might result from generating both a map and topic.
When a new document should be referenced from the current map, simply specify the @topicrefType attribute and a structure type of "topicTitle".
Mapping Metadata
To map paragraphs to <data> elements within the root topic prolog, you must specify:
- @topicZone as "prolog"
- @baseClass as "- topic/data " (note the trailing blank)
- @containingTopic as "root"
- @level as "0"
To map to unspecialized <data> you would specify @tagName as "data". You can put the paragraph content either into the content of the <data> element or into the @value attribute. To put it in content, set @putValueIn to "content"; otherwise set it to "value".
Because the DITA content model for <prolog> is a bit fiddly, with a required sequence of element types, you may need to apply post-processing in the "final-fixup" mode to make sure the resulting prolog content is valid.
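Putting the metadata attributes together, a mapping for a paragraph that carries root-topic metadata might look like the sketch below. The style name "Author" is hypothetical; the remaining attribute values are the ones listed in this section.

```xml
<!-- Hypothetical "Author" paragraph style mapped to <data> in the root topic prolog -->
<!-- Note the trailing blank inside the baseClass value -->
<style styleName="Author" tagName="data" topicZone="prolog" baseClass="- topic/data " containingTopic="root" level="0" putValueIn="content" />
```

With @putValueIn set to "content", the paragraph text should end up as the content of the <data> element rather than in its @value attribute.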
Mapping Sidebars
Sidebars are a particularly problematic element because they often have no fixed location in the overall hierarchy. That is, they are often allowed (by publishers' business rules) to occur at many different levels (perhaps inside Heading 1, Heading 2, Heading 3, and Heading 4 sections). This used to require specific Word styles for each hierarchical level that the sidebar could appear at (e.g., Sidebar Title in Heading 1, Sidebar Title in Heading 2). That increased the work and the potential for error both for maintainers of the style-to-tag mapping (because they had to make sure they had appropriate definitions for each level a sidebar was allowed at) and for authors (because they had to use the right sidebar style for whatever level the sidebar was to appear at). These obstacles have been overcome starting in 0.9.19 RC 11.
With 0.9.19 RC 11, there is further processing that recalculates the level for individual instances of topic-generating styles. Because the recalculation respects the relationships between levels as established in the style-to-tag mapping, authors can place sidebars where they want and mapping maintainers only need a single sidebar title style definition (assuming they set the relationships correctly). The easiest (perhaps best) way to set the level for sidebars is to give them a really deep level, such as 50. In the recalculation, each instance of the sidebar title will be assigned a level that nests the sidebar within the parent topic.
<style styleName="Sidebar Title" structureType="topicTitle" tagName="title" topicType="sidebar" bodyType="body" prologType="prolog" level="50" />
For information on other attributes, see the section Heading Paragraphs That Map to Topics.