DITA—a new standard for information architecture

In recent years many business communities have accepted eXtensible Markup Language (XML) as their preferred technology for recording, exchanging, and storing data. XML can also serve as a basis for creating and managing more complex information such as documentation. In theory, XML offers technical authors obvious advantages over their conventional authoring programs and techniques. When supported by suitable publication tools and processes, XML can:

  • Enable the inclusion of semantics and metadata that are specific to a particular organization
  • Expand the concept of single sourcing beyond the limits imposed by other technologies
  • Facilitate the reuse of information between publications and between organizations or communities
  • Facilitate the automatic customization of documentation to meet the needs of a specific audience or to reflect a particular situation

In practice, relatively few technical authors construct their information in XML. Those who do generally use a small number of generic DTDs, such as DocBook or XHTML, because creating and supporting customized DTDs is expensive. These generic DTDs are also popular because they allow authors to continue using familiar design and implementation techniques. However, several emerging applications look likely to change the way authors think about and use XML. The most interesting of these new developments is DITA, the Darwin Information Typing Architecture.

Key components
DITA is a set of document-design principles represented as a collection of XML DTDs and other supporting files. A key feature of DITA design is that it focuses on the creation of small modules of information: topics in DITA terminology. A single topic should be meaningful and useful in the absence of other information. This topic-focused approach is significantly different from a standard such as DocBook, where the fundamental information unit is a complete book.

The use of topic-centred design has various consequences for the resulting information:

  • Small, self-contained modules of information are more reusable than larger units.
  • Modular information is well-suited to organizing technical documentation because technology users often need only a small amount of very specific information to solve individual problems.
  • Modular information is ideal for web sites, online help systems, or any medium where users access information items randomly or in unpredictable sequences.
  • Small modules are suitable for display on devices with small screens, such as mobile phones or PDAs.
  • The small information units can easily be handled by content-management workflows.
  • There are usually cost benefits if translation or localization is required subsequently.

The DITA architecture provides the DTDs for a general base topic and three specialized variations of it: a concept topic, a task topic, and a reference topic. The idea that information can be classified into different 'information types' has been well established in the technical communication community for many years. DITA provides a set of elements that authors can use to add semantics and structure to any topic. Many of these elements, such as <p>, <ol>, and <dl>, are already familiar to authors with any experience of HTML or other markup languages.
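
To make the idea of a typed topic more concrete, the following sketch parses a small concept topic with Python's standard XML library and reports which elements it uses. The topic id, title, and wording are invented for illustration, and the fragment is parsed without DTD validation, so treat it as an approximation of a real DITA file rather than a conforming one.

```python
import xml.etree.ElementTree as ET

# Hypothetical concept topic: the id, title, and text are invented for this example.
CONCEPT_XML = """
<concept id="what-is-a-topic">
  <title>What is a topic?</title>
  <conbody>
    <p>A topic is a small, self-contained module of information.</p>
    <ol>
      <li>It covers a single subject.</li>
      <li>It is meaningful without other information.</li>
    </ol>
    <dl>
      <dlentry>
        <dt>Map</dt>
        <dd>Describes how topics relate in a particular context.</dd>
      </dlentry>
    </dl>
  </conbody>
</concept>
"""

def summarize_topic(xml_text):
    """Return the topic's information type, its title, and the body elements it uses."""
    root = ET.fromstring(xml_text)
    body = root.find("conbody")
    return {
        "type": root.tag,                      # the root element names the information type
        "title": root.findtext("title"),
        "body_elements": [child.tag for child in body],
    }

summary = summarize_topic(CONCEPT_XML)
print(summary)
```

The root element carries the information type, while the body is built from the same general-purpose elements an HTML author would recognize.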

Each topic DTD also declares additional elements that are restricted to only that particular information type. For example, the task topic DTD permits <step> and <choice> elements, which are absent from the other types of topic. Authors also have the option to incorporate specialized elements from several domains. A domain is a vocabulary of XML elements that are relevant for a particular business or subject area. For example, the programming domain provides element tags that help authors mark up programming syntax or code examples.
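
As an illustration of type-specific markup, the sketch below reads a small task topic and lists each step together with any choices it offers. The task content is hypothetical, and the <codeph> element is assumed here to come from the programming domain; the fragment is not validated against the DITA DTDs.

```python
import xml.etree.ElementTree as ET

# Hypothetical task topic: the id, title, and step wording are invented for this example.
TASK_XML = """
<task id="install-toolkit">
  <title>Installing the toolkit</title>
  <taskbody>
    <steps>
      <step><cmd>Unpack the archive.</cmd></step>
      <step>
        <cmd>Choose an output format.</cmd>
        <choices>
          <choice>For web output, select HTML.</choice>
          <choice>For print output, select PDF.</choice>
        </choices>
      </step>
      <step><cmd>Run <codeph>build</codeph> from the command line.</cmd></step>
    </steps>
  </taskbody>
</task>
"""

def list_steps(xml_text):
    """Return a list of (command text, choices) pairs, one per <step>."""
    root = ET.fromstring(xml_text)
    steps = []
    for step in root.iter("step"):
        # itertext() flattens inline markup such as <codeph> into plain text.
        command = "".join(step.find("cmd").itertext())
        choices = [choice.text for choice in step.iter("choice")]
        steps.append((command, choices))
    return steps

steps = list_steps(TASK_XML)
for number, (command, choices) in enumerate(steps, start=1):
    print(number, command, choices)
```

Because the step and choice semantics are explicit in the markup, a publishing tool can number, restyle, or filter them without guessing at the author's intent.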

Although DITA authors create information as separate topics, they publish information by aggregating topics to produce information sets such as web sites, help files, or books. DITA can combine topics through a mapping mechanism, with a DITA map describing the relationship between topics in a particular context. A map has many potential uses, depending on the publishing process and the output medium. A publishing tool could, for example, use a map to order topics in a particular sequence, build a table of contents or navigation map, or create hyperlinks between topics.
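
One of the uses mentioned above, building a table of contents, can be sketched in a few lines. The map below is hypothetical: the file names and navigation titles are invented, and the build_toc function is an illustrative helper rather than part of any DITA toolkit.

```python
import xml.etree.ElementTree as ET

# Hypothetical map: the href values and navigation titles are invented for this example.
MAP_XML = """
<map>
  <topicref href="overview.dita" navtitle="Overview">
    <topicref href="what-is-a-topic.dita" navtitle="What is a topic?"/>
  </topicref>
  <topicref href="install-toolkit.dita" navtitle="Installing the toolkit"/>
</map>
"""

def build_toc(element, depth=0):
    """Walk nested <topicref> elements and return indented table-of-contents lines."""
    lines = []
    for ref in element.findall("topicref"):
        lines.append("  " * depth + f"{ref.get('navtitle')} ({ref.get('href')})")
        # Nested topicrefs become deeper levels of the contents list.
        lines.extend(build_toc(ref, depth + 1))
    return lines

toc = build_toc(ET.fromstring(MAP_XML))
print("\n".join(toc))
```

The same topics could appear in a second map, in a different order or hierarchy, without any change to the topic files themselves.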

All of the features described so far provide a sound foundation for XML-based publishing. However, if DITA is to win widespread acceptance and adoption, it will probably be because of an additional feature: specialization.

Evolution through specialization
DITA started life as an internal IBM project back in the late 1990s. Initial research convinced IBM developers that it would be impossible to design a universal DTD that could cater for the current and future needs of all potential users. Yet if individual departments or businesses were free to develop DTDs and supporting tools to meet their own specific needs, the resulting diversity would increase costs and hinder the exchange of information between groups. So the IBM developers conceived DITA specialization as a way to let people customize DTDs without raising barriers to information interchange.

Specialization is a mechanism that lets you extend DITA information types and domains to suit your specific requirements. However, you must define your new topics or domains as refinements or specializations of existing ones. If you wish, you can then create additional information types or domains that extend your specialized topics or domains even further. Each time you specialize an existing component, you must add statements to your DITA files to identify the ancestry or evolutionary path of your specialization. In this way, the standard DITA DTDs provide a lingua franca that guarantees at least a minimum level of semantic structure is retained during any information reuse or interchange.
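
The value of recording ancestry can be shown with a short sketch. It assumes that each element carries a class attribute listing its ancestors from most general to most specialized, as DITA elements conventionally do; in real DITA files the DTDs normally supply these values as defaults, whereas here they are written inline so that the example is self-contained. The ancestry and generalize functions are hypothetical helpers, not part of any DITA toolkit.

```python
import xml.etree.ElementTree as ET

# Task markup with its ancestry written out explicitly in class attributes.
# The content is invented; the class values follow the usual "module/element" pattern.
SPECIALIZED_XML = """
<task class="- topic/topic task/task " id="install-toolkit">
  <title class="- topic/title ">Installing the toolkit</title>
  <taskbody class="- topic/body task/taskbody ">
    <steps class="- topic/ol task/steps ">
      <step class="- topic/li task/step ">
        <cmd class="- topic/ph task/cmd ">Unpack the archive.</cmd>
      </step>
    </steps>
  </taskbody>
</task>
"""

def ancestry(element):
    """Return the element names in the class attribute, most general first."""
    tokens = element.get("class", "").split()
    # Tokens look like "module/element"; the leading "-" is a marker, not an ancestor.
    return [token.split("/")[1] for token in tokens if "/" in token]

def generalize(root):
    """Rename every element in place to its most general ancestor."""
    for node in root.iter():
        chain = ancestry(node)
        if chain:
            node.tag = chain[0]
    return root

general = generalize(ET.fromstring(SPECIALIZED_XML))
print([node.tag for node in general.iter()])
```

A tool that understands only the base topic type can therefore still process the specialized task, treating its steps as an ordinary ordered list.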

Towards a wider acceptance
IBM recently handed DITA over to OASIS (the Organization for the Advancement of Structured Information Standards), which now oversees further development and promotes DITA as an open standard. A number of other major enterprises, including Intel, Nokia, and Sun Microsystems, are supporting and contributing to DITA's ongoing evolution. In recognition of this growing support, new versions of publishing tools are beginning to offer features that assist the creation of DITA-compliant documentation.

At a recent seminar in the UK, David Schell, a senior IBM strategist, stated that his corporation has already used DITA to publish the equivalent of 1500 books. This statistic suggests that DITA is already sufficiently robust and flexible to handle large volumes of information in a commercial environment. In practice, however, the DITA development kit available from OASIS does not currently support all the features that authors might want to include in their publications. Future upgrades should address the most significant weaknesses. For the moment, any organization wishing to adopt DITA should probably anticipate spending development time on modifying the XSLT transforms provided in the kit.

At 3di we monitor new developments, such as DITA, to understand when and how they can offer benefits to our customers.

