<?xml version="1.0" encoding="ISO-8859-1"?>

<?xml-stylesheet href="Gradu.css" type="text/css"?>
<?xml-stylesheet href="toc.css" type="text/css"?>

<!DOCTYPE Book [
<!ATTLIST Table
          Id ID #IMPLIED
>
<!ATTLIST Figure
          Id ID #IMPLIED
>
<!ATTLIST Chapter
          Id ID #IMPLIED
>
<!ATTLIST Sect1
          Id ID #IMPLIED
>
<!ATTLIST Sect2
          Id ID #IMPLIED
>
<!ATTLIST Sect3
          Id ID #IMPLIED
>
<!ATTLIST Abstract
          Id ID #IMPLIED
>
<!ATTLIST Preface
          Id ID #IMPLIED
>
<!ATTLIST Glossary
          Id ID #IMPLIED
>
<!ATTLIST Appendix
          Id ID #IMPLIED
>
<!ATTLIST Bibliography
          Id ID #IMPLIED
>
<!ATTLIST Para
          Id ID #IMPLIED
>
<!ATTLIST Title
          Id ID #IMPLIED
>
<!ATTLIST Example
          Id ID #IMPLIED
>
<!ATTLIST Footnote
          Id ID #IMPLIED
>
<!ATTLIST Entry
          Id ID #IMPLIED
>
<!ATTLIST BiblioEntry
          Id ID #IMPLIED
>
]>

<Book xmlns:xlink="http://www.w3.org/1999/xlink"
      xmlns:html="http://www.w3.org/1999/xhtml">

 <html:script src="toc.js"/>

 <Box>
   <Title>Table of Contents</Title>
   <html:input type="button" value="Toggle DOM ToC" onclick="createToc();"/>
 </Box>

<BookInfo><BookBiblio><Title>Product Document Management with SGML
And Relational Databases</Title>
<AuthorGroup><Author><Firstname>Heikki</Firstname> <Surname>Toivonen</Surname></Author></AuthorGroup>

<ProductName Class = "Trade">Master of Science Thesis</ProductName>
<PubDate>20.4.2000</PubDate>
<Publisher><PublisherName>
University of Jyväskylä</PublisherName><Address><OtherAddr>
Department of Mathematical Information Technology</OtherAddr></Address></Publisher>

<Abstract Lang = "fi"><Title>Tiivistelmä</Title>
<Para>Tietokannat ja rakenteiset dokumentit perustuvat niin erilaiseen teknologiaan
ja ajatteluun, että niiden yhteiskäyttö voi olla ongelmallista.
Edistystä on kuitenkin tapahtunut. Tämä tutkielma käsittelee
näiden kahden teknologian eroja sekä vaikeuksia teknologioiden
yhteiskäytössä. Pääpaino on tuotedokumentaation hallinnassa.
Käytännön osuudessa esitellään eräs sovellus, jossa tietokannat
ja rakenteiset dokumentit tukevat toisiaan.</Para>
<Para>Tekijä: Heikki Toivonen</Para>
<Para>Yhteystiedot: sähköposti <Literal MoreInfo = "None"><html:a html:href="mailto:hjtoi@jyu.fi">hjtoi@jyu.fi</html:a></Literal></Para>
<Para>Työn nimi: Tuotedokumentaation hallinta SGML:n ja relaatiotietokantojen
avulla</Para>
<Para>Avainsanat: SGML, XML, HyTime, rakenteiset dokumentit, dokumenttien
hallinta, tuotetiedon hallinta, tietokannat</Para></Abstract>
<Abstract Lang = "en"><Title>Abstract</Title>
<Para>Databases and structured documents have traditionally been used separately.
The situation has changed dramatically over the past few
years. This thesis discusses the differences between the two realms and the
difficulties in making them interact. The emphasis is on product document
management. The practical part of the thesis presents an implementation
in which databases and structured documents work together.</Para>
<Para>Author: Heikki Toivonen</Para>
<Para>Contact: Email <Literal MoreInfo = "None"><html:a html:href="mailto:hjtoi@jyu.fi">hjtoi@jyu.fi</html:a></Literal></Para>
<Para>Title: Product Document Management with SGML and Relational Databases</Para>
<Para>Keywords: SGML, XML, HyTime, structured documents, document
management, product data management, databases</Para>
</Abstract>

<Preface><Title>Acknowledgements</Title>
<Para>This thesis was years in the making. Partly this was because
I grew tired of writing it, but also because my job was taking
too much time. In a way that was a good thing, because it enabled
me to gain more experience and make this a better paper. It was
also interesting to see new standards and programs emerging. I was
also able to look back, with the benefit of hindsight, at what should have been
done differently, and I believe I took a more critical view of my
own work.</Para>
<Para>I would like to thank the many people who have inspired
me while I was learning SGML and who have helped, directly or indirectly,
with this thesis: the authors of Mosaic, for introducing me to structured
document principles with the birth of the World Wide Web; Professor
Heather Brown from the University of Kent at Canterbury for teaching
me SGML and supervising my first SGML project during my time at
Kent; Ralf Petell and Björn Peltonen for offering me work
at CiTEC Engineering Oy; Joakim Östman for project management;
Leonard Norrgård for help with programming; Kaisa Miettinen for
getting gears rolling when they got stuck; Michael Leventhal for
language checking and other comments; Carl-Johann Måsala of Wärtsilä
for checking Wärtsilä facts; Jukka-Pekka Santanen and Pasi Koikkalainen,
my thesis supervisors at the University of Jyväskylä, and all
the rest of the supercool SGML gang at CiTEC for the enjoyable and
interesting times. Many people have written great books and other
publications; their names can be found in the References. Any errors,
however, are my own.</Para>
<Para>The people at the 4th International HyTime Conference held
in Montreal, Canada, deserve a special mention. That was the first
SGML/HyTime meeting I attended, and the presentations and conversations
proved to be very inspiring and helpful to me. The atmosphere was
enthusiastic and friendly, even towards a newcomer like me. And
to actually meet and speak with creators of SGML and HyTime was
more than I could ever have dreamed of: talk about motivation!</Para>
<Para>Last but not least, I would like to thank my wife Virpi for
her understanding and patience.</Para>
<Para><Author><Affiliation><Address><City>Vaasa</City></Address></Affiliation>
<Firstname>Heikki</Firstname><Surname>Toivonen</Surname></Author></Para></Preface></BookBiblio></BookInfo>

<Glossary Id="gloss.id"><Title>Terms And Acronyms</Title>
<Para>Quotes are from the HyTime standard <XRef Linkend = "BGBGFAHG"
    xlink:type="simple" xlink:href="#BGBGFAHG"
     >[ISO97]</XRef>.</Para>
<GlossEntry><GlossTerm>anchor</GlossTerm>
<GlossDef><Para>"An object (or a list of objects) that is linked
to other objects or lists of objects by a hyperlink." Object is
not a formal construct in HyTime, but it can mean a document, an
element in a document, a rectangular area in a frame in a video
sequence or just about anything.</Para></GlossDef></GlossEntry>
<GlossEntry><GlossTerm>attribute</GlossTerm>
<GlossDef><Para>SGML and XML <Emphasis>elements</Emphasis> may have
attributes. Attributes contain <Emphasis>metadata</Emphasis> about
the <Emphasis>element</Emphasis>.</Para></GlossDef></GlossEntry>
<GlossEntry><GlossTerm>catalog</GlossTerm>
<GlossDef><Para>Catalog files map <Emphasis>public identifiers</Emphasis> to <Emphasis>system
identifiers</Emphasis>. They are plain text files.</Para></GlossDef></GlossEntry>
<GlossEntry><GlossTerm>contextual hyperlink</GlossTerm>
<GlossDef><Para>"A hyperlink that occurs 'in context', meaning
that one anchor of the link is the link element itself [...] and
is a traversal initiation anchor."</Para></GlossDef></GlossEntry>
<GlossEntry><GlossTerm>document type definition</GlossTerm>
<GlossDef><Para>A document type definition (<Emphasis>DTD</Emphasis>)
is a specification, written in a formal language, of the structure
of an <Emphasis>SGML</Emphasis> or <Emphasis>XML</Emphasis>
document.</Para></GlossDef></GlossEntry>
<GlossEntry><GlossTerm>DSN</GlossTerm>
<GlossDef><Para>Data Source Name is a concept from <Emphasis>ODBC</Emphasis>.
A DSN can be defined for a database, and the database can then be accessed
by that name without knowing its actual location, because the ODBC layers
keep track of that information.</Para></GlossDef></GlossEntry>
<GlossEntry><GlossTerm>DTD</GlossTerm>
<GlossDef><Para>Abbreviation for <Emphasis>document type definition</Emphasis>.</Para></GlossDef></GlossEntry>
<GlossEntry><GlossTerm>element</GlossTerm>
<GlossDef><Para>SGML documents consist of elements that contain
other elements and text. Start <Emphasis>tags</Emphasis> begin elements
and end <Emphasis>tags</Emphasis> close elements.</Para></GlossDef></GlossEntry>
<GlossEntry><GlossTerm>entity</GlossTerm>
<GlossDef><Para>SGML has several kinds of entities. Parameter entities
are used inside a <Emphasis>document type definition</Emphasis> to
reuse DTD constructs. Internal entities can be used in DTDs and
document instances, and they can expand to text and <Emphasis>markup</Emphasis>.
External entities refer to external text, <Emphasis>markup</Emphasis> or
other objects like images.</Para></GlossDef></GlossEntry>
<GlossEntry><GlossTerm>grove</GlossTerm>
<GlossDef><Para>"Graph Representation Of property ValuEs." A
grove is the parse tree that a parser produces in memory. It also
contains some additional information, like links between the nodes
in the tree (or forest, which consists of multiple trees).</Para></GlossDef></GlossEntry>
<GlossEntry><GlossTerm>hyperlink</GlossTerm>
<GlossDef><Para>"An information structure that represents a relationship
among two or more objects."</Para></GlossDef></GlossEntry>
<GlossEntry><GlossTerm>hypertext</GlossTerm>
<GlossDef><Para>"Information that can be accessed in more than
one order."</Para></GlossDef></GlossEntry>
<GlossEntry><GlossTerm>HyTime</GlossTerm>
<GlossDef><Para>Hypermedia/Time-based Structuring Language. HyTime
is an international standard for representing multimedia documents,
including links. HyTime uses SGML constructs (all HyTime documents
are legal SGML documents, but one needs a HyTime-aware processor to
understand the HyTime semantics).</Para></GlossDef></GlossEntry>
<GlossEntry><GlossTerm>link</GlossTerm>
<GlossDef><Para>For the purposes of this thesis a link is the same
as <Emphasis>hyperlink</Emphasis>.</Para></GlossDef></GlossEntry>
<GlossEntry><GlossTerm>markup</GlossTerm>
<GlossDef><Para>In the case of structured documents, the document
structure is specified with markup. The markup is part of the <Emphasis>metadata</Emphasis> of
the document.</Para></GlossDef></GlossEntry>
<GlossEntry><GlossTerm>metadata</GlossTerm>
<GlossDef><Para>Data about data; for example, the creation date of
a file is metadata about the file.</Para></GlossDef></GlossEntry>
<GlossEntry><GlossTerm>micro-document</GlossTerm>
<GlossDef><Para>A document, often a small one. Micro-documents are usually
assembled to create complete manuals.</Para></GlossDef></GlossEntry>
<GlossEntry><GlossTerm>Multidoc Pro</GlossTerm>
<GlossDef><Para>An <Emphasis>SGML</Emphasis> browser by Citec Software
Ltd Oy.</Para></GlossDef></GlossEntry>
<GlossEntry><GlossTerm>navigator</GlossTerm>
<GlossDef><Para>A <Emphasis>Multidoc Pro</Emphasis> term for an
electronic table of contents.</Para></GlossDef></GlossEntry>
<GlossEntry><GlossTerm>notation</GlossTerm>
<GlossDef><Para><Emphasis>Entities</Emphasis> can refer to external
objects of different types. For example, images can be in GIF or
JPEG format. A notation can be used to declare these types.</Para></GlossDef></GlossEntry>
<GlossEntry><GlossTerm>ODBC</GlossTerm>
<GlossDef><Para>Open Database Connectivity. A Microsoft standard
through which applications can access different databases. ODBC
drivers hide the actual differences between database implementations.</Para></GlossDef></GlossEntry>
<GlossEntry><GlossTerm>public identifier</GlossTerm>
<GlossDef><Para>Certain SGML and XML objects can be referred to
by public identifiers. These include <Emphasis>document type definitions</Emphasis>, <Emphasis>entities</Emphasis> and <Emphasis>notations</Emphasis>.</Para></GlossDef></GlossEntry>
<GlossEntry><GlossTerm>reference concrete syntax</GlossTerm>
<GlossDef><Para>The SGML standard defines a default SGML declaration
that is assumed if no explicit SGML declaration is given. This is
called the reference concrete syntax.</Para></GlossDef></GlossEntry>
<GlossEntry><GlossTerm>SGML</GlossTerm>
<GlossDef><Para>Standard Generalized Markup Language. SGML is an
international standard for structured documents. Explicit structure
in documents helps manage them and enables computer programs to
work intelligently with the structure.</Para></GlossDef></GlossEntry>
<GlossEntry><GlossTerm>SGML declaration</GlossTerm>
<GlossDef><Para>The first part of an SGML document that specifies
things like character encoding and the maximum length of names which
may be used in the document <Emphasis>markup</Emphasis>.</Para></GlossDef></GlossEntry>
<GlossEntry><GlossTerm>SGML (document) instance</GlossTerm>
<GlossDef><Para>The third major part of an SGML document, the "actual
document".</Para></GlossDef></GlossEntry>
<GlossEntry><GlossTerm>stylesheet</GlossTerm>
<GlossDef><Para>A stylesheet describes how a document should be
formatted. Structured documents usually keep the content of
the document and the style information in separate files.</Para></GlossDef></GlossEntry>
<GlossEntry><GlossTerm>system identifier</GlossTerm>
<GlossDef><Para>An SGML <Emphasis>public identifier</Emphasis> is
first mapped to a system identifier, which is used by the system
to locate the physical object, for example a <Emphasis>document
type definition</Emphasis>. A system identifier is usually a file name.</Para></GlossDef></GlossEntry>
<GlossEntry><GlossTerm>tag</GlossTerm>
<GlossDef><Para>The <Emphasis>markup</Emphasis> in the <Emphasis>SGML
document instance</Emphasis> consists of, among other things, tags
that mark the boundaries of <Emphasis>elements</Emphasis>. A start
tag begins an <Emphasis>element</Emphasis>. It is possible to give
values to the element's <Emphasis>attributes</Emphasis> inside
the start tag. The <Emphasis>SGML declaration</Emphasis> defines
what characters are used to begin and end tags. The <Emphasis>reference
concrete syntax</Emphasis> specifies that <Literal MoreInfo = "None">&lt;</Literal> opens
a start tag, <Literal MoreInfo = "None">&lt;/</Literal> opens an
end tag and <Literal MoreInfo = "None">&gt;</Literal> closes a tag.
This is a sample start tag: <Literal MoreInfo = "None">&lt;title
lang="en"&gt;</Literal>. End tags close elements, and they cannot
have attributes. The end tag matching this start tag would look like this: <Literal
    MoreInfo = "None">&lt;/title&gt;</Literal>.</Para></GlossDef></GlossEntry>
<GlossEntry><GlossTerm>web</GlossTerm>
<GlossDef><Para>A <Emphasis>Multidoc Pro</Emphasis> term for a
file that can be loaded over an existing SGML document and that
may contain user-defined links, annotations and bookmarks for that
document.</Para></GlossDef></GlossEntry>
<GlossEntry><GlossTerm>XML</GlossTerm>
<GlossDef><Para>Extensible Markup Language. XML is a World Wide
Web Consortium Recommendation (effectively an Internet standard).
XML is a simplified subset of SGML.</Para></GlossDef></GlossEntry></Glossary>


<Chapter Id = "Intro.xref"><Title>Introduction</Title>
<BlockQuote><Para>Everything has structure. </Para>
<Para><Author><Firstname>Unknown</Firstname></Author></Para></BlockQuote>
<Para>The driving force for this paper was the Product Information
Management project started at Wärtsilä NSD Power Plants (in Vaasa,
Finland) to overcome problems in product documentation. The solutions
presented in this paper were developed mostly in 1996 and 1997.</Para>
<Para>This thesis describes how product documents in structured
format can be managed effectively with relational databases. The
structured document format this thesis deals with is Standard
Generalized Markup Language (<Acronym>SGML</Acronym>), which is
an International Organization for Standardization (<Acronym>ISO</Acronym>) standard <XRef
    Linkend = "BGBEIDAJ" xlink:type="simple"
    xlink:href="#BGBEIDAJ"  >[ISO86]</XRef>.</Para>
<Para>A simplified form of <Acronym>SGML</Acronym> has also been developed for the World Wide
Web (<Acronym>WWW</Acronym>). This form of <Acronym>SGML</Acronym> is
known as the Extensible Markup Language (<Acronym>XML</Acronym>)
and it is a World Wide Web Consortium (<Acronym>W3C</Acronym>) Recommendation <XRef
    Linkend = "BGBBFHGF" xlink:type="simple"
    xlink:href="#BGBBFHGF"  >[W3C98]</XRef>.
No prior knowledge of <Acronym>SGML</Acronym> or <Acronym>XML</Acronym> is
needed to understand this thesis, as both will be explained
in sufficient detail. Relational
databases and Product Data Management (<Acronym>PDM</Acronym>) will
also be described briefly for the same reason. From the point of
view of this thesis, <Acronym>SGML</Acronym> and <Acronym>XML</Acronym> are
nearly identical. Differences will be clearly pointed out where
they matter. This thesis generally refers to <Acronym>SGML</Acronym> as
the structured format, but in most cases <Acronym>XML</Acronym> could
be used equally well.</Para>
<Para>This thesis will show the importance of effective management
of product documents. A standardized, structured format will be
shown to be superior to traditional approaches, and the
management of such documents is equally important. It will be explained
why relational databases, while not the most advanced technology
available today, are still well suited to the job. Moreover, it will be
shown that it is feasible to retrieve information from both databases
and structured sources and to combine the information so that it can
be presented as a structured document. Methods for using databases to
manage <Acronym>SGML</Acronym> and <Acronym>XML</Acronym> documents
and to assemble large documents from document fragments will be discussed
as well.</Para>
<Para>This document is divided into two parts. The first part explains
the different standards and technologies used and discusses the
theory behind the practical part of this thesis. The first part
is divided into five chapters. <XRef Linkend = "BGBIJJAH"
    xlink:type="simple" xlink:href="#BGBIJJAH" 
    >Chapter 2</XRef> describes product data management. <XRef
    Linkend = "CIHCCIJE" xlink:type="simple"
    xlink:href="#CIHCCIJE"  >Chapter
3</XRef> describes structured documents. <XRef Linkend = "CIHHCIFG"
    xlink:type="simple" xlink:href="#CIHHCIFG" 
    >Chapter 4</XRef> explains databases and the
last chapter in the first part discusses the theory of managing
documents with databases.</Para>
<Para>The second part is divided into four chapters and it describes
the SGML document management system at Wärtsilä NSD. <XRef
    Linkend = "CACDCHGA" xlink:type="simple"
    xlink:href="#CACDCHGA"  >Chapter
6</XRef> gives an overview of the system. <XRef Linkend = "CEGJCJIG"
    xlink:type="simple" xlink:href="#CEGJCJIG" 
    >Chapter 7</XRef> describes the document authoring
process and <XRef Linkend = "CEGHDCDG" xlink:type="simple"
    xlink:href="#CEGHDCDG"  >Chapter
8</XRef> describes how the different documents are managed and assembled into
larger units.</Para>
<Para>The tools described in the second part were implemented
mostly in 1996 and 1997. Some software available at that
time has disappeared from the market while new products have appeared
in their place. Great advances in standards and other technologies
have occurred since then. The implemented technologies will be discussed in
the light of these developments.</Para>
<Para>Each chapter begins with a quote, often from a science-fiction
novel. The quote is somehow related to the chapter in question.
The actual relationship is left as an exercise to the reader.</Para>
<Para>Some images (screenshots in particular) are not of the highest
quality. This is because the original images were in a raster (non-vector)
format, such as GIF, that degrades when scaled. Scaling was needed in many
places to make the images fit on paper.</Para>
<Para>This document itself is in structured format. The first drafts
were written with <ProductName Class = "Trade">Adept*Editor<Footnote>
<Para>Adept*Editor is a product of ArborText Inc.</Para></Footnote></ProductName>. Additional
work was done with various other <Acronym>SGML</Acronym> editors
including <ProductName Class = "Trade">FrameMaker+SGML<Footnote><Para>FrameMaker+SGML
is a product of Adobe Systems Inc.</Para></Footnote></ProductName>, which
was also used for printing and conversion to various other formats.
Occasionally text was edited with plain text editors. The Document
Type Definition (<Acronym>DTD</Acronym>) used was the <Literal
    MoreInfo = "None">DocBook</Literal> <Acronym>DTD</Acronym> <XRef
    Linkend = "BABJHEIE" xlink:type="simple"
    xlink:href="#BABJHEIE"  >[OAS99]</XRef>,
slightly modified for the needs of this thesis. The original
structured format was <Acronym>SGML</Acronym>, but simple transformations
were made to create an <Acronym>XML</Acronym> version as well.</Para>
<Para>This document has <ProductName Class = "Trade">Synex ViewPort</ProductName><Footnote>
<Para>Synex ViewPort is a product of Synex Information AB, a fully
owned subsidiary of Enigma, Inc. See <Literal MoreInfo = "None">http://www.synex.se</Literal>.</Para></Footnote> stylesheets
as well as Cascading Style Sheets (<Acronym>CSS</Acronym>) <XRef
    Linkend = "BGBFHJDA" xlink:type="simple"
    xlink:href="#BGBFHJDA"  >[W3C96]</XRef>.
These allow the document to be viewed in any <Acronym>SGML</Acronym>- or <Acronym>XML</Acronym>-capable
browser that supports either of the two stylesheet formats. Additionally,
PostScript and <Acronym>PDF</Acronym> versions of this thesis were
generated for final printing. An <Acronym>HTML</Acronym> version
was also produced so that this document can be viewed with less advanced
Web browsers.</Para>
<Para>A few words about the SGML markup used in this document are
in order. The first occurrence of an important term is wrapped in a <Literal
    MoreInfo = "None">FirstTerm</Literal> element and is formatted
in <FirstTerm>bold face</FirstTerm> on paper. Code and <Acronym>SGML</Acronym> markup
examples may be nested inside one of several different elements,
but they are always formatted in a fixed-pitch <Literal
    MoreInfo = "None">Courier font</Literal>. The term <Literal
    MoreInfo = "None">element</Literal> will be explained later.
Most of the acronyms and terms are also explained in Terms and Acronyms.
Longer code and markup examples are numbered, as shown below:</Para>
<Example><Title>Markup Sample</Title>
<LiteralLayout xml:space = "preserve" Format = "linespecific">
&lt;tag&gt;Some SGML/XML sample&lt;/tag&gt;
</LiteralLayout></Example></Chapter>
<Chapter Id = "BGBIJJAH"><Title>Product Data Management</Title>
<BlockQuote><Para><Emphasis>I don't understand</Emphasis>, came
Briareus's code on the common band. <Emphasis>It opened to nowhere</Emphasis>.</Para>
<Para><Emphasis>Not to nowhere</Emphasis>, sent Nemes, reeling in
the filament. <Emphasis>Just to nowhere in the old Web. Nowhere
the Core has built a farcaster.</Emphasis></Para>
<Para><Emphasis>That's impossible</Emphasis>, sent Scylla. <Emphasis>There
are no farcasters except for those the Core has built.</Emphasis></Para>
<Para>Nemes sighed. Her siblings were idiots. <Emphasis>Shut up
and return to the dropship</Emphasis>, she sent. <Emphasis>We have
to report this in person. Councillor Albedo will want to download
personally.</Emphasis></Para>
<Para><Author><Firstname>Dan</Firstname><Surname>Simmons</Surname></Author><CiteTitle xlink:type="simple" xlink:href="http://www.amazon.com/exec/obidos/ASIN/0553572989/o/qid=959804935/sr=8-1/ref=aps_sr_b_1_1/103-0724774-8699018">The
Rise of Endymion</CiteTitle></Para></BlockQuote>
<Para>Product Data Management (<Acronym>PDM</Acronym>) is a way
to manage people, other resources and product development processes.
It is crucial to most companies and organizations, so it is
no wonder that opinions about it vary and that there is a great deal of
material on the subject. Good general guides include, for example, <XRef
    Linkend = "BABIGIDJ" xlink:type="simple"
    xlink:href="#BABIGIDJ"  >[CIM98]</XRef> and <XRef
    Linkend = "BABDBDBA" xlink:type="simple"
    xlink:href="#BABDBDBA"  >[PDM97a]</XRef>.
These works were used as source material for this chapter. <Acronym>PDM</Acronym> has
also been the target of extensive standardization work. The
best-known standard for product data is, without question, the Product
Data Representation and Exchange (<Acronym>STEP</Acronym>) <XRef
    Linkend = "BABDDCCJ" xlink:type="simple"
    xlink:href="#BABDDCCJ"  >[ISO94]</XRef>.</Para>
<Para>The practical part of this thesis evolved from a need
for better management of documents. To understand document management,
some knowledge of the more general problem of Product Data Management
is needed. This chapter gives a brief introduction to the relevant
concepts in this area. In addition, product
document management is introduced and its importance
is justified.</Para>
<Sect1 Id="wipdm1"><Title>What Is Product Data Management?</Title>
<Para>Product Data Management can be thought of as an umbrella
term covering (among other things) Engineering Data Management (<Acronym>EDM</Acronym>),
document management, Product Information Management (<Acronym>PIM</Acronym>)
and technical data management. It is used wherever some kind of
product is manufactured, sold or maintained. The
word product should be understood in a very broad sense. A product can be
anything from an airplane to a computer program or a service. <Quote>The Challenge
is to maximize the time-to-market benefits of concurrent engineering
while maintaining control of your data and distributing it automatically
to the people who need it - when they need it</Quote> <XRef
    Linkend = "BABDBDBA" xlink:type="simple"
    xlink:href="#BABDBDBA"  >[PDM97a]</XRef>.</Para>
<Para>PDM evolved from systems built in-house into commercial systems
in the 1980s. The vendors in those days were already involved with
CAD or other computer-aided engineering systems. Initially the systems
were mostly concerned with just engineering data, but recently support
for the whole product life cycle has improved as well <XRef
    Linkend = "BABBEGEE" xlink:type="simple"
    xlink:href="#BABBEGEE"  >[PDM97b]</XRef>.</Para>
<Para>A <Acronym>PDM</Acronym> system consists of a data repository
or vault (often a relational database), a set of user functions
and a set of utility functions. A <Acronym>PDM</Acronym> system's
commands are either embedded into other applications (<Acronym>CAD</Acronym>,
word processors and so on) or the commands from those other systems are
embedded into the <Acronym>PDM</Acronym> system.</Para>
<Para>The <Acronym>PDM</Acronym> system stores two kinds of data:
product data and <FirstTerm>metadata</FirstTerm>. Metadata is data
about data, and in the case of a <Acronym>PDM</Acronym> system it
helps the <Acronym>PDM</Acronym> system carry out the tasks it is
supposed to do. The distinction between data and metadata is a bit
vague. One application's data can be metadata for another application.</Para>
<Para>The user functions in a <Acronym>PDM</Acronym> system can
be divided into five categories: document management, workflow and
process management, product structure management, classification
and, finally, program management. The <Acronym>PDM</Acronym> system
must securely and effectively manage the data and protect it from both
accidental and deliberate attempts to destroy data integrity. Logged
access control keeps track of who has obtained what information and when
they obtained it. Access control can also prevent unauthorized
access to data. Release management makes sure the data goes through the
required steps before ending up at the customer (for an example,
see <XRef Linkend = "BGBCJBHI" xlink:type="simple"
    href = "#BGBCJBHI" xlink:show = "replace" >Figure
1</XRef>). Document management (or version control) can take care
of logging and managing ad hoc requests for data.</Para>
<Para>Workflow and process management can make the system proactive.
Predefined paths for data can be set up for repetitive tasks, which
not only ensures the correctness of the overall task, but also improves
efficiency. This is closely analogous to the conveyor belts in
a factory. Product structure management covers bills of materials
as well as product configurations. Bills of materials can even be
created automatically from product structures. As all information
is gathered in a central place, it is easy to see what things will
be affected by changes. Finding the information in the first place
is easy as well. The <Acronym>PDM</Acronym> system can show different
views of the data, such as structural relationships, documentation
and support information. Classification allows grouping of similar parts
and information, which in turn can cut the time spent in re-design.
Finally, program management completes the user functions. Program
management tools provide work breakdown structures to make it easier
to arrange resources, predict schedules and, in general, track how
projects are doing.</Para>
<Para>Customization is needed to better fit an off-the-shelf <Acronym>PDM</Acronym> system
to an organization's needs. The utility functions offer a way
to do this without reprogramming the whole <Acronym>PDM</Acronym> system.
The utility functions are also needed to support the user functions
described above. Typical utility functions include communication
and notification services, data transport, data translation and
system administration as well as customized reports for different
user groups and purposes.</Para>
<Para>Communication is improved simply by using a PDM system because
everyone using the system has access to all the information in the
system (access control restrictions apply, of course) in real time.
The <Acronym>PDM</Acronym> system can automatically send notification
emails about unexpected delays or other important events.
The system can automatically transfer data from its storage to the
user, so the user need not know where it is actually located. Even
in a moderately large organization not everyone can use exactly the
same word processors and drawing programs, so automatic data
translation or transformation can be set up in the <Acronym>PDM</Acronym> system.
Administration functionality offers utilities to change access control,
workflows and data backup.</Para>
<Para>As can be seen from the wide area of functionality covered,
a <Acronym>PDM</Acronym> system must be designed in such a way that
third-party components can be readily integrated into the main system.
The default user interface and terminology used in the <Acronym>PDM</Acronym> system
may also be customizable for the specific or unique needs of each
organization using the system.</Para>
<Para>That pretty much covers the basics of product data management.
As this thesis is mostly concerned with documents and documentation,
let us take a more thorough look at documentation and document management.</Para></Sect1>
<Sect1 Id="tinpwd2"><Title>There Is No Product without Documentation</Title>
<Para>Simply put, there are no products that do not have documentation
associated with them.</Para>
<Para>Let us think of a nail: A straight thin piece of iron or some
other material, sharp at one end and flattened at the other. One would
imagine a product cannot be much simpler than that. But we shall take
a closer look.</Para>
<Para>Let us say we would like to make a new nail. First we must
determine where it should be used. This requires some thought and
maybe research, which will produce our first documents. Suppose there
was an area that would benefit from a new type of nail. Then we
must examine the requirements, which will produce additional documents.
These documents could say that the nail must be at least three inches
in length but no more than four inches, it must be able to withstand
corrosive substances and it must be very thin. We would then manufacture
some prototypes that would go into testing. We would get test reports.
By the time the nail is perfect in its design we would have a big
pile of paper (or at least lots of files on a computer). The nail
would then be manufactured in large quantities, the factory might
need to be customized for it, some subcontractors would probably
be required to deliver the alloys and so on. By the time the nail
reaches retail shops there would be a mountain of information about
it. And it would still go on: customers would give feedback, there
would be sales figures and so on. Today one just cannot live without
documents.</Para>
<Para>Given this simple example, it is remarkable that
product documentation is usually in very bad shape. In many cases
documentation is seen as the least important piece of the whole product.
The process of creating a user manual for a product often does not
start until the product is ready to be shipped to customers, which
may delay the product release: a very bad situation
indeed. A nail would not need much of a user manual, but a cellular
phone or diving gear would be almost useless without some kind of
instructions.</Para>
<Para>A document is as important a part of a product as any other
tangible thing, maybe even more so. In today's information society
knowledge is regarded as the most precious commodity. In some cases the
document itself is the product.</Para>
<Para>The same rules that govern the management of other products
apply to document management as well. The typical life cycle of a document
is presented in <XRef Linkend = "BGBCJBHI" xlink:type="simple"
    href = "#BGBCJBHI" xlink:show = "replace" >Figure
1</XRef>. The planning for a new document usually begins when a
new product is being planned, or a need for a certain kind of a
document is discovered. New legislation or company policies often
result in new documents. Authors or technical writers create the
actual document content with the help of other documents or experts.
The authoring work is often easy to outsource.</Para>
<Para>The document then goes through the checking phase where facts,
spelling, readability and other things are checked. It can be returned
to the author (with more documentation explaining what was
wrong) or it can advance to the approval process. The people
who approve documents often take legal responsibility for
the correctness of the document. If the document is a user manual for a
machine that can be dangerous to humans, this is a big
responsibility indeed. Once the approval stamp is on a document
it is considered "frozen" and it is released, possibly with
other products. User comments and new versions of products can trigger
changes to the documentation which end up in new "frozen" revisions
of the original document.</Para>
<Para>Check-in, check-out and locking are important concepts in
document authoring. A document database handles these tasks and
the database holds the single up-to-date copy of any given document.
When a user wants to edit a document, he must first check out a
work copy from the document database. The database marks the document
as locked, and other users can only read it while the lock is held. Once
the user has finished editing, he must check the document back in.
This clears the lock. Often, only the differences between document
versions are stored to save disk space. Some systems have more advanced
locking models; for example, a document can have multiple locks,
or some locks can deny reads as well.</Para>
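<Para>The check-out and check-in cycle described above can be sketched
in a few lines of Python. The class and method names are
hypothetical and do not correspond to any particular document database;
for simplicity the sketch stores complete copies of each version instead
of differences, and it supports only one exclusive lock per document.</Para>
<ProgramListing Format = "linespecific"><![CDATA[
```python
class DocumentDatabase:
    """Hypothetical document database with exclusive check-out locks."""

    def __init__(self):
        self._versions = {}  # document name -> list of stored versions
        self._locks = {}     # document name -> user holding the lock

    def add(self, name, content):
        self._versions[name] = [content]

    def read(self, name):
        # Reading is always allowed, even while the document is locked.
        return self._versions[name][-1]

    def check_out(self, user, name):
        holder = self._locks.get(name)
        if holder is not None and holder != user:
            raise PermissionError(f"{name} is locked by {holder}")
        self._locks[name] = user
        return self.read(name)  # the user's work copy

    def check_in(self, user, name, content):
        if self._locks.get(name) != user:
            raise PermissionError(f"{user} has not checked out {name}")
        self._versions[name].append(content)  # store the new version
        del self._locks[name]                 # clear the lock
        return len(self._versions[name])      # new version number
```
]]></ProgramListing>
<Para>While one user holds the lock, a check-out attempt by another user
raises an error, but the other user can still read the latest stored version.</Para>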
<Para>Documents can be prepared as if they were being built on an
assembly line. The planning phase may create some skeleton documents
and additional guidelines for the document. Multiple authors can
add to and refine the skeleton document one after the other until
it is finished. An automated workflow system can improve productivity
immensely, because moving a "job" from one person to the next
is instantaneous. Because the workflow is built into the
computer system, the "job" will always move to the person it
is supposed to go to and not to somebody else. In some large corporations,
files can sit for days waiting to be moved from one person to another.</Para>
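<Para>A workflow built into the computer system can be as simple as a
routing table that says where a job goes next. The following Python sketch
assumes the phases of the document life cycle described earlier (planning,
authoring, checking, approval and release), with a rejection in the checking
phase returning the job to the author; the phase names and functions are
hypothetical.</Para>
<ProgramListing Format = "linespecific"><![CDATA[
```python
# Hypothetical routing table: the phase a document job moves to next.
NEXT_PHASE = {
    "planning": "authoring",
    "authoring": "checking",
    "checking": "approval",
    "approval": "released",
}

def route(phase, rejected=False):
    """Return the phase that receives a document job after `phase`."""
    if rejected:
        if phase != "checking":
            raise ValueError(f"a job cannot be rejected in phase {phase}")
        return "authoring"  # back to the author with comments
    return NEXT_PHASE[phase]

def run(rejected_checks=0):
    """Route one job from planning to release; the first `rejected_checks`
    checking rounds send the document back to the author."""
    phase, history = "planning", ["planning"]
    while phase != "released":
        rejected = phase == "checking" and rejected_checks > 0
        if rejected:
            rejected_checks -= 1
        phase = route(phase, rejected)
        history.append(phase)
    return history
```
]]></ProgramListing>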
<Figure Float = "0"><html:img src="Gradu-1.gif"/>
<Title Id = "BGBCJBHI">Document Life Cycle</Title></Figure>
<Para>If a document database is being used, documents may also be
edited simultaneously by multiple authors. Document databases that
are specialized for structured documents can easily lock parts of documents
while allowing edits in other parts. Conventional version control systems
can handle simultaneous edits as long as the documents are stored as text
rather than in a binary format.</Para>
<Para>Let us consider an example where two authors are editing the
same document at the same time. The first author to check his changes
in does not notice anything unusual, but when the second author tries to check
in, the version control system reports that his work copy is based on an
older version than the one now stored, and that a merge
is required. In some cases the system can merge automatically,
while in other cases manual work is needed. The situation with two
authors editing the same document with the help of a version control
system is presented in <XRef Linkend = "BGBIEIGH" xlink:type="simple"
    href = "#BGBIEIGH" xlink:show = "replace" >Figure
2</XRef>. The boxes labeled <Literal MoreInfo = "None">Doc &lt;number&gt;</Literal> show
the document stored in the version control system, and the number
refers to the version number in the version control system.</Para>
<Figure Float = "0"><html:img src="Gradu-2.gif"/>
<Title Id = "BGBIEIGH">Check-out, Check-in and Merge with Version
Control System</Title></Figure>
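<Para>The automatic merge mentioned above is commonly done as a three-way merge:
the second author's work copy is compared both with the version he originally
checked out (the common ancestor) and with the version now stored in the
version control system. The Python sketch below is deliberately naive and
assumes that edits only replace whole lines, so that all three versions have
the same number of lines; real version control systems first align inserted
and deleted lines with a diff algorithm.</Para>
<ProgramListing Format = "linespecific"><![CDATA[
```python
class MergeConflict(Exception):
    """Both authors changed the same line differently."""

def merge3(base, mine, theirs):
    """Naive line-based three-way merge of lists of lines.

    Simplifying assumption: edits only replace lines, so all three
    versions have the same length.
    """
    if not len(base) == len(mine) == len(theirs):
        raise ValueError("this sketch only handles replaced lines")
    merged = []
    for number, (old, ours, other) in enumerate(zip(base, mine, theirs), start=1):
        if ours == other:        # identical change, or no change at all
            merged.append(ours)
        elif ours == old:        # only the other author changed the line
            merged.append(other)
        elif other == old:       # only this author changed the line
            merged.append(ours)
        else:
            raise MergeConflict(f"line {number} must be merged manually")
    return merged
```
]]></ProgramListing>
<Para>If the two authors changed different lines, their changes are combined
automatically; if both changed the same line differently,
<Literal MoreInfo = "None">MergeConflict</Literal> signals that manual work is needed.</Para>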
<Para>Next we will have a look at the economic issues concerning
documents and documentation.</Para></Sect1>
<Sect1 Id="ei3"><Title>Economic Issues</Title>
<Para>The amount of time and money spent on producing new information
is staggering. For example, 20% of the <Acronym>GNP</Acronym> of
the United States is spent on generating new information. Over 90%
of the information is in documents <XRef Linkend = "Bib-arborwp.xref"
    xlink:type="simple" xlink:href="#Bib-arborwp.xref"
     >[Arb95]</XRef>. The amount of
electronic documentation is growing 20-60% each year while the same
figure is 10% for paper documentation (in the USA) <XRef
    Linkend = "BABFGIFE" xlink:type="simple"
    xlink:href="#BABFGIFE"  >[Onn99]</XRef>.</Para>
<Para>Oil rigs provide a specific example. It
is estimated that about half of their manufacturing costs go into documentation <XRef
    Linkend = "Bib-ryt97.xref" xlink:type="simple"
    xlink:href="#Bib-ryt97.xref" 
    >[Pel97b]</XRef><Footnote Id = "Oil.fref"><Para>This
was also mentioned in the presentation "Toward STEP Interchange:
Seeing the Document as a Snapshot of the Data" given by Daniel
Rivers-Moore at the 4th International HyTime Conference in Montreal, Canada.</Para></Footnote>.
A recent newspaper article mentioned that setting up a new oil
rig in Norwegian waters would cost over 11 billion Finnish marks<Footnote>
<Para>This was mentioned in an article in the Finnish "Kauppalehti"
during winter 1999. 11 billion Finnish marks is roughly 2 billion
US dollars.</Para></Footnote>. If about half of this, over 5.5 billion marks,
goes into documentation, even a moderate 10% saving in documentation
would save more than 550 million Finnish marks from the total
costs of such an oil rig.</Para>
<Para>The World Wide Web is also growing rapidly. The amount of
text in English grows about 50% per year. The figure for non-English
pages is 90%. This will also lead to an increasing need for translations <XRef
    Linkend = "BABJHHIG" xlink:type="simple"
    xlink:href="#BABJHHIG"  >[Kla98]</XRef>.</Para>
<Para>It has been estimated that authors spend up to 30% of their
time searching for information and roughly the same amount of time
laying out the text to produce nice printouts <XRef
    Linkend = "Bib-arborwp.xref" xlink:type="simple"
    xlink:href="#Bib-arborwp.xref" 
    >[Arb95]</XRef>. <Acronym>SGML</Acronym> addresses
both aforementioned areas of authoring. First, because of <Acronym>SGML</Acronym>'s
structured nature, documents conforming to this standard can be
archived and searched much more efficiently than, say, Microsoft
Word documents. Second, authors do not have to spend their time
worrying about the layout. This is often an unproductive use of time,
as the publisher will usually want to change the layout to conform to its own
standards. Laying out and publishing an <Acronym>SGML</Acronym> document
can be completely separated from the authoring process.</Para>
<Para>Technical manuals are often huge, while usually only a small
part of the whole document is needed. The nature of <Acronym>SGML</Acronym> makes
it easy to extract tailored subdocuments from large documents. <Acronym>SGML</Acronym> databases
make it possible to have multiple authors working on the same document at
the same time, because each author needs to lock only the small
piece he is currently working on.</Para>
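<Para>As an illustration of extracting a tailored subdocument, the following
Python sketch selects the sections of a manual intended for one audience.
Python's standard library has no SGML parser, so the sketch uses XML, which is
a subset of SGML, and <Literal MoreInfo = "None">xml.etree.ElementTree</Literal>;
the element names and the <Literal MoreInfo = "None">audience</Literal>
attribute are hypothetical.</Para>
<ProgramListing Format = "linespecific"><![CDATA[
```python
import xml.etree.ElementTree as ET

# Hypothetical manual; XML stands in for SGML here.
MANUAL = """
<manual>
  <section audience="operator"><title>Starting the pump</title></section>
  <section audience="service"><title>Replacing the seals</title></section>
  <section audience="operator"><title>Stopping the pump</title></section>
</manual>
"""

def extract(manual_xml, audience):
    """Build a new, smaller document holding only the matching sections."""
    root = ET.fromstring(manual_xml)
    subdocument = ET.Element("manual")
    for section in root.iter("section"):
        if section.get("audience") == audience:
            subdocument.append(section)
    return subdocument
```
]]></ProgramListing>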
<Para>Other important areas not yet mentioned include document interchange
and long-term storage. While word processors introduce
a new proprietary file format with nearly every major release, <Acronym>SGML</Acronym> has
been around since 1986 and is likely to remain usable for decades;
that is why the term "Everlasting Information" is sometimes
used for documents conforming to the <Acronym>SGML</Acronym> standard.
Without long-lived, standard document formats, our electronic era
may leave a black hole in the information available to future historians, because
it may simply be impossible to find both the software and the hardware
needed to read some exotic file format in the future. Already it is nearly
impossible to read the data from early magnetic tapes and punched
cards! With <Acronym>SGML</Acronym> only the <Acronym>DTD</Acronym>s evolve,
and changing document instances so that they conform to newer versions
of their <Acronym>DTD</Acronym>s is generally much easier than converting
between word processor formats. Different versions of a <Acronym>DTD</Acronym>
can also coexist, so in many cases conversion is not even needed.
In addition, <Acronym>SGML</Acronym> is platform independent:
<Acronym>SGML</Acronym> files can be interchanged between any two systems
that can exchange <Acronym>ASCII</Acronym> text files.
Archival issues have been explored, for example, in <XRef
    Linkend = "BABFGIFE" xlink:type="simple"
    xlink:href="#BABFGIFE"  >[Onn99]</XRef> and <XRef
    Linkend = "BABIAGFB" xlink:type="simple"
    xlink:href="#BABIAGFB"  >[Met99]</XRef>.</Para>
<Para>This section has argued that <Acronym>SGML</Acronym> can
make documents and the processes needed in documentation more effective.
The following chapters will show how and why this is possible.</Para></Sect1></Chapter>
<Chapter Id = "CIHCCIJE"><Title>Structured Documents</Title>
<BlockQuote><Para>HyTime is the borg standard. </Para>
<Para><Author><Firstname>W. </Firstname><OtherName>Eliot </OtherName>
<Surname>Kimber</Surname></Author></Para></BlockQuote>
<Para>All the information we have gathered into something that could
be considered a document has structure. Sometimes the structure
is very explicit, as in a computer program, so that even computers
can understand it. Other documents have only implicit structure,
and we may not even recognize that they have structure at all.
To a human being, a computer program represented as 0's and 1's
would appear meaningless and devoid of structure. On the other hand,
most computer programs could make nothing of this paragraph.
The information in this chapter was collected mostly from <XRef
    Linkend = "Bib-tra.xref" xlink:type="simple"
    xlink:href="#Bib-tra.xref" 
    >[Tra95]</XRef>, <XRef Linkend = "BABJBCHE"
    xlink:type="simple" xlink:href="#BABJBCHE"
     >[Gol90]</XRef>, <XRef
    Linkend = "BGBBFHGF" xlink:type="simple"
    xlink:href="#BGBBFHGF"  >[W3C98]</XRef>, <XRef
    Linkend = "BABJBFBG" xlink:type="simple"
    xlink:href="#BABJBFBG"  >[DeR94]</XRef> and <XRef
    Linkend = "BABIEGCG" xlink:type="simple"
    xlink:href="#BABIEGCG"  >[Kim98]</XRef>.</Para>
<Sect1 Id="wtis4"><Title>Ways to Indicate Structure</Title>
<Para>All documents have at least implicit structure. For some documents
the structure has been declared explicitly, either internally or
externally. If the structural information is internally contained
in the document, it is usually called markup.</Para>
<Para>LaTeX <XRef Linkend = "BGBEHIJF" xlink:type="simple"
    xlink:href="#BGBEHIJF"  >[Lam94]</XRef> is
an example of an internal document markup language, where the structural and
layout information is mixed with the content. In the source file, the beginning of a
section looks something like this:</Para>
<Example><Title>LaTeX Sample</Title>
<ProgramListing Format = "linespecific">\section{Onion Pie}
\begin{itemize}
\item large onions
\item large tomato
% ... further ingredients
\end{itemize}</ProgramListing></Example>
<Para>The example above would be formatted by a LaTeX system so
that a section heading would appear first, followed by a bulleted
list of ingredients for the onion pie. The appearance of the heading and the list
can be customized (although this is rather difficult), but the
default heading would look much the same as the section headings
in this thesis. All the markup (like <Literal MoreInfo = "None">\begin{itemize}</Literal>)
would be processed by the software and would not end up in the
actual layout.</Para>
<Para>An example of external markup is the graphics format for Regenesis<Footnote
    Id = "Regenesis.ft"><Para>Regenesis may have been the first
online game with graphics. Its "children" include Doom, Quake and
Warbird. There is a WWW page describing the <Acronym>MUD</Acronym> at <Literal
    MoreInfo = "None">http://www.lysator.liu.se/mud/bsxmud.html</Literal>.
To actually play the game one needs a client program that contacts
the <Acronym>MUD</Acronym> server. Regenesis was originally written
by Bram Stolk.</Para></Footnote>, the first graphical multi-user
dungeon (<Acronym>MUD</Acronym>). The graphics are simple colored
polygons that consist of a maximum of 32 vertices. There can be
a maximum of 32 polygons in a scene. Regenesis uses 16 colors. The
coordinate system is a 256 by 256 matrix. An image is represented
as a string of hexadecimal numbers. As such it has no apparent structure.
But that string has structure when it is viewed with the external
information: the programs drawing the images must of course know
what to do with the string! The first two characters of the string,
read as a hexadecimal number, give the number of polygons
in the scene; the next number is the number of vertices in the
first polygon, and so on. See <XRef Linkend = "CIHIFDCB"
    xlink:type="simple" href = "#CIHIFDCB" xlink:show = "replace"
    >Figure 3</XRef> for sample.</Para>
<Figure Float = "0"><html:img src="Gradu-3.gif"/>
<Title Id = "CIHIFDCB">The Structure of Scene Markup in Regenesis</Title></Figure>
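<Para>The following Python sketch decodes such a scene string. The text above
fixes only the first fields (the polygon count, then the vertex count of the
first polygon); the assumption that each vertex count is followed by a colour
number and then by one x and one y coordinate per vertex, each written as two
hexadecimal digits, is ours for illustration and may not match the actual
Regenesis format.</Para>
<ProgramListing Format = "linespecific"><![CDATA[
```python
def decode_scene(hex_string):
    """Decode a Regenesis-style scene string into a list of polygons.

    Documented in the text: the first number is the polygon count and the
    next one is the first polygon's vertex count. Assumed here: each vertex
    count is followed by a colour number and then one x and one y
    coordinate per vertex, every number written as two hex digits.
    """
    numbers = iter(int(hex_string[i:i + 2], 16)
                   for i in range(0, len(hex_string), 2))
    polygons = []
    for _ in range(next(numbers)):          # number of polygons
        n_vertices = next(numbers)
        colour = next(numbers)              # assumed colour field (0-15)
        vertices = [(next(numbers), next(numbers)) for _ in range(n_vertices)]
        polygons.append({"colour": colour, "vertices": vertices})
    return polygons
```
]]></ProgramListing>
<Para>Without the external knowledge encoded in this function, the string
<Literal MoreInfo = "None">01030C0000FF0080FF</Literal> is just a row of digits;
with it, the string describes one triangle in colour 12.</Para>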
<Para>There is also a hybrid of internal and external markup.
Some documents keep the structural markup at the beginning or
at the end of the document, where it contains pointers to the actual
content. Microsoft Word uses this technique <XRef
    Linkend = "Bib-tra.xref" xlink:type="simple"
    xlink:href="#Bib-tra.xref" 
    >[Tra95]</XRef>.</Para>
<Para>Structured documents are normally formatted to get a usable
view of the data. It would not make much sense to try to read the
binary representation of a computer program, for example; running the
program presents the structured data in an understandable form.</Para></Sect1>
<Sect1 Id="lap5"><Title>Languages And Parsers</Title>
<Para>Structured documents use some language to describe the structure.
For example, the C++ programming language is standardized and there
is software that can do meaningful things with data in C++ notation.</Para>
<Para>All languages (or more precisely, grammars) can be divided
into several categories, for example regular and context-free
languages. Any book on the theory of computation or compiler design
will discuss the theory of languages, for example <XRef
    Linkend = "BABIAEHI" xlink:type="simple"
    xlink:href="#BABIAEHI"  >[Sip96]</XRef>.
For the purposes of this thesis there is no need to understand language
theory in depth, and the subject will not be explained in detail.
A rudimentary knowledge of basic language theory, as taught in elementary
computer science courses, will however be assumed.</Para>
<Para>A parser is a software component that can read data conforming
to some grammar and build an in-memory representation of the structure
(or generate events based on the structures found in the document).
A parser that builds an in-memory representation of the data will
typically be slower than an event-based parser, and it cannot handle
documents as large as an event-based parser can, because the latter can discard data
as soon as it has recognized it and generated the appropriate event.
The two parser types can be combined, however: typically
an event-based parser is used to scan quickly to an
interesting part, after which an in-memory representation of only that
part is built.</Para>
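<Para>The three approaches can be compared with Python's standard XML tools,
used here as a stand-in because no SGML parser is included in the standard library:
<Literal MoreInfo = "None">xml.sax</Literal> is event-based,
<Literal MoreInfo = "None">xml.etree.ElementTree.fromstring</Literal> builds
the whole tree in memory, and <Literal MoreInfo = "None">iterparse</Literal>
streams through the document while handing over complete subtrees. The
sample memo document is hypothetical.</Para>
<ProgramListing Format = "linespecific"><![CDATA[
```python
import io
import xml.etree.ElementTree as ET
import xml.sax

DOC = "<memo><title>Lunch</title><para>Pizza</para><para>At noon</para></memo>"

class ElementCounter(xml.sax.ContentHandler):
    """Event-based: counts start tags and keeps nothing else in memory."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        self.counts[name] = self.counts.get(name, 0) + 1

def count_elements(text):
    handler = ElementCounter()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.counts

def paragraphs(text):
    """Tree-based: the whole document is built in memory first."""
    root = ET.fromstring(text)
    return [para.text for para in root.findall("para")]

def first_title(text):
    """Combined: stream through end events and return as soon as the
    first complete title subtree is available."""
    stream = io.BytesIO(text.encode("utf-8"))
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "title":
            return elem.text
    return None
```
]]></ProgramListing>
<Para>For very large documents, code using <Literal MoreInfo = "None">iterparse</Literal>
typically calls <Literal MoreInfo = "None">clear()</Literal> on elements it has
finished with, so that memory use stays small.</Para>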
<Para>The in-memory representation of data (or events) is easier
to handle programmatically than the raw data, and it makes it possible
to change the document language somewhat without requiring a rewrite
of the components that use the processed structures. The parser
can also detect errors in the document (i.e. places where the document does
not conform to the expected grammar) and stop further processing,
because the data would appear to be corrupted. Application programmers
do not need to worry about certain kinds of errors because of this
parser feature.</Para>
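<Para>A minimal Python illustration of this error detection is shown below.
Note that <Literal MoreInfo = "None">xml.etree.ElementTree</Literal> only checks
that the document is well-formed (for example, that tags are properly nested);
it does not validate the document against a <Acronym>DTD</Acronym>, which a
validating <Acronym>SGML</Acronym> or XML parser would also do.</Para>
<ProgramListing Format = "linespecific"><![CDATA[
```python
import xml.etree.ElementTree as ET

def parse_checked(text):
    """Return (root, None) for a well-formed document, or (None, message)."""
    try:
        return ET.fromstring(text), None
    except ET.ParseError as err:
        # The application never sees a half-parsed, corrupted structure.
        return None, str(err)
```
]]></ProgramListing>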
<Para>Standard Generalized Markup Language documents belong to the
category of internally marked-up structured documents. Some have
argued that internal markup is too limiting
and that SGML should be revised to offer external markup <XRef
    Linkend = "BABBGCJI" xlink:type="simple"
    xlink:href="#BABBGCJI"  >[Nel97]</XRef>.</Para></Sect1>
<Sect1 Id="sgml6"><Title>Standard Generalized Markup Language</Title>
<Para>Standard Generalized Markup Language, or <Acronym>SGML</Acronym> <XRef
    Linkend = "BGBEIDAJ" xlink:type="simple"
    xlink:href="#BGBEIDAJ"  >[ISO86]</XRef> for
short, is an international standard for structured documents. To
be more exact, <Acronym>SGML</Acronym> is a metalanguage which means
that <Acronym>SGML</Acronym> is used to describe other (structural)
languages.</Para>
<Sect2 Id="abhos7"><Title>A Brief History of SGML</Title>
<Para>Before going into the gritty details, let us take a journey
into the history of SGML (the following is collected from <XRef
    Linkend = "BABJBCHE" xlink:type="simple"
    xlink:href="#BABJBCHE"  >[Gol90]</XRef>).
Even before computers existed, manuscripts were annotated with special
comments to describe how the text should appear. These special comments
were called <Literal MoreInfo = "None">markup</Literal>. Electronic
manuscripts also contained these control codes or macros. These
codes could be said to be <Emphasis>specific coding</Emphasis>. <Emphasis>Generic
coding</Emphasis> began in the late 1960s, the most visible change being
that macros and codes got names like <Literal MoreInfo = "None">heading</Literal> instead
of some obscure label or directive like <Literal MoreInfo = "None">format-13A3</Literal>. </Para>
<Para>The credit for this change is often given to William Tunnicliffe,
who gave a speech in 1967 on the topic of separating the information
content of documents from the formatting rules. At about the same
time a book designer named Stanley Rice presented his idea of a <Quote>universal
catalog of parametrized <Emphasis>editorial structure</Emphasis> tags</Quote>.
Norman Scharpf (director of the Graphic Communications Association)
realized the importance of these developments and began promoting
the creation of standards in this area.</Para>
<Para>Charles Goldfarb, together with Edward Mosher and Raymond
Lorie, was working on an <Acronym>IBM</Acronym> research project that aimed
to enable text editing, formatting and information retrieval subsystems
to work together and share documents. The result of their work
was Generalized Markup Language (<Acronym>GML</Acronym>). <Acronym>GML</Acronym> was
based on the ideas of Tunnicliffe and Rice, but it went further,
introducing formal document types and nested element structure.</Para>
<Para>In 1978 Charles Goldfarb, who had continued his research even
after <Acronym>GML</Acronym> was finished, joined the Computer Languages
for the Processing of Text committee under the American National
Standards Institute (<Acronym>ANSI</Acronym>). Eventually he led
the development of the <Acronym>SGML</Acronym> standard.
The first working draft was published in 1980, and after several
more drafts and recommendations for industry standards, <Acronym>SGML</Acronym> was
finally published as the International Organization for Standardization standard
ISO 8879:1986.</Para></Sect2>
<Sect2 Id="sinl8"><Title>Structure Is Not Layout</Title>
<Para>It has been said that all documents have structure; let us
take a memorandum as an example. A memorandum may have a title,
a date, an author and the actual content of the memorandum. Structure can
be considered part of the metadata of a document. Metadata
is simply information about information, <Abbrev>i.e.</Abbrev> what
it deals with, how the information is stored, in what order certain items
appear and so forth. Metadata is <Emphasis Role = "B">not</Emphasis> the
actual <Emphasis Role = "B">content</Emphasis> of the document.
The structural information in an SGML document instance is called <Literal
    Role = "B" MoreInfo = "None">markup</Literal>.</Para>
<Para>Structure should <Emphasis>not</Emphasis> be confused with
the <Emphasis Role = "B">layout</Emphasis> of a document, although
the structure of documents is usually emphasized with special layout.
For example, titles in this document are printed with a larger font
than the rest of the text.</Para></Sect2>
<Sect2 Id = "BABIDIHJ"><Title>SGML in a Nutshell</Title>
<Para>SGML is used to create vocabularies for real document languages.
The vocabulary specifies what names can be used in the language, and
the grammar specifies in what order the elements can
appear, whether they can repeat and so forth. The defined names can
also have meta-information associated with them. Usually, the names
denote container objects that can contain other containers and plain
text. The languages defined by SGML are typically infinite in the
sense that one cannot write out all the document instances conforming
to a given language. The languages defined by SGML are not regular,
although the content model of each element is regular. On the other hand, one cannot
use SGML to define arbitrary context-free grammars <XRef Linkend = "BABEHGFG"
    xlink:type="simple" xlink:href="#BABEHGFG"
     >[Pre98]</XRef>.</Para>
<Para><Acronym>SGML</Acronym> stores structural metadata
about classes of documents in Document Type Definitions (<Acronym>DTD</Acronym>s).
An actual document that conforms to a certain <Acronym>DTD</Acronym> is
called an <Acronym>SGML</Acronym> document instance. In short, every
real <Acronym>SGML</Acronym> document is an instance
conforming to some <Acronym>DTD</Acronym>.</Para>
<Para><Acronym>SGML</Acronym> documents have three parts:
the <FirstTerm>SGML declaration</FirstTerm>, the <FirstTerm>document
type definition</FirstTerm> and the <FirstTerm>SGML instance</FirstTerm> conforming
to the document type definition (see <XRef Linkend = "CIHCGBFI"
    xlink:type="simple" href = "#CIHCGBFI" xlink:show = "replace"
    >Figure 4</XRef>). <XRef
    Linkend = "Sgml-doc-comp.xref" xlink:type="simple"
    href = "#Sgml-doc-comp.xref" xlink:show = "replace" >Example 4</XRef> shows
the document in text form (the SGML declaration is omitted for brevity).
The three parts can be in a single file or in separate files.</Para>
<Figure Float = "0"><html:img src="Gradu-4.gif"/>
<Title Id = "CIHCGBFI">SGML Document Diagram</Title></Figure>
<Para>If the declaration or <Acronym>DTD</Acronym> is to be reused
in other document instances, it is of course advisable to separate
them. This poses some problems, because <Acronym>SGML</Acronym> itself
does not specify how the processing application can find the different
parts. One widely accepted method is to refer to the Document Type
Definition in the document instance with a public identifier. Public
identifiers are mapped to actual file names and locations through
a <FirstTerm>catalog</FirstTerm>, typically a simple text file. <XRef
    Linkend = "CIHCJHDH" xlink:type="simple" href = "#CIHCJHDH"
    xlink:show = "replace" >Example 3</XRef> shows the
contents of a sample catalog file. This ad hoc standard has been
proposed by the <Acronym>OASIS</Acronym><Footnote><Para>OASIS was
formerly known as SGML-Open.</Para></Footnote> vendor consortium. </Para>
<Para>The catalog can be used to locate the <Acronym>SGML</Acronym> declaration
as well. However, the <Acronym>SGML</Acronym> standard defines a <FirstTerm>reference
concrete syntax</FirstTerm>, a declaration that is assumed if no
declaration is given. The <Acronym>SGML</Acronym> declaration defines
- among other things - what characters are used to distinguish
the <Acronym>SGML</Acronym> markup from the actual text.</Para>
<Example><Title Id = "CIHCJHDH">Contents of a Sample CATALOG File</Title>
<LiteralLayout xml:space = "preserve" Format = "linespecific">
PUBLIC "-//Heikki Toivonen//DTD Memo//EN" "memo.dtd"
</LiteralLayout></Example>
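<Para>Resolving a public identifier through such a catalog amounts to a table
lookup. The Python sketch below reads only <Literal MoreInfo = "None">PUBLIC</Literal>
entries and does not handle catalog comments or the other entry types; the second
entry in the sample catalog is hypothetical.</Para>
<ProgramListing Format = "linespecific"><![CDATA[
```python
import shlex

CATALOG = '''
PUBLIC "-//Heikki Toivonen//DTD Memo//EN" "memo.dtd"
PUBLIC "-//Example Corp//DTD Report//EN" "dtds/report.dtd"
'''

def read_catalog(text):
    """Map public identifiers to system identifiers (PUBLIC entries only)."""
    tokens = iter(shlex.split(text))  # shlex keeps quoted strings whole
    mapping = {}
    for token in tokens:
        if token.upper() == "PUBLIC":
            public_id = next(tokens)
            mapping[public_id] = next(tokens)
    return mapping

def resolve(public_id, catalog):
    """Return the file for a public identifier, or None if it is unknown."""
    return catalog.get(public_id)
```
]]></ProgramListing>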
<Para>The reference concrete syntax is enough for most documents,
although it has some rather frustrating limitations, such as names being
limited to eight characters. Nowadays most software packages use
the reference concrete syntax as a base, but extend it to allow
longer names and remove some historical remnants that were required
in the early days of <Acronym>SGML</Acronym>, when computers were
not as powerful as they are today.</Para>
<Para>The SGML declaration is not very interesting for the normal
user of <Acronym>SGML</Acronym>. Instead, the <Acronym>DTD</Acronym> and document
instances are more important. The ability to read and understand <Acronym>DTD</Acronym>s
and document instances is sufficient for writers <XRef
    Linkend = "Bib-tur96.xref" xlink:type="simple"
    xlink:href="#Bib-tur96.xref" 
    >[Tur96]</XRef>.</Para></Sect2>
<Sect2 Id = "CIHGBIHD"><Title>DTD And Document Instance</Title>
<Para>Refer to <XRef Linkend = "Sgml-doc-comp.xref" xlink:type="simple"
    href = "#Sgml-doc-comp.xref" xlink:show = "replace" >Example
4</XRef> for clarification of the following definitions.</Para>
<Para>The reference concrete syntax specifies that declarations
in a DTD start with <Literal MoreInfo = "None">&lt;!</Literal> and
end with <Literal MoreInfo = "None">&gt;</Literal>. After the <Literal
    MoreInfo = "None">&lt;!</Literal> comes a keyword specifying
what sort of thing is being declared. Usually the keyword is <Literal
    MoreInfo = "None">ELEMENT</Literal>, <Literal MoreInfo = "None">ATTLIST</Literal> or <Literal
    MoreInfo = "None">ENTITY</Literal>. <Literal MoreInfo = "None">ELEMENT</Literal> declares
an element name that can be used in the <Acronym>SGML</Acronym> instance, <Literal
    MoreInfo = "None">ATTLIST</Literal> defines attributes for an
element and <Literal MoreInfo = "None">ENTITY</Literal> can define
different sorts of entities (like references to external images
or video sequences).</Para>
<Para>In an element declaration, after the element name, it is possible
to specify whether the start and/or end tag of the element can be omitted.
An end tag can be omitted when the parser can tell from the next
element that the current element must be closed. No lookahead is
needed, or even allowed, in the parser. The tag omission rules are specified
with a minus (<Literal MoreInfo = "None">-</Literal>), meaning the tag is required, and the letter <Literal
    MoreInfo = "None">o</Literal>, meaning the tag is omissible; the first
character applies to the start tag and the second to the end tag.</Para>
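<Para>The rule for omitted end tags can be sketched in Python as follows. The
element names and the table of allowed children are hypothetical, and the
sketch is simplified: it only checks which child elements an element may
contain, not the order required by its content model, and it does not check
where text may appear.</Para>
<ProgramListing Format = "linespecific"><![CDATA[
```python
# Hypothetical DTD facts: the children each element may contain, and the
# elements declared with "- o" (required start tag, omissible end tag).
ALLOWED = {"memo": {"title", "para"}, "title": set(), "para": set()}
END_OMISSIBLE = {"memo", "title", "para"}

def infer_end_tags(tokens):
    """Insert omitted end tags into a list of (kind, value) tokens, where
    kind is "start", "end" or "text". Only the current token is examined,
    so no lookahead is used."""
    stack, out = [], []
    for kind, value in tokens:
        if kind == "start":
            # Close open elements that cannot contain the new element.
            while stack and value not in ALLOWED[stack[-1]]:
                if stack[-1] not in END_OMISSIBLE:
                    raise ValueError(f"{value} is not allowed in {stack[-1]}")
                out.append(("end", stack.pop()))
            if not stack and out:
                raise ValueError(f"{value} is not allowed here")
            stack.append(value)
        elif kind == "end":
            if value not in stack:
                raise ValueError(f"unexpected end tag for {value}")
            # An explicit end tag also closes inner elements left open.
            while stack[-1] != value:
                if stack[-1] not in END_OMISSIBLE:
                    raise ValueError(f"end tag of {stack[-1]} is missing")
                out.append(("end", stack.pop()))
            stack.pop()
        out.append((kind, value))
    # The end of the document closes every element that is still open.
    for name in reversed(stack):
        if name not in END_OMISSIBLE:
            raise ValueError(f"end tag of {name} is missing")
        out.append(("end", name))
    return out
```
]]></ProgramListing>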
<Para>The second-to-last part of the element declaration is
the content model. The content model can contain other element names
and/or special keywords. Element names are separated with a comma (<Literal
    MoreInfo = "None">,</Literal>) if the elements must appear
one after the other, a bar (<Literal MoreInfo = "None">|</Literal>) if
only one of them is allowed and an ampersand (<Literal MoreInfo = "None">&amp;</Literal>) if
all of them must appear but in any order. It is possible to specify
that an element is optional by putting a question mark (<Literal
    MoreInfo = "None">?</Literal>) after the element's name. It
is also possible to say that an element must occur one or more times
with a plus (<Literal MoreInfo = "None">+</Literal>), or zero or
more times with an asterisk (<Literal MoreInfo = "None">*</Literal>). Special
keywords common in content models are <Literal
    MoreInfo = "None">#PCDATA</Literal> and <Literal MoreInfo = "None">EMPTY</Literal>. <Literal
    MoreInfo = "None">#PCDATA</Literal> means parsed character
data, i.e. normal text. An <Literal MoreInfo = "None">EMPTY</Literal> content
model cannot be combined with any other element names
or keywords, as it means that the element does not have any content;
all information about such an element is in its attributes.
Parentheses can be used to group parts of the content model.</Para>
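<Para>Because a content model is a regular expression over element names,
it can be translated into an ordinary regular expression and used to check
the sequence of child elements. The following Python sketch supports names,
the <Literal MoreInfo = "None">,</Literal> and <Literal MoreInfo = "None">|</Literal>
connectors, the occurrence indicators and parentheses; the <Literal MoreInfo = "None">&amp;</Literal>
connector and <Literal MoreInfo = "None">#PCDATA</Literal> are left out for brevity.</Para>
<ProgramListing Format = "linespecific"><![CDATA[
```python
import re

def content_model_to_regex(model):
    """Translate an SGML content model into a compiled regular expression.

    Supported: element names, the ',' and '|' connectors, the '?', '+'
    and '*' occurrence indicators and parentheses. Not supported: the '&'
    connector and #PCDATA.
    """
    parts = []
    for token in re.findall(r"[A-Za-z][\w.-]*|[(),|?+*]", model):
        if token == "(":
            parts.append("(?:")              # non-capturing group
        elif token == ",":
            continue                         # a sequence is concatenation
        elif token in {")", "|", "?", "+", "*"}:
            parts.append(token)              # same meaning in both notations
        else:
            parts.append(f"(?:{re.escape(token)} )")  # one child element
    return re.compile("".join(parts))

def matches(model, children):
    """True if the list of child element names conforms to the model."""
    sequence = "".join(name + " " for name in children)
    return content_model_to_regex(model).fullmatch(sequence) is not None
```
]]></ProgramListing>
<Para>For the model <Literal MoreInfo = "None">(title?, para+, image*)</Literal>,
the children <Literal MoreInfo = "None">title para para image</Literal> are
accepted, while <Literal MoreInfo = "None">image para</Literal> is rejected.</Para>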
<Para>The last part of the element declaration can be used to specify
inclusion or exclusion exceptions. Inclusion exceptions are elements
that can appear anywhere inside the element being defined, including
inside its descendants. Exclusion exceptions forbid the excluded elements
anywhere inside the defined element, including inside its descendants. Exclusion
has higher precedence than inclusion. Typical inclusion exceptions
are page break and cross-reference elements that can often appear
anywhere. A typical exclusion exception would deny page break elements
inside title elements.</Para>
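<Para>How the exceptions of enclosing elements combine can be sketched in
Python as follows. The element names and declarations are hypothetical; the
sketch only computes the set of elements made available anywhere by inclusion
exceptions, with exclusions taking precedence, and does not apply exclusions
to the element's own content model.</Para>
<ProgramListing Format = "linespecific"><![CDATA[
```python
# Hypothetical declarations: element -> (inclusion set, exclusion set),
# e.g. <!ELEMENT book - - (title, chapter+) +(pagebreak|xref)>
EXCEPTIONS = {
    "book": ({"pagebreak", "xref"}, set()),
    "title": (set(), {"pagebreak"}),
}

def extra_allowed(ancestors):
    """Elements allowed inside the last element of `ancestors` (ordered
    from the outermost element inwards) because of inclusion exceptions."""
    included, excluded = set(), set()
    for name in ancestors:
        inclusions, exclusions = EXCEPTIONS.get(name, (set(), set()))
        included |= inclusions
        excluded |= exclusions
    return included - excluded  # exclusion takes precedence over inclusion
```
]]></ProgramListing>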
<Para>An element can have multiple attributes. Attributes have a
name, a data type and a keyword specifying if the attribute is required
(<Literal MoreInfo = "None">#REQUIRED</Literal>) or optional (<Literal
    MoreInfo = "None">#IMPLIED</Literal>). An attribute can also have
a default value. Common data types are <Literal MoreInfo = "None">CDATA</Literal> (for
normal text) and <Literal MoreInfo = "None">NUMBER</Literal> (for
a number), as well as <Literal MoreInfo = "None">ID</Literal> and <Literal
    MoreInfo = "None">IDREF</Literal> for cross-referencing. Data
types can also indicate multiple entries, for example <Literal
    MoreInfo = "None">NUMBERS</Literal> and <Literal MoreInfo = "None">IDREFS</Literal>.
Multiple entries are separated with spaces. If an attribute value
has spaces or other separator characters in it (as specified in
the SGML declaration), the value(s) must be quoted. The reference
concrete syntax specifies two quote delimiters: the quotation mark
(<Literal MoreInfo = "None">"</Literal>) and the apostrophe (<Literal
    MoreInfo = "None">'</Literal>).</Para>
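<Para>The following Python sketch shows how an application might check the
attributes given in a start tag against an attribute list declaration and
supply defaults. The declaration is hypothetical and only a few data types
are checked.</Para>
<ProgramListing Format = "linespecific"><![CDATA[
```python
# Hypothetical attribute list declaration for one element:
# attribute name -> (declared value, default)
ATTLIST = {
    "id": ("ID", "#REQUIRED"),
    "language": ("CDATA", "#IMPLIED"),
    "pages": ("NUMBERS", "#IMPLIED"),
    "status": ("CDATA", "draft"),   # a declared default value
}

def attribute_values(specified):
    """Check the attributes given in a start tag and apply defaults."""
    unknown = set(specified) - set(ATTLIST)
    if unknown:
        raise ValueError(f"undeclared attributes: {sorted(unknown)}")
    result = {}
    for name, (declared, default) in ATTLIST.items():
        if name in specified:
            value = specified[name]
            if declared in ("NUMBER", "NUMBERS"):
                entries = value.split()  # multiple entries are space-separated
                if declared == "NUMBER" and len(entries) != 1:
                    raise ValueError(f"{name} takes exactly one number")
                if not all(entry.isdigit() for entry in entries):
                    raise ValueError(f"{name} must contain numbers only")
            result[name] = value
        elif default == "#REQUIRED":
            raise ValueError(f"required attribute {name} is missing")
        elif default != "#IMPLIED":
            result[name] = default  # fall back to the declared default
    return result
```
]]></ProgramListing>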
<Para>Usually entities declared in the <Acronym>DTD</Acronym> are
parameter entities, which are used to make the <Acronym>DTD</Acronym> easier
to read and manage. Entities declared in the <FirstTerm>document type declaration
subset</FirstTerm> are normally used to refer to external SGML document
fragments and images. The document type declaration subset is enclosed
within square brackets (<Literal MoreInfo = "None">[</Literal> and <Literal
    MoreInfo = "None">]</Literal>) inside the document type declaration at the beginning of the
document. Comments appear between <Literal MoreInfo = "None">&lt;!--</Literal> and <Literal
    MoreInfo = "None">--&gt;</Literal>.</Para>
<Para>The <Acronym>SGML</Acronym> document instance contains the
document's data. The data is mixed with the <Acronym>SGML</Acronym> markup.
The reference concrete syntax specifies that start tags start with <Literal
    MoreInfo = "None">&lt;</Literal> and end with <Literal
    MoreInfo = "None">&gt;</Literal>. End tags start with <Literal
    MoreInfo = "None">&lt;/</Literal>. Attributes are specified
in the start tag between the element name and the closing <Literal
    MoreInfo = "None">&gt;</Literal>. To put it simply, an attribute
is a name-value pair. The name and value are separated by <Literal
    MoreInfo = "None">=</Literal>. <Acronym>SGML</Acronym> also
allows only the attribute value to appear if there can be no ambiguity
as to which attribute is intended, as with a value taken from an
enumerated list such as <Literal MoreInfo = "None">(open | internal)</Literal>.</Para>
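<Para>The Extensible Markup Language (<Acronym>XML</Acronym>, discussed below) keeps this start tag syntax, so the name-value pairs can be seen with an ordinary XML parser. In the Python snippet below, which relies only on the standard <Literal MoreInfo = "None">xml.etree.ElementTree</Literal> module, the attributes of a <Literal MoreInfo = "None">memo</Literal> start tag like the one in Example 4 come back as a dictionary, with either quote character accepted. The last lines show that the SGML short form with a bare value is rejected in <Acronym>XML</Acronym>, where the attribute name is always required.</Para>
<ProgramListing Format = "linespecific"><![CDATA[
```python
import xml.etree.ElementTree as ET

# Either quote character may delimit an attribute value.
memo = ET.fromstring("""<memo id="unique-id-1" language='English'/>""")

# The start tag's attributes arrive as name-value pairs in a dictionary.
print(memo.attrib)  # {'id': 'unique-id-1', 'language': 'English'}

# SGML may accept a bare value when it is unambiguous; XML never does.
try:
    ET.fromstring("<memo internal/>")
except ET.ParseError as err:
    print("rejected:", err)
```
]]></ProgramListing>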
<Para><XRef Linkend = "Sgml-doc-comp.xref" xlink:type="simple"
    href = "#Sgml-doc-comp.xref" xlink:show = "replace" >Example
4</XRef> shows an SGML document with its DTD (the SGML declaration
has been omitted for brevity).</Para>
<Para>The <Acronym>DTD</Acronym> starts with the <Literal
    MoreInfo = "None">DOCTYPE</Literal> keyword, followed by the name of the
document element (<Literal MoreInfo = "None">memo</Literal>) and a public
identifier (the identifier could be omitted, or there could be a
system identifier or both a public and a system identifier). The public
identifier can be used in other documents to refer to this <Acronym>DTD</Acronym>.
In the <Acronym>DTD</Acronym> there is first a declaration of a
notation called <Literal MoreInfo = "None">GIF</Literal>, made with the
<Literal MoreInfo = "None">SYSTEM</Literal> keyword alone, which leaves the
notation's system identifier to the processing system. The notation is then used in
the declaration of the image entity <Literal MoreInfo = "None">tooth</Literal>, whose
data is declared to be in <Literal MoreInfo = "None">GIF</Literal> format.</Para>
<Para>The element definitions start with <Literal MoreInfo = "None">memo</Literal> (the
order of definitions is not important, by the way). It has a required
start tag and an optional end tag. The element <Literal
    MoreInfo = "None">memo</Literal> can contain an optional <Literal
    MoreInfo = "None">title</Literal>, followed by one or more <Literal
    MoreInfo = "None">para</Literal>s and zero or more <Literal
    MoreInfo = "None">image</Literal>s. Element <Literal
    MoreInfo = "None">memo</Literal> has three attributes: an optional <Literal
    MoreInfo = "None">language</Literal>, a required <Literal
    MoreInfo = "None">id</Literal> and <Literal MoreInfo = "None">secret</Literal>,
which can have two possible values with <Literal MoreInfo = "None">open</Literal> being
the default value. Other elements are defined similarly. The <Literal
    MoreInfo = "None">image</Literal> element deserves special mention.
It has a content model of <Literal MoreInfo = "None">EMPTY</Literal>,
meaning it cannot contain any other elements or text. The information
for that element is in its attributes. In this case, the element
has one attribute <Literal MoreInfo = "None">pic</Literal>, of type <Literal
    MoreInfo = "None">ENTITY</Literal>, and the attribute must always
be specified in the document instance.</Para>
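<Para>A content model such as <Literal MoreInfo = "None">(title? , para+, image*)</Literal> behaves much like a regular expression over the names of an element's children. The Python sketch below checks sequences of child names against the <Literal MoreInfo = "None">memo</Literal> model after translating it into a regular expression by hand; the function name and the space-separated encoding of names are illustrative assumptions. A real parser compiles content models itself (ISO 8879 requires them to be unambiguous) and also handles inclusions, exclusions and omitted tags, none of which this sketch attempts.</Para>
<ProgramListing Format = "linespecific"><![CDATA[
```python
import re

# Hand translation of the memo content model (title? , para+, image*).
# Each child name is followed by a single space in the encoded string.
MEMO_MODEL = re.compile(r"(title )?(para )+(image )*")


def memo_content_ok(child_names):
    """Return True if the child element names satisfy the memo model."""
    encoded = "".join(name + " " for name in child_names)
    return MEMO_MODEL.fullmatch(encoded) is not None


print(memo_content_ok(["title", "para", "image"]))  # True
print(memo_content_ok(["para", "para"]))            # True: title is optional
print(memo_content_ok(["title", "image"]))          # False: para+ needs one
print(memo_content_ok(["image", "para"]))           # False: order matters
```
]]></ProgramListing>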
<Para>The document instance begins with the <Literal MoreInfo = "None">memo</Literal> start
tag. The instance shows all the required start and end tags of the
elements. <Literal MoreInfo = "None">para</Literal> and <Literal
    MoreInfo = "None">image</Literal> do not have their end tags,
which is legal because the <Acronym>DTD</Acronym> allows this. All
required attributes are also specified. The <Literal MoreInfo = "None">pic</Literal> attribute
of the <Literal MoreInfo = "None">image</Literal> element refers
to the <Literal MoreInfo = "None">tooth</Literal> entity specified
in the <Acronym>DTD</Acronym>, and this should show up as an image
of a tooth in an <Acronym>SGML</Acronym> browser.</Para>
<Example Id = "Sgml-doc-comp.xref"><Title>SGML Document Structure
As Text</Title>
<ProgramListing Format = "linespecific">&lt;!SGML ... &gt;

&lt;!DOCTYPE memo 	PUBLIC &quot;-//Heikki Toivonen//DTD Memo//EN&quot;
[ </ProgramListing>
<ProgramListing Format = "linespecific"></ProgramListing>
<ProgramListing Format = "linespecific">&lt;!NOTATION GIF SYSTEM&gt;
&lt;!ENTITY tooth SYSTEM &quot;tooth.gif&quot; NDATA GIF &gt; 

&lt;!ELEMENT memo - o (title? , para+, image*) &gt; 
&lt;!ATTLIST memo 
	language CDATA #IMPLIED 
	id ID #REQUIRED 
	secret (open | internal) &quot;open&quot; &gt; 
</ProgramListing>
<ProgramListing Format = "linespecific">&lt;!ELEMENT title - o (#PCDATA)&gt;</ProgramListing>
<ProgramListing Format = "linespecific"></ProgramListing>
<ProgramListing Format = "linespecific">&lt;!ELEMENT para - o (#PCDATA)&gt;</ProgramListing>
<ProgramListing Format = "linespecific"></ProgramListing>
<ProgramListing Format = "linespecific">&lt;!ELEMENT image - o EMPTY&gt;</ProgramListing>
<ProgramListing Format = "linespecific">&lt;!ATTLIST image</ProgramListing>
<ProgramListing Format = "linespecific">pic ENTITY #REQUIRED&gt;</ProgramListing>
<ProgramListing Format = "linespecific">
]&gt; </ProgramListing>
<ProgramListing Format = "linespecific">
&lt;memo id=&quot;unique-id-1&quot; language=&quot;English&quot;&gt; 
&lt;title&gt;Remember Dentist!&lt;/title&gt;</ProgramListing>
<ProgramListing Format = "linespecific">&lt;para&gt;Dentist tomorrow
at one o'clock.</ProgramListing>
<ProgramListing Format = "linespecific">&lt;image pic='tooth'&gt;
&lt;/memo&gt;</ProgramListing></Example></Sect2>
<Sect2 Id="eeaswtramtf10"><Title>External Entities - A Simple Way to Reuse And Manage
Text Fragments</Title>
<Para>External entities are such a critical part of this thesis
that it is important to get the basics right. An external entity
declaration contains the entity name followed by a public identifier
or a system identifier or both. Additionally, a notation type can
be specified. If there is no system identifier, the processing application
must somehow get the system identifier and acquire the contents
of the entity. The <Acronym>OASIS</Acronym> catalog is a good way
to map public identifiers to system identifiers. It should be noted
that a system identifier is exactly what the name says. It is not
necessarily a file name. It could be a database query or just about
anything. As long as the processing system knows what to do with
it, it can use any means or description desired to cause the contents
of the entity to be made available to the SGML processor.</Para>
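<Para>The catalog idea itself is simple: a table from public identifiers to system identifiers, consulted when a declaration gives no system identifier. The Python sketch below reads only <Literal MoreInfo = "None">PUBLIC</Literal> entries from a catalog text and normalizes white space in public identifiers before comparing them. Everything else in the OASIS catalog format (other entry types such as <Literal MoreInfo = "None">SYSTEM</Literal> and <Literal MoreInfo = "None">DELEGATE</Literal>, and the override rules) is left out, and the file name and the <Literal MoreInfo = "None">db:</Literal> system identifier are hypothetical. The second entry illustrates the point above that a system identifier need not be a file name.</Para>
<ProgramListing Format = "linespecific"><![CDATA[
```python
import re

# Only entries of the form  PUBLIC "public-id" "system-id"  are
# recognised: a small subset of the OASIS catalog syntax.
PUBLIC_ENTRY = re.compile(r'PUBLIC\s+"([^"]*)"\s+"([^"]*)"')


def normalize(public_id):
    """Collapse white space so that equivalent public IDs compare equal."""
    return " ".join(public_id.split())


def load_catalog(text):
    """Map each public identifier in the catalog to its system identifier."""
    return {normalize(pub): sysid for pub, sysid in PUBLIC_ENTRY.findall(text)}


def resolve(catalog, public_id, system_id=None):
    """Use the declared system identifier if any, else ask the catalog."""
    if system_id is not None:
        return system_id
    return catalog.get(normalize(public_id))


catalog = load_catalog('''
-- hypothetical entries; the second system identifier is not a file --
PUBLIC "-//Heikki Toivonen//DTD Memo//EN"   "dtds/memo.dtd"
PUBLIC "-//Heikki Toivonen//DTD Manual//EN" "db:dtds?name=manual"
''')
print(resolve(catalog, "-//Heikki Toivonen//DTD  Memo//EN"))   # dtds/memo.dtd
print(resolve(catalog, "-//Heikki Toivonen//DTD Manual//EN"))  # db:dtds?name=manual
```
]]></ProgramListing>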
<Para>An entity is used in a document by inserting an entity reference
where the text is wanted: the entity's name, preceded by an ampersand (<Literal MoreInfo = "None">&amp;</Literal>)
and followed by a semicolon (<Literal MoreInfo = "None">;</Literal>).
In the same way, special characters that are not present in
the character set used can be inserted into the document. For
example, the letter <Literal MoreInfo = "None">ä</Literal> is usually represented
by the entity <Literal MoreInfo = "None">auml</Literal> ("a umlaut") as <Literal
    MoreInfo = "None">&amp;auml;</Literal>.</Para>
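<Para>XML parsers perform the same substitution for entities declared inside the <Literal MoreInfo = "None">DOCTYPE</Literal> declaration. In the Python snippet below, the expat-based <Literal MoreInfo = "None">xml.etree.ElementTree</Literal> parser from the standard library expands two references to <Literal MoreInfo = "None">auml</Literal>. Because XML predefines only <Literal MoreInfo = "None">amp</Literal>, <Literal MoreInfo = "None">lt</Literal>, <Literal MoreInfo = "None">gt</Literal>, <Literal MoreInfo = "None">quot</Literal> and <Literal MoreInfo = "None">apos</Literal>, the entity has to be declared, here with a numeric character reference for <Literal MoreInfo = "None">ä</Literal>.</Para>
<ProgramListing Format = "linespecific"><![CDATA[
```python
import xml.etree.ElementTree as ET

# auml is not predefined in XML, so the internal subset declares it.
doc = """<!DOCTYPE place [
  <!ENTITY auml "&#228;">
]>
<place>Jyv&auml;skyl&auml;</place>"""

place = ET.fromstring(doc)
# Each &auml; reference has been replaced by the entity's text.
print(place.text == "Jyv\u00e4skyl\u00e4")  # True
```
]]></ProgramListing>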
<Para>It is possible to define a default entity whose contents will
be used for entities that are not defined. The entity text could
say for example <Literal MoreInfo = "None">@@Undefined entity@@</Literal> to
indicate that it should be defined at some point. This is sometimes
used to make documents containing undefined entities valid.</Para>
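<Para>In SGML the default entity is declared with the reserved name <Literal MoreInfo = "None">#DEFAULT</Literal>, for example <Literal MoreInfo = "None">&lt;!ENTITY #DEFAULT "@@Undefined entity@@"&gt;</Literal>. The fallback it provides can be imitated with plain text substitution, as in the Python sketch below. The <Literal MoreInfo = "None">expand_entities</Literal> helper and the entity names are hypothetical, and the sketch ignores everything else an SGML parser does with entities (markup recognition, parameter entities, external entities).</Para>
<ProgramListing Format = "linespecific"><![CDATA[
```python
import re

# An entity reference: ampersand, name, semicolon (name syntax simplified).
ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9.-]*);")


def expand_entities(text, entities, default=None):
    """Replace &name; references with entity text (hypothetical helper).

    Undefined names get `default`, imitating an SGML #DEFAULT entity;
    without a default, an undefined reference is an error.
    """
    def replace(match):
        name = match.group(1)
        if name in entities:
            return entities[name]
        if default is not None:
            return default
        raise KeyError(f"undefined entity: {name}")

    return ENTITY_REF.sub(replace, text)


entities = {"product": "WidgetPro"}  # hypothetical entity set
print(expand_entities("Install &product; before &release;.", entities,
                      default="@@Undefined entity@@"))
# Install WidgetPro before @@Undefined entity@@.
```
]]></ProgramListing>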
<Para>External entities allow reuse of parts of documents in other
documents even if the files are saved in a normal file system. For
example, a manual consisting of several chapters could be constructed
so that each chapter would reside in its own file. The whole manual
would need to be assembled from these files. What this might look
like in <Acronym>SGML</Acronym> is shown in <XRef Linkend = "CIHCBCCA"
    xlink:type="simple" href = "#CIHCBCCA" xlink:show = "replace"
    >Example 5</XRef>. Now, if there were several
manuals and the first chapter was an introduction that was identical
in all of them, it would be a simple matter to reuse the first chapter.
Of course this saves space, but the more important benefit is that
changing the first chapter would immediately update every document that uses it.
Automatic updates may not be wanted in every case, but if reuse is what is wanted,
they cannot be prevented without additional tools.
Unfortunately, in practice not all <Acronym>SGML</Acronym> tools
support this external entity approach very well, even though it is
quite simple.</Para>
<Example Id = "CIHCBCCA"><Title>External Entities</Title>
<Para>File <Literal MoreInfo = "None">manual.sgm</Literal>:</Para>
<ProgramListing Format = "linespecific">&lt;!DOCTYPE manual PUBLIC
"-//Heikki Toivonen//DTD Manual//EN" [</ProgramListing>
<ProgramListing Format = "linespecific">&lt;!ENTITY chap1 SYSTEM
"chap1.inc"&gt;</ProgramListing>
<ProgramListing Format = "linespecific">&lt;!ENTITY chap2 SYSTEM
"chap2.inc"&gt;</ProgramListing>
<ProgramListing Format = "linespecific">]&gt;</ProgramListing>
<ProgramListing Format = "linespecific">&lt;manual&gt;&amp;chap1;&amp;chap2;&lt;/manual&gt;</ProgramListing>
<Para>File <Literal MoreInfo = "None">chap1.inc</Literal>:</Para>
<ProgramListing Format = "linespecific">&lt;chapter&gt;&lt;title&gt;Introduction...&lt;/chapter&gt;</ProgramListing>
<Para>File <Literal MoreInfo = "None">chap2.inc</Literal>:</Para>
<ProgramListing Format = "linespecific">&lt;chapter&gt;&lt;title&gt;The
Life of Brian...&lt;/chapter&gt;</ProgramListing></Example>
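<Para>What an SGML parser does with the entity references in Example 5 can be imitated in a few lines of Python. In this sketch a dictionary stands in for the file system, the <Literal MoreInfo = "None">DOCTYPE</Literal> wrapper is left out, and the hypothetical <Literal MoreInfo = "None">assemble</Literal> helper understands only declarations of the form <Literal MoreInfo = "None">&lt;!ENTITY name SYSTEM "file"&gt;</Literal> and references of the form <Literal MoreInfo = "None">&amp;name;</Literal>. The chapter contents and the second manual are invented for illustration. The last lines show the reuse argument made above: both manuals share <Literal MoreInfo = "None">chap1.inc</Literal>, so one edit to it changes both assembled manuals.</Para>
<ProgramListing Format = "linespecific"><![CDATA[
```python
import re

# Matches declarations of the form <!ENTITY name SYSTEM "system-id">
DECL = re.compile(r'<!ENTITY\s+(\w+)\s+SYSTEM\s+"([^"]+)"\s*>')


def assemble(document, files):
    """Inline external text entities (hypothetical helper, no DOCTYPE)."""
    # Each entity name maps to the text stored under its system identifier.
    entities = {name: files[sysid] for name, sysid in DECL.findall(document)}
    body = DECL.sub("", document)  # drop the declarations themselves
    return re.sub(r"&(\w+);", lambda m: entities[m.group(1)], body)


# A dictionary stands in for the file system (hypothetical contents).
files = {
    "chap1.inc": "<chapter><title>Introduction</title></chapter>",
    "chap2.inc": "<chapter><title>The Life of Brian</title></chapter>",
    "setup.inc": "<chapter><title>Installation</title></chapter>",
}
manual_a = ('<!ENTITY chap1 SYSTEM "chap1.inc">'
            '<!ENTITY chap2 SYSTEM "chap2.inc">'
            '<manual>&chap1;&chap2;</manual>')
manual_b = ('<!ENTITY chap1 SYSTEM "chap1.inc">'
            '<!ENTITY setup SYSTEM "setup.inc">'
            '<manual>&chap1;&setup;</manual>')

# Editing the shared chapter once changes every manual that uses it.
files["chap1.inc"] = "<chapter><title>Introduction (revised)</title></chapter>"
for manual in (manual_a, manual_b):
    print(assemble(manual, files))
```
]]></ProgramListing>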
<Para>The problem with the external chapters in <XRef
    Linkend = "CIHCBCCA" xlink:type="simple" href = "#CIHCBCCA"
    xlink:show = "replace" >Example 5</XRef> is that they
themselves are not valid <Acronym>SGML</Acronym> documents. They
need to be included in a manual that has the <Literal MoreInfo = "None">DOCTYPE</Literal> header
information. There is a rarely used <Acronym>SGML</Acronym> feature
called <Literal MoreInfo = "None">SUBDOC</Literal> that would make
it possible to create the chapters as standalone documents and it
would still be possible to include them in the manual file. Very
few tools support it, though.</Para></Sect2>
<Sect2 Id="uos11"><Title>Users of SGML</Title>
<Para>The value of <Acronym>SGML</Acronym> was first fully realized
in large corporations and organizations, like the U.S. Department
of Defense. Other large companies using <Acronym>SGML</Acronym> now
include Nokia, Novell, Microsoft and Hewlett-Packard.</Para>
<Para>Although the original users were mainly involved with the
military, the commercial wing has caught up. The greatest interest
in <Acronym>SGML</Acronym> seems to be in areas like telecommunications,
aerospace, manufacturing, publishing and pharmaceuticals. This does
not mean that <Acronym>SGML</Acronym> is unsuited to other fields.
It so happens that the aforementioned areas share the characteristic
that they produce huge amounts of documentation that has to be maintained
and manipulated effectively. However, the amount of documentation
is not the only indication that structured documents might be a
good solution to better manage the information. Frequent updates,
multiple authors, long term storage and stringent validation requirements
are other indicators, to name a few. The Extensible Markup Language
(discussed below) is rapidly bringing smaller players into the picture
as well.</Para>
<Para><Acronym>SGML</Acronym> should generally not be used where
the information is used only once, is easy to reproduce or is not
important; in short, where the opposite of what was mentioned above holds.
Typical examples of documents that would not benefit from <Acronym>SGML</Acronym> include
notes and informal letters.</Para>
<Para>It may be interesting to note that HyperText Markup Language
(<Acronym>HTML</Acronym>) is <Acronym>SGML</Acronym>. <Acronym>HTML</Acronym> is
simply an <Acronym>SGML</Acronym> <Acronym>DTD</Acronym>. However, <Acronym>HTML</Acronym> follows
the philosophy underpinning <Acronym>SGML</Acronym> only loosely. For
example, <Acronym>HTML</Acronym> has page formatting elements, which
are completely against the principles of <Acronym>SGML</Acronym>.</Para>
<Para>Despite <Acronym>HTML</Acronym>'s deviation from <Acronym>SGML</Acronym> principles,
it has done a great job of advertising the benefits of structured
documents, along with the World Wide Web (<Acronym>WWW</Acronym>) itself, of
course. The Web is becoming more "mature" in a structural
sense. The Extensible Markup Language enables web
authors to mark up their documents with descriptive markup instead
of the restricted <Acronym>HTML</Acronym> tag set. This will eventually
bring smart search engines to the web that can use the document
structure to achieve a better signal-to-noise ratio <XRef
    Linkend = "BABHAAGE" xlink:type="simple"
    xlink:href="#BABHAAGE"  >[Mya98]</XRef>.</Para></Sect2></Sect1>
<Sect1 Id="eml12"><Title>Extensible Markup Language</Title>
<Para>The Extensible Markup Language (<Acronym>XML</Acronym>) <XRef
    Linkend = "BGBBFHGF" xlink:type="simple"
    xlink:href="#BGBBFHGF"  >[W3C98]</XRef> is
the little brother of <Acronym>SGML</Acronym>. The credit for the
birth of <Acronym>XML</Acronym> is often given to two men in particular:
Jon Bosak and Tim Bray. Their hard work to bring <Acronym>SGML</Acronym> to
the World Wide Web resulted in <Acronym>XML</Acronym>. It has succeeded
in bringing the true power of <Acronym>SGML</Acronym> to the masses.
It is easier to parse and use than <Acronym>SGML</Acronym>, which
makes writing <Acronym>XML</Acronym> applications easier, not
to mention faster and cheaper. All valid <Acronym>XML</Acronym> documents
are valid <Acronym>SGML</Acronym> documents, but not vice versa. </Para>
<Para>To make <Acronym>XML</Acronym> easier to process, its designers did away with some
of the historical baggage <Acronym>SGML</Acronym> carries. For example, <Acronym>XML</Acronym> does not allow exceptions
(see <XRef Linkend = "BABIDIHJ" xlink:type="simple"
    href = "#BABIDIHJ" xlink:show = "replace" >Section
3.3.3</XRef>), nor is it possible to omit tags in <Acronym>XML</Acronym>.
Because tags cannot be omitted, the tag omission characters (<Literal MoreInfo = "None">-</Literal> and <Literal
    MoreInfo = "None">o</Literal>) just after the element name in an
element declaration are no longer used in an <Acronym>XML</Acronym> <Acronym>DTD</Acronym>.</Para>
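<Para>When an SGML <Acronym>DTD</Acronym> is converted for <Acronym>XML</Acronym> use, these tag omission flags simply have to go. The Python sketch below removes them from element declarations with a regular expression. It is a minimal illustration that assumes one declaration per string and a single element name (no name groups), and it does not attempt the other conversions <Acronym>XML</Acronym> requires, such as removing exceptions or replacing the <Literal MoreInfo = "None">&amp;</Literal> connector.</Para>
<ProgramListing Format = "linespecific"><![CDATA[
```python
import re

# In SGML, <!ELEMENT name start end content> carries two tag omission
# flags: "-" (tag required) or "O" (tag may be omitted), in either case.
MINIMIZATION = re.compile(r"(<!ELEMENT\s+[^\s(]+)\s+[-oO]\s+[-oO](?=\s)")


def to_xml_element_decl(sgml_decl):
    """Remove the tag omission flags, which XML element declarations lack."""
    return MINIMIZATION.sub(r"\1", sgml_decl)


for decl in ("<!ELEMENT memo - o (title? , para+, image*) >",
             "<!ELEMENT image - o EMPTY>",
             "<!ELEMENT para (#PCDATA)>"):  # already XML style: unchanged
    print(to_xml_element_decl(decl))
```
]]></ProgramListing>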
<Para>The only valid character set is Unicode <XRef Linkend = "BABCIFCA"
    xlink:type="simple" xlink:href="#BABCIFCA"
     >[ISO93]</XRef>, while in <Acronym>SGML</Acronym> the
character set can be specified in the SGML declaration. Because
Unicode is designed to include practically all
characters there are or ever will be, character entities are not
needed: <Literal MoreInfo = "None">ä</Literal> can be written directly
as <Literal MoreInfo = "None">ä</Literal> instead of <Literal
    MoreInfo = "None">&amp;auml;</Literal>. Numeric character references can
still be used (for example for characters that cannot be typed).
They have the form <Literal MoreInfo = "None">&amp;#&lt;number&gt;;</Literal>, where <Literal
    MoreInfo = "None">&lt;number&gt;</Literal> is the code
of a Unicode character. Also, all names are case sensitive (<Acronym>SGML</Acronym> allows
one to specify whether case is significant). There are plenty of these small
changes, but generally one does not need to worry about them unless
there is a constant need to do transformations between <Acronym>SGML</Acronym> and <Acronym>XML</Acronym>.</Para>
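<Para>These rules are easy to observe with any <Acronym>XML</Acronym> parser. The Python snippet below uses the standard <Literal MoreInfo = "None">xml.etree.ElementTree</Literal> module to show a decimal character reference and its hexadecimal equivalent (<Literal MoreInfo = "None">&amp;#x&lt;hex&gt;;</Literal>, which <Acronym>XML</Acronym> also allows) producing the same character, and two documents being rejected: one because the start and end tag names differ only in case, the other because it uses <Literal MoreInfo = "None">&amp;auml;</Literal> without declaring it. The helper name <Literal MoreInfo = "None">rejected</Literal> is ours.</Para>
<ProgramListing Format = "linespecific"><![CDATA[
```python
import xml.etree.ElementTree as ET

# Decimal and hexadecimal character references name the same code point.
para = ET.fromstring("<para>J&#228;rvi or J&#xE4;rvi</para>")
print(para.text == "J\u00e4rvi or J\u00e4rvi")  # True


def rejected(text):
    """Return True if the XML parser refuses the text."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return True
    return False


print(rejected("<Para>text</para>"))        # True: names are case sensitive
print(rejected("<para>J&auml;rvi</para>"))  # True: auml is not declared
```
]]></ProgramListing>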
<Para><Acronym>XML</Acronym> introduced the concept of <FirstTerm>well-formedness</FirstTerm>.
All <Acronym>XML</Acronym> documents must be well-formed. A well-formed <Acronym>XML</Acronym> document
has all its start and end tags, its elements nest properly (i.e. a
situation like <Literal MoreInfo = "None">&lt;a&gt;&lt;b&gt;&lt;/a&gt;&lt;/b&gt;</Literal> is
prohibited), and it does not need to have a <Acronym>DTD</Acronym>.
A <FirstTerm>valid</FirstTerm> document is naturally well-formed,
but it must also have a <Acronym>DTD</Acronym> and conform
to it. Parsers that check only well-formedness
are fairly easy to write, and they can work a lot faster and with
fewer resources than validating parsers. See <XRef Linkend = "CIHIHJHE"
    xlink:type="simple" href = "#CIHIHJHE" xlink:show = "replace"
    >Example 6</XRef> and <XRef Linkend = "CIHJGHAI"
    xlink:type="simple" href = "#CIHJGHAI" xlink:show = "replace"
    >Example 7</XRef> for samples of well-formed
and valid <Acronym>XML</Acronym> instances, respectively.</Para>
<Example Id = "CIHIHJHE"><Title>A Well-formed XML Document</Title>
<ProgramListing Format = "linespecific">&lt;?xml version='1.0'?&gt;</ProgramListing>
<ProgramListing Format = "linespecific">&lt;memo&gt;</ProgramListing>
<ProgramListing Format = "linespecific">&lt;title&gt;This is a title&lt;/title&gt;</ProgramListing>
<ProgramListing Format = "linespecific">&lt;para&gt;I have to remember
this&lt;/para&gt;</ProgramListing>
<ProgramListing Format = "linespecific">&lt;/memo&gt;</ProgramListing></Example>
<Example Id = "CIHJGHAI"><Title>A Valid XML Document</Title>
<ProgramListing Format = "linespecific">&lt;?xml version='1.0'?&gt;</ProgramListing>
<ProgramListing Format = "linespecific">&lt;!DOCTYPE memo [</ProgramListing>
<ProgramListing Format = "linespecific">&lt;!ELEMENT memo (title,para+)&gt;</ProgramListing>
<ProgramListing Format = "linespecific">&lt;!ELEMENT title (#PCDATA)&gt;</ProgramListing>
<ProgramListing Format = "linespecific">&lt;!ELEMENT para (#PCDATA)&gt;</ProgramListing>
<ProgramListing Format = "linespecific">]&gt;</ProgramListing>
<ProgramListing Format = "linespecific">&lt;memo&gt;</ProgramListing>
<ProgramListing Format = "linespecific">&lt;title&gt;This is a title&lt;/title&gt;</ProgramListing>
<ProgramListing Format = "linespecific">&lt;para&gt;I have to remember
this&lt;/para&gt;</ProgramListing>
<ProgramListing Format = "linespecific">&lt;/memo&gt;</ProgramListing></Example>
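<Para>A well-formedness check amounts to running a non-validating parser and seeing whether it succeeds. The Python sketch below does this with the expat-based parser in the standard library, which reads a <Acronym>DTD</Acronym> in the internal subset but does not check the document against it. The samples are Example 6, the overlapping elements mentioned above, an SGML-style omitted end tag, and a variant of Example 7 whose instance lacks the required <Literal MoreInfo = "None">title</Literal>: that last one is well-formed but not valid, so this check accepts it while a validating parser would not.</Para>
<ProgramListing Format = "linespecific"><![CDATA[
```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """True if text parses as XML; expat does not validate against a DTD."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Example 7's DTD, but with the required title missing from the instance.
INVALID_BUT_WELL_FORMED = """<!DOCTYPE memo [
<!ELEMENT memo (title,para+)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT para (#PCDATA)>
]>
<memo><para>I have to remember this</para></memo>"""

samples = {
    "Example 6": "<memo><title>This is a title</title>"
                 "<para>I have to remember this</para></memo>",
    "overlapping elements": "<a><b></a></b>",
    "omitted end tag": "<memo><para>I have to remember this</memo>",
    "invalid but well-formed": INVALID_BUT_WELL_FORMED,
}
for label, text in samples.items():
    print(f"{label}: {is_well_formed(text)}")
```
]]></ProgramListing>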
<Para>It is simple to delineate where SGML is and is not an appropriate
technology option, but it is less obvious with XML. Because well-formed
documents are easy to process and do not require strict adherence
to any predefined <Acronym>DTD</Acronym>, it could be argued that <Acronym>XML</Acronym> can
be used everywhere.</Para>
<Para><Acronym>XML</Acronym> itself is a clear-cut standard
as standards go (strictly speaking it is not a standard per se,
but a World Wide Web Consortium Recommendation, which is effectively
an Internet standard). But <Acronym>XML</Acronym> does not exist
in a vacuum: there are many related Recommendations, finished or
in the works, that are usually needed when working with it.
The Namespaces in XML <XRef Linkend = "BABGEFHE" xlink:type="simple"
    xlink:href="#BABGEFHE"  >[W3C99a]</XRef> Recommendation
is, in a sense, an attempt to simplify the architectural forms concept from
the HyTime standard (see <XRef Linkend = "BABCBFHC" xlink:type="simple"
    href = "#BABCBFHC" xlink:show = "replace" >Section
3.5</XRef>). The XML Linking Language and XML Pointer Language efforts try to create a
simple but powerful linking and addressing model for <Acronym>XML</Acronym> (again,
HyTime is the ultimate system, but it is too complex). The Extensible
Stylesheet Language (<Acronym>XSL</Acronym>) <XRef Linkend = "BABJHHFH"
    xlink:type="simple" xlink:href="#BABJHHFH"
     >[W3C99b]</XRef> is an effort to
create a powerful page description and transformation language (based
on <Acronym>XML</Acronym>) that can be used to transform <Acronym>XML</Acronym> documents
to screen and paper representations, for example. And there are
still others.</Para>
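<Para>The Namespaces in XML Recommendation qualifies element and attribute names with a URI, which is bound to a short prefix (or declared as the default) with <Literal MoreInfo = "None">xmlns</Literal> attributes. The Python snippet below shows how the standard ElementTree parser reports such qualified names; the memo namespace URI is a made-up placeholder, while the XLink namespace URI is the real one.</Para>
<ProgramListing Format = "linespecific"><![CDATA[
```python
import xml.etree.ElementTree as ET

doc = """<memo xmlns="http://example.org/ns/memo"
      xmlns:xlink="http://www.w3.org/1999/xlink">
  <para>See <ref xlink:href="#intro">the introduction</ref>.</para>
</memo>"""

root = ET.fromstring(doc)
ref = root.find(".//{http://example.org/ns/memo}ref")
# Names are reported as {namespace URI}local-name, so a "ref" from one
# vocabulary cannot be confused with a "ref" from another.
print(ref.tag)     # {http://example.org/ns/memo}ref
print(ref.attrib)  # {'{http://www.w3.org/1999/xlink}href': '#intro'}
```
]]></ProgramListing>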
<Para>Having mentioned <Acronym>XSL</Acronym>, it must be said that
there are existing standards and Recommendations that do what <Acronym>XSL</Acronym> is
trying to achieve. <Acronym>DSSSL</Acronym> (see <XRef
    Linkend = "BABCBFHC" xlink:type="simple" href = "#BABCBFHC"
    xlink:show = "replace" >Section 3.5</XRef>) obviously
is up to the task, but it is too complex. Cascading Style Sheets
(<Acronym>CSS</Acronym>) <XRef Linkend = "BGBFHJDA" xlink:type="simple"
    xlink:href="#BGBFHJDA"  >[W3C96]</XRef> can
do a lot, but <Acronym>CSS</Acronym> has very limited transformation
capabilities. For example, it is not possible to create a table
of contents with a <Acronym>CSS</Acronym> stylesheet. Interactive
documents will always need scripting. The scripting language could
be for example JavaScript, or the standardized version ECMAScript <XRef
    Linkend = "BABFCCEF" xlink:type="simple"
    xlink:href="#BABFCCEF"  >[ISO98b]</XRef>.
There has been a heated debate on the need for <Acronym>XSL</Acronym><Footnote>
<Para>See the discussion forum at XML.COM, <Literal MoreInfo = "None">http://www.xml.com/</Literal>.</Para></Footnote>.</Para></Sect1>
<Sect1 Id = "BABCBFHC"><Title>HyTime</Title>
<Para>HyTime <XRef Linkend = "BGBGFAHG" xlink:type="simple"
    xlink:href="#BGBGFAHG"  >[ISO97]</XRef> is
the abbreviation for Hypermedia/Time-based Structuring Language.
While developing a standard representation for music with <Acronym>SGML</Acronym>, it
became clear that <Acronym>SGML</Acronym> alone was not up to the task.
Moreover, many facilities needed for music are also useful in other time-dependent
systems, such as multimedia. Hypermedia poses some difficult challenges
to the software that must operate on it <XRef Linkend = "BABIEGCG"
    xlink:type="simple" xlink:href="#BABIEGCG"
     >[Kim98]</XRef>, like how to create
non-character content, how to schedule and render real-time content,
how to locate specific objects and parts of objects within the data.
HyTime was developed to address this variety of needs. The first
edition of HyTime appeared in 1992. The
current version was published in 1997.</Para>
<Sect2 Id="hcad13"><Title>Hypermedia Concepts and Dimensions</Title>
<Para>Some of the basic concepts of hypermedia are not new. Books
contained cross-references even before printing was invented.
Vannevar Bush wrote the first paper describing an automated hypertext
system <XRef Linkend = "BABDCADA" xlink:type="simple"
    xlink:href="#BABDCADA"  >[Bus45]</XRef>.
The term hypermedia, however, is rather new, having emerged only
in recent decades. It is worth mentioning that the term <FirstTerm>hypertext</FirstTerm>, coined
by Ted Nelson in the 1960s and discussed in <XRef Linkend = "BABECJEB"
    xlink:type="simple" xlink:href="#BABECJEB"
     >[Nel82]</XRef>, in its original sense means what is now called <FirstTerm>hypermedia</FirstTerm>.</Para>
<Para>It is possible to draw a dimension diagram for hypermedia
(see <XRef Linkend = "CIHHCJAD" xlink:type="simple"
    href = "#CIHHCJAD" xlink:show = "replace" >Figure
5</XRef>, redrawn from <XRef Linkend = "BABIEGCG" xlink:type="simple"
    xlink:href="#BABIEGCG"  >[Kim98]</XRef>). HyTime
is most applicable for the top two quadrants of the diagram (<XRef
    Linkend = "BABGAEIG" xlink:type="simple"
    xlink:href="#BABGAEIG"  >[New91]</XRef> and <XRef
    Linkend = "BABIEGCG" xlink:type="simple"
    xlink:href="#BABIEGCG"  >[Kim98]</XRef>). Still,
it has been claimed that HyTime's scope of applicability is comprehensive
enough to embrace all possible text processing applications (<XRef
    Linkend = "BABGAEIG" xlink:type="simple"
    xlink:href="#BABGAEIG"  >[New91]</XRef> and <XRef
    Linkend = "BABJBFBG" xlink:type="simple"
    xlink:href="#BABJBFBG"  >[DeR94]</XRef>).
Therefore all documents can be represented with HyTime<Footnote><Para><Citation>Resistance
is futile, you will be assimilated.</Citation></Para></Footnote>.
This sounds like a ridiculous statement, but on closer inspection
it is not that difficult to imagine that this could be true.</Para>
<Figure Float = "0"><html:img src="Gradu-5.gif"/>
<Title Id = "CIHHCJAD">Dimensions of Hypermedia</Title></Figure></Sect2>
<Sect2 Id="hh14"><Title>HyTime Hyperdocuments</Title>
<Para>Every HyTime <FirstTerm>hyperdocument</FirstTerm> has a
so-called <FirstTerm>hub-document</FirstTerm>, which is an SGML document with
some additional HyTime constructs. These constructs
can refer to external entities, such as video sequences. To
represent a video sequence with HyTime, we do not
need to convert it to <Acronym>SGML</Acronym>; instead,
there must be a process that understands the video format and
provides a <FirstTerm>grove</FirstTerm> representation of it to the
HyTime engine (all HyTime addressing and linking happens in the <Acronym>grove</Acronym>).</Para>
<Para>A <Acronym>grove</Acronym> is a formal construct from the
HyTime standard: roughly, a parse tree plus some additional
information. If we wanted to view the video with a program that
has an embedded HyTime engine, we would open the hub-document, and
the HyTime engine would launch the actual viewer for the video.
A grove constructor for the video format would be needed if we wanted to point to
a specific frame in the video sequence, for example.</Para>
<Para>HyTime is a large standard, but, fortunately, highly modular.
This means that one can implement very small subsets of it. One
such module from the standard is the HyTime hyperlink module. The HyTime
hyperlinking mechanism is one of the most widely used features of
HyTime as it is relatively easy to implement on top of existing
SGML systems, and it is very useful. With plain <Acronym>SGML</Acronym> one can
only link from an element to other uniquely identified element(s)
in the same document instance. HyTime's links do not have this
restriction.</Para></Sect2>
<Sect2 Id="hm15"><Title>HyTime Markup</Title>
<Para>Arguably the simplest HyTime link construct is the <FirstTerm>contextual
link</FirstTerm> (<Acronym>clink</Acronym>). It is a deceptively simple
yet powerful concept. A <Literal MoreInfo = "None">clink</Literal> can
link both to internal locations and to external documents and locations
in them. The <FirstTerm>link initiating anchor</FirstTerm> is the
link element itself, i.e. the link is made in context (hence the name contextual
link). Typically the link begins as an <Acronym>SGML</Acronym> <Literal
    MoreInfo = "None">ID</Literal>/<Literal MoreInfo = "None">IDREF</Literal>,
although other forms are possible. The target <Literal MoreInfo = "None">ID</Literal> is
often just a pseudo-target in the same document to satisfy standard <Acronym>SGML</Acronym> parsers.
HyTime engines know to look more closely at the pseudo-target to see
if it is a part of a location ladder or path that eventually points
out the real target. </Para>
<Para><XRef Linkend = "CIHHHGFG" xlink:type="simple"
    href = "#CIHHHGFG" xlink:show = "replace" >Example
8</XRef> shows how <Literal MoreInfo = "None">clink</Literal>s could
appear in <Acronym>SGML</Acronym> markup. Most of the <Acronym>DTD</Acronym> has
been omitted for brevity. The ellipsis (<Literal MoreInfo = "None">...</Literal>)
marks deleted sections.</Para>
<Para>The <Literal MoreInfo = "None">xref</Literal> element is a <Literal
    MoreInfo = "None">clink</Literal>. HyTime does not require the
use of specific element names. The specific HyTime constructs are
indicated by attributes. By default, the attribute name is <Literal
    MoreInfo = "None">HyTime</Literal>. The <Literal MoreInfo = "None">xref</Literal> element
has a required linkend attribute of type <Literal MoreInfo = "None">IDREF</Literal> (meaning
it must point to a unique identifier in this document). The <Literal
    MoreInfo = "None">HyTime</Literal> attribute has a fixed default
value (<Literal MoreInfo = "None">#FIXED</Literal>), which means that no value other
than the one given in the attribute definition is legal for this attribute.</Para>
<Para>The <Literal MoreInfo = "None">nameloc</Literal> and <Literal
    MoreInfo = "None">nmlist</Literal> elements are defined so that
they conform to similarly named constructs in the HyTime standard.
Their purpose is to enable linking to other named locations in this document
or other documents. <XRef Linkend = "CIHHHGFG" xlink:type="simple"
    href = "#CIHHHGFG" xlink:show = "replace" >Example
8</XRef> shows how to link to other documents. The <Literal
    MoreInfo = "None">nameloc</Literal> element provides a target
for the <Literal MoreInfo = "None">xref</Literal> <Literal
    MoreInfo = "None">IDREF</Literal> linkend so that <Acronym>SGML</Acronym> parsers
will find this document valid. The real target, however, is something
else as specified by the HyTime standard.</Para>
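<Para>The indirection that a HyTime engine follows for these links can be imitated on a small scale. The Python sketch below works on an XML rendering (with all end tags written out) of the relevant part of the instance in Example 8: if the <Literal MoreInfo = "None">IDREF</Literal> target is an ordinary element, that element is the link end; if it is a <Literal MoreInfo = "None">nameloc</Literal>, the names in its <Literal MoreInfo = "None">nmlist</Literal> children are followed instead. The <Literal MoreInfo = "None">resolve_clink</Literal> function, its return format and the dictionary standing in for the <Literal MoreInfo = "None">otherdoc</Literal> entity declaration are hypothetical simplifications, not HyTime's actual location address processing.</Para>
<ProgramListing Format = "linespecific"><![CDATA[
```python
import xml.etree.ElementTree as ET


def resolve_clink(doc, linkend, entities, here="this document"):
    """Follow an xref's IDREF, looking through nameloc pseudo-targets.

    Hypothetical, much simplified model of HyTime name-space addressing.
    Returns (document, element ID or None) pairs; None means the whole
    document.
    """
    by_id = {el.get("id"): el for el in doc.iter() if el.get("id")}
    target = by_id[linkend]
    if target.tag != "nameloc":
        return [(here, linkend)]  # an ordinary SGML ID/IDREF link
    pairs = []
    for nmlist in target.iter("nmlist"):
        names = (nmlist.text or "").split()
        if nmlist.get("nametype") == "entity":
            # Entity names: each points at a whole external document.
            pairs += [(entities[name], None) for name in names]
        else:
            # Element IDs, inside the entity named by docorsub (if any).
            doc_name = entities.get(nmlist.get("docorsub"), here)
            pairs += [(doc_name, name) for name in names]
    return pairs


# XML rendering of the relevant parts of the Example 8 instance.
hydoc = ET.fromstring("""<hydoc id="id-1"><hylinks>
  <nameloc id="loc-1"><nmlist nametype="entity">otherdoc</nmlist></nameloc>
  <nameloc id="loc-2"><nmlist docorsub="otherdoc"
      nametype="element">id-251</nmlist></nameloc>
</hylinks></hydoc>""")
entities = {"otherdoc": "otherdoc.sgm"}  # stands in for the ENTITY declaration

for linkend in ("id-1", "loc-1", "loc-2"):
    print(linkend, resolve_clink(hydoc, linkend, entities))
```
]]></ProgramListing>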
<Example Id = "CIHHHGFG"><Title>HyTime clinks</Title>
<ProgramListing Format = "linespecific">&lt;!DOCTYPE hydoc PUBLIC
"-//Heikki Toivonen//DTD My HyTime Doc//EN" [</ProgramListing>
<ProgramListing Format = "linespecific">...</ProgramListing>
<ProgramListing Format = "linespecific">&lt;!ATTLIST xref</ProgramListing>
<ProgramListing Format = "linespecific">linkend IDREF #REQUIRED</ProgramListing>
<ProgramListing Format = "linespecific">HyTime NAME #FIXED "clink"&gt;</ProgramListing>
<ProgramListing Format = "linespecific">...</ProgramListing>
<ProgramListing Format = "linespecific">&lt;!ATTLIST nameloc</ProgramListing>
<ProgramListing Format = "linespecific">id ID #REQUIRED</ProgramListing>
<ProgramListing Format = "linespecific">HyTime NAME #FIXED "nameloc"&gt;</ProgramListing>
<ProgramListing Format = "linespecific">...</ProgramListing>
<ProgramListing Format = "linespecific">&lt;!ATTLIST nmlist</ProgramListing>
<ProgramListing Format = "linespecific">docorsub ENTITY #IMPLIED</ProgramListing>
<ProgramListing Format = "linespecific">nametype (entity|element)
"element"</ProgramListing>
<ProgramListing Format = "linespecific">HyTime NAME #FIXED "nmlist"&gt;</ProgramListing>
<ProgramListing Format = "linespecific">&lt;!ENTITY otherdoc SYSTEM
"otherdoc.sgm" CDATA SGML&gt;</ProgramListing>
<ProgramListing Format = "linespecific">]&gt;</ProgramListing>
<ProgramListing Format = "linespecific">&lt;hydoc id="id-1"&gt;</ProgramListing>
<ProgramListing Format = "linespecific">&lt;title&gt;My Hydoc&lt;/title&gt;</ProgramListing>
<ProgramListing Format = "linespecific">&lt;para&gt;Link to ID &lt;xref
linkend="id-1"&gt;"id-1"&lt;/xref&gt;.&lt;/para&gt;</ProgramListing>
<ProgramListing Format = "linespecific">&lt;para&gt;Link to &lt;xref
linkend="loc-1"&gt;"otherdoc.sgm"&lt;/xref&gt;.&lt;/para&gt;</ProgramListing>
<ProgramListing Format = "linespecific">&lt;para&gt;Link to &lt;xref
linkend="loc-2"&gt;"id-251" in "otherdoc.sgm"</ProgramListing>
<ProgramListing Format = "linespecific">&lt;/xref&gt;.&lt;/para&gt;</ProgramListing>
<ProgramListing Format = "linespecific">&lt;hylinks&gt;</ProgramListing>
<ProgramListing Format = "linespecific">&lt;nameloc id="loc-1"&gt;</ProgramListing>
<ProgramListing Format = "linespecific">&lt;nmlist nametype="entity"&gt;otherdoc&lt;/nmlist&gt;</ProgramListing>
<ProgramListing Format = "linespecific">&lt;/nameloc&gt;</ProgramListing>
<ProgramListing Format = "linespecific">&lt;nameloc id="loc-2"&gt;</ProgramListing>
<ProgramListing Format = "linespecific">&lt;nmlist docorsub="otherdoc"
nametype="element"&gt;id-251&lt;/nmlist&gt;</ProgramListing>
<ProgramListing Format = "linespecific">&lt;/nameloc&gt;</ProgramListing>
<ProgramListing Format = "linespecific">&lt;/hylinks&gt;</ProgramListing>
<ProgramListing Format = "linespecific">&lt;/hydoc&gt;</ProgramListing></Example></Sect2>
<Sect2 Id="af16"><Title>Architectural Forms</Title>
<Para>Another frequently used construct from the HyTime standard
is the <FirstTerm>architectural form</FirstTerm>, or <FirstTerm>meta-DTD</FirstTerm>.
Meta-DTDs bring object-oriented thinking to document management,
i.e. document types can inherit certain properties from their ancestor <Acronym>DTD</Acronym>s.
In this regard they act a bit like base classes or supertypes in
object-oriented programming languages <XRef Linkend = "BABFEEBG"
    xlink:type="simple" xlink:href="#BABFEEBG"
     >[Kim97]</XRef>. In fact, <XRef
    Linkend = "CIHHHGFG" xlink:type="simple" href = "#CIHHHGFG"
    xlink:show = "replace" >Example 8</XRef> uses the
HyTime meta-DTD, from which the <Literal MoreInfo = "None">clink</Literal>, <Literal
    MoreInfo = "None">nameloc</Literal> and <Literal MoreInfo = "None">nmlist</Literal> forms
are instantiated. </Para>
<Para>The architectural forms facility is a great help in document
management and interchange. With vanilla <Acronym>SGML</Acronym> an
author is stuck with a given <Acronym>DTD</Acronym> and cannot
enhance it for his own special needs (or if he does enhance the <Acronym>DTD</Acronym>,
his documents may become unusable to other users of the original <Acronym>DTD</Acronym>).
Architectural forms allow the authors to enhance the document structure
to better suit their needs while still enabling document interchange.
The designers of the meta-DTDs can also specify certain constraints
on the derived <Acronym>DTD</Acronym>s.</Para>
<Para>Document Style Semantics and Specification Language (<Acronym>DSSSL</Acronym>) <XRef
    Linkend = "BGBIABBH" xlink:type="simple"
    xlink:href="#BGBIABBH"  >[ISO96]</XRef> is
very closely related to HyTime; the two share the same grove model
of the document. <Acronym>DSSSL</Acronym> is generally regarded
as a page layout language, but because it is in fact a full programming
language, it enables very sophisticated modification of <Acronym>SGML</Acronym>/HyTime
documents. Its query language can be used in the query location
address form of HyTime.</Para>
<Para>HyTime has gained the reputation of being a difficult and
expensive technology. While it is true that the standard in its
entirety is intimidating, many bits and pieces are relatively easy
to understand and implement, and thus to use. There are few
resources for learning HyTime, but one of the best is <XRef
    Linkend = "BABIEGCG" xlink:type="simple"
    xlink:href="#BABIEGCG"  >[Kim98]</XRef>.</Para></Sect2></Sect1>
<Sect1 Id="rs17"><Title>Related Standards</Title>
<Para>SGML is not the only standard for structured documents. Open
Document Architecture (<Acronym>ODA</Acronym>) <XRef
    Linkend = "BGBIDCCE" xlink:type="simple"
    xlink:href="#BGBIDCCE"  >[ISO89]</XRef> addresses
the same problems as <Acronym>SGML</Acronym>. <Acronym>ODA</Acronym> is
somewhat more complicated, which is probably why it has not been
as widely accepted as <Acronym>SGML</Acronym>. The main difference
between <Acronym>ODA</Acronym> and <Acronym>SGML</Acronym> is that <Acronym>ODA</Acronym> can
also be used to store the layout information of documents. Originally <Acronym>ODA</Acronym> was
called Office Document Architecture, but this was later changed
to Open Document Architecture.</Para>
<Para>Virtual Reality Modeling Language (<Acronym>VRML</Acronym>) <XRef
    Linkend = "BGBGJHCD" xlink:type="simple"
    xlink:href="#BGBGJHCD"  >[ISO98a]</XRef> has
been developed to describe virtual reality spaces and objects. There
are also structural standards for music, chemical structures, product information
exchange and so on. The simplicity of <Acronym>XML</Acronym> attracts
many of these other standards, and work is under way to
convert them to <Acronym>XML</Acronym> or at least to provide
a mapping to <Acronym>XML</Acronym> structures.</Para>
<Para>Most documents contain illustrations. Effectively managing
images is at least as important as managing text. Therefore it is
no wonder that there are several international and de facto industry standards
available for images. It has often been said that the Computer
Graphics Metafile (<Acronym>CGM</Acronym>) <XRef Linkend = "BABIDEBC"
    xlink:type="simple" xlink:href="#BABIDEBC"
     >[ISO92b]</XRef> standard is the
equivalent of <Acronym>SGML</Acronym> for 2-dimensional images.
The drawings in this document are mostly in <Acronym>CGM</Acronym> format.</Para></Sect1></Chapter>
<Chapter Id = "CIHHCIFG"><Title>Databases</Title>
<BlockQuote><Para>We can still remember the golden days before Heisenberg,
who showed humans the walls enclosing our predestined arguments.
The lives within me find this amusing. Knowledge, you see, has no
uses without purpose, but purpose is what builds enclosing walls.</Para>
<Para><Author><Firstname>Frank</Firstname><Surname>Herbert</Surname></Author><CiteTitle xlink:type="simple" xlink:href="http://www.amazon.com/exec/obidos/ASIN/0441104029/qid=959805072/sr=1-1/103-0724774-8699018">Children
of Dune</CiteTitle></Para></BlockQuote>
<Para>Any collection of data can be considered to form some sort
of a data repository or database. Nowadays the word database is
almost exclusively reserved for an electronic collection of data
that is managed by a very special computer program, the database
management system (DBMS). The technical sources for this chapter
were mostly <XRef Linkend = "BABIDJEF" xlink:type="simple"
    xlink:href="#BABIDJEF"  >[Sun81]</XRef>, <XRef
    Linkend = "BGBEAJGD" xlink:type="simple"
    xlink:href="#BGBEAJGD"  >[Mic92]</XRef>, <XRef
    Linkend = "BABFCHCD" xlink:type="simple"
    xlink:href="#BABFCHCD"  >[Loo98]</XRef> and <XRef
    Linkend = "BABDEEDE" xlink:type="simple"
    xlink:href="#BABDEEDE"  >[Whi99]</XRef>.</Para>
<Sect1 Id="cdp18"><Title>Common Database Properties</Title>
<Para>Databases have been around for a long time; computerized
databases have existed for decades now. They serve as
organized information repositories.</Para>
<Para>Databases are scalable. They range from small, single-person
databases, such as a listing of one's personal video collection,
to the largest existing databases, which exceed a petabyte in size.
This does not mean that any database
software can be used to manage a database of arbitrary size.</Para>
<Para>The most important and useful property of databases is the
ability to perform queries on the data efficiently, filtering out
unwanted information. Queries typically execute very fast. Of course, these
properties are essential in a large database; otherwise it would be
unusable. Databases can also save space by eliminating redundant
information, and there are techniques to help validate the data
in databases.</Para>
<Para>Typically databases are used in situations where the database
structure does not change, but the information in the database can
change rapidly. A common example is a bank's customer database: the
structure of the information describing a customer, that is, the schema, is not likely
to change over many years. Still, a person may make several withdrawals
and deposits in a single day, all of which must be reflected in
the database.</Para>
<Para>The current mainstream database technology is divided into
relational and object-oriented database models (discounting models
that are obsolete or nearly so)<Footnote><Para>Also discounting
the World Wide Web, even though it is a sort of networked database.
While the WWW meets the basic definition of a database, it lacks, for
example, an effective querying mechanism.</Para></Footnote>. There
are some databases on the market that are referred to as <Emphasis
    Role = "B">hybrid</Emphasis> databases. This means they are
neither strictly relational nor strictly object-oriented. A typical hybrid database
evolves from a pure relational database when a variable length text
field is introduced (possibly with grammar check, for example by
an <Acronym>SGML</Acronym> parser) <XRef Linkend = "Bib-elo95.xref"
    xlink:type="simple" xlink:href="#Bib-elo95.xref"
     >[Elo95]</XRef>.</Para>
<Para>Before going further, it should be noted that the term database
used in this paper actually means the database management system
(DBMS) and the data as one entity. Most references clearly separate these
concepts and only talk about a database when the data is meant.</Para></Sect1>
<Sect1 Id = "BABEEFEB"><Title>Relational Database Model</Title>
<Para>The relational database model is based on mathematics. Besides
making the model elegant by design, this has resulted in practical benefits.
For example, certain properties can be proven, which helps in validation
and optimization. This section will not even try to define the concepts
as they really should be defined from a mathematician's point
of view, but rather from the point of view of the everyday user and software
designer who only needs to use databases, not design them. Moreover,
the emphasis is on data retrieval, although databases handle simultaneous
updates to data as well. In a plain file system, by contrast, simultaneous updates easily
corrupt the data.</Para>
<Para>The end of the section lists some references that will explain
the relational model more deeply and with mathematics.</Para>
<Para>It should be noted that most "relational databases" on
the market are in fact not fully relational. Dr. Edgar Codd's
("the Father of the Relational Database") work in the 1970s
and 1980s resulted in 13 rules that a relational database should
fulfill. When the relational concept gained acceptance, the database
vendors quickly implemented some properties from the relational
theory and happily sold their products as relational databases.
However, this does not imply that the databases are not well suited
for their job. And in fact, the solutions presented in this paper
demand very little from the databases used.</Para>
<Sect2 Id = "CHDCCFFF"><Title>Basic Building Blocks</Title>
<Para>In the relational model, data concerning a given entity is
collected in a table. Take, for example, a (simplified) database
describing companies, products and information about the relationships between
the companies and the products they make. <XRef Linkend = "CIHDDDJI"
    xlink:type="simple" href = "#CIHDDDJI" xlink:show = "replace"
    >Figure 6</XRef> shows a relational database with
three tables: <Literal MoreInfo = "None">Company</Literal>, <Literal
    MoreInfo = "None">Product</Literal> and <Literal MoreInfo = "None">Manufactures</Literal>.
Tables consist of columns. For example the <Literal MoreInfo = "None">Company</Literal> table
consists of <Literal MoreInfo = "None">id</Literal>, <Literal
    MoreInfo = "None">name</Literal> and <Literal MoreInfo = "None">address</Literal> columns.
Rows in a table contain the actual data in the table, while the
tables and columns (or their names) themselves are basically metadata.</Para>
<Figure Float = "0"><html:img src="Gradu-6.gif"/>
<Title Id = "CIHDDDJI">Relational Database Model</Title></Figure>
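<Para>The distinction between data and metadata can be illustrated with a minimal sketch. It assumes Python's standard <Literal MoreInfo = "None">sqlite3</Literal> module as a stand-in for a relational database; the column names follow the <Literal MoreInfo = "None">Company</Literal> table described above, while the row values (including the addresses) are invented for illustration.</Para>
<ProgramListing Format = "linespecific">
```python
import sqlite3

# In-memory database standing in for the Company table of Figure 6.
# Column names follow the text; the row values are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Company (id INTEGER, name TEXT, address TEXT)")
conn.executemany(
    "INSERT INTO Company VALUES (?, ?, ?)",
    [(1, "Berg", "Street 1"), (2, "AB Spik", "Road 2")],
)

# Metadata: the column names, read from the table definition.
# PRAGMA table_info returns one row per column; index 1 is the name.
columns = [info[1] for info in conn.execute("PRAGMA table_info(Company)")]

# Data: the rows actually stored in the table.
rows = conn.execute("SELECT * FROM Company ORDER BY id").fetchall()

print(columns)  # ['id', 'name', 'address']
print(rows)     # [(1, 'Berg', 'Street 1'), (2, 'AB Spik', 'Road 2')]
```
</ProgramListing>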
<Para>In typical database implementations some of the columns in
a table form a so-called <FirstTerm>primary key</FirstTerm>, which
must be unique for each row in a table (the relational database
theory does not require this, but this makes it easier to optimize
the database performance). In the example the <Literal MoreInfo = "None">id</Literal> column
of the <Literal MoreInfo = "None">Company</Literal> table is the
primary key. Relationships between tables can be described as links between
primary and <FirstTerm>foreign key</FirstTerm> columns. A foreign
key in a table "points to" a primary key of another table. The
relationships between companies and products in the example are
described in the table <Literal MoreInfo = "None">Manufactures</Literal>.
So, to see which products a company manufactures, one must first follow
the link from the <Literal MoreInfo = "None">Company</Literal> table
to <Literal MoreInfo = "None">Manufactures</Literal> and from there
to <Literal MoreInfo = "None">Product</Literal>. In this example
the company <Literal MoreInfo = "None">Berg</Literal> manufactures
only <Literal MoreInfo = "None">Bolts</Literal>.</Para>
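<Para>Following the links from <Literal MoreInfo = "None">Company</Literal> through <Literal MoreInfo = "None">Manufactures</Literal> to <Literal MoreInfo = "None">Product</Literal> can be sketched as three separate key lookups. The sketch again assumes SQLite through Python's <Literal MoreInfo = "None">sqlite3</Literal> module. Apart from the fact that Berg manufactures only Bolts, the rows are hypothetical, and the helper function <Literal MoreInfo = "None">products_of</Literal> is introduced here for illustration only.</Para>
<ProgramListing Format = "linespecific">
```python
import sqlite3

# Table and column names follow the text; the rows are hypothetical,
# except that Berg manufactures only Bolts, as in the text.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Company (id INTEGER PRIMARY KEY, name TEXT, address TEXT);
CREATE TABLE Product (id INTEGER PRIMARY KEY, name TEXT);
-- Each row links one company (fid) to one product (pid).
CREATE TABLE Manufactures (
    fid INTEGER REFERENCES Company(id),
    pid INTEGER REFERENCES Product(id)
);
INSERT INTO Company VALUES (1, 'Berg', NULL), (2, 'AB Spik', NULL);
INSERT INTO Product VALUES (10, 'Bolts'), (11, 'Nuts'), (12, 'Screws');
INSERT INTO Manufactures VALUES (1, 10), (2, 11), (2, 12);
""")


def products_of(company_name):
    """Follow the links Company -> Manufactures -> Product step by step."""
    # 1. Primary key of the company.
    (company_id,) = conn.execute(
        "SELECT id FROM Company WHERE name = ?", (company_name,)
    ).fetchone()
    # 2. Foreign keys in Manufactures that point to that company.
    product_ids = [pid for (pid,) in conn.execute(
        "SELECT pid FROM Manufactures WHERE fid = ?", (company_id,))]
    # 3. Products whose primary keys were found in step 2.
    names = [
        conn.execute("SELECT name FROM Product WHERE id = ?", (pid,)).fetchone()[0]
        for pid in product_ids
    ]
    return sorted(names)


print(products_of("Berg"))     # ['Bolts']
print(products_of("AB Spik"))  # ['Nuts', 'Screws']
```
</ProgramListing>
<Para>Performing the three steps by hand mirrors the description above; in practice the same navigation is expressed as a single SQL query that joins the tables.</Para>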
<Para>Relationships can be <FirstTerm>one-to-one</FirstTerm>, <FirstTerm>one-to-many</FirstTerm> or <FirstTerm>many-to-many</FirstTerm>.
A one-to-one relation means that for each row in the left-hand side
table of the relationship there exists a maximum of one row in the
right-hand side table. A one-to-many relationship is simply a relation
where there can be multiple "hits" in the right-hand table.
Many-to-many relationships are created with helper tables. For example,
the <Literal MoreInfo = "None">Manufactures</Literal> table in <XRef
    Linkend = "CIHDDDJI" xlink:type="simple" href = "#CIHDDDJI"
    xlink:show = "replace" >Figure 6</XRef> is a helper
table. There is a one-to-many relationship from table <Literal
    MoreInfo = "None">Company</Literal> to it, and there is a one-to-many
relationship from table <Literal MoreInfo = "None">Product</Literal> to
it. One-to-one relationships are rare, but the other two forms of
relationships exist in most databases.</Para>
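<Para>A minimal sketch of a many-to-many relationship resolved through the helper table, again assuming SQLite through Python's <Literal MoreInfo = "None">sqlite3</Literal> module. The data are hypothetical but consistent with the text: Bolts has two manufacturers and AB Spik makes two products, so neither side of the relationship is limited to a single row.</Para>
<ProgramListing Format = "linespecific">
```python
import sqlite3

# Hypothetical data, consistent with Berg making only Bolts.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Company (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Product (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Manufactures (fid INTEGER, pid INTEGER);  -- helper table
INSERT INTO Company VALUES (1, 'Berg'), (2, 'AB Spik');
INSERT INTO Product VALUES (10, 'Bolts'), (11, 'Nuts');
INSERT INTO Manufactures VALUES (1, 10), (2, 10), (2, 11);
""")

# One-to-many from Company to the helper table: products per company.
per_company = conn.execute("""
    SELECT Company.name, COUNT(*) FROM Company
    JOIN Manufactures ON Manufactures.fid = Company.id
    GROUP BY Company.name ORDER BY Company.name
""").fetchall()

# One-to-many from Product to the helper table: companies per product.
per_product = conn.execute("""
    SELECT Product.name, COUNT(*) FROM Product
    JOIN Manufactures ON Manufactures.pid = Product.id
    GROUP BY Product.name ORDER BY Product.name
""").fetchall()

print(per_company)  # [('AB Spik', 2), ('Berg', 1)]
print(per_product)  # [('Bolts', 2), ('Nuts', 1)]
```
</ProgramListing>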
<Para>In diagrammatic models of databases, relationships between
tables are sometimes indicated by lines drawn between them (entity
relationship or ER diagrams). The type of relationship is usually indicated
with numbers and other symbols. For example, a one-to-many relationship
could be indicated by a "<Literal MoreInfo = "None">1</Literal>"
on the left-hand side and by "<Literal MoreInfo = "None">n</Literal>",
"<Literal MoreInfo = "None">&#8734;</Literal>" or "<Literal
    MoreInfo = "None">1..*</Literal>" on the right-hand side.
See <XRef Linkend = "CACJDEHF" xlink:type="simple"
    xlink:href="#CACJDEHF"  >Figure 18</XRef> in <XRef
    Linkend = "CACFDIDA" xlink:type="simple"
    xlink:href="#CACFDIDA"  >Section
6.3</XRef> for an example.</Para>
<Para>A relational database can also be thought of as consisting of different
levels, or components. On the lowest level there is the physical
storage level. The physical representation of databases is a science
in itself. Usually binary formats are used, but above the binary
level the database can be represented as sorted or unsorted indices,
various trees and other constructs. The logical level of a relational database
is the table level, or in a broader sense the query level. The uppermost
level is the report, or user-interface, level. Each level uses the
levels beneath it to carry out its purpose. For example, a report
knows how information should be shown, and it can contain
complex calculations on the data. Reports use queries to fetch the
data. Eventually queries access the physical data on a storage device.
Finally there can be the management system for all of the "components".
A typical off-the-shelf database like Microsoft Access comes with
all of these.</Para>
<Para>Relational database systems have methods that try to ensure
that the data in the database will always be correct. For example
it can be made impossible to remove a company from the database without
removing the products it manufactures (and that are not manufactured
by any other company). This kind of control can be achieved by checking
the primary and foreign keys, and making sure that if a key in a
table is deleted it is not referenced anywhere else.</Para></Sect2>
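<Para>The key check described above corresponds to what SQL calls referential integrity. The following sketch shows one way to obtain it, assuming SQLite through Python's <Literal MoreInfo = "None">sqlite3</Literal> module. The choice of <Literal MoreInfo = "None">ON DELETE RESTRICT</Literal> and the helper <Literal MoreInfo = "None">try_delete_company</Literal> are illustrative assumptions. Note that SQLite only enforces foreign keys after <Literal MoreInfo = "None">PRAGMA foreign_keys = ON</Literal>.</Para>
<ProgramListing Format = "linespecific">
```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Company (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Product (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Manufactures (
    fid INTEGER NOT NULL REFERENCES Company(id) ON DELETE RESTRICT,
    pid INTEGER NOT NULL REFERENCES Product(id) ON DELETE RESTRICT
);
INSERT INTO Company VALUES (1, 'Berg');
INSERT INTO Product VALUES (10, 'Bolts');
INSERT INTO Manufactures VALUES (1, 10);
""")


def try_delete_company(company_id):
    """Attempt the delete and report whether the database allowed it."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute("DELETE FROM Company WHERE id = ?", (company_id,))
        return "deleted"
    except sqlite3.IntegrityError as err:
        return f"refused: {err}"


# Berg is still referenced from Manufactures, so the delete is refused.
first_attempt = try_delete_company(1)
# Once the referencing row is gone, the company can be removed.
with conn:
    conn.execute("DELETE FROM Manufactures WHERE fid = 1")
second_attempt = try_delete_company(1)

print(first_attempt)   # refused: FOREIGN KEY constraint failed
print(second_attempt)  # deleted
```
</ProgramListing>
<Para>Other referential actions exist; for example, <Literal MoreInfo = "None">ON DELETE CASCADE</Literal> would instead remove the referencing <Literal MoreInfo = "None">Manufactures</Literal> rows automatically.</Para>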
<Sect2 Id="n19"><Title>Normalization</Title>
<Para>It is possible to design a database badly. Bad design means
that the database contains redundant information, there are things
that cannot be queried from it and so on. The process that tries
to avoid these kinds of problems is called <FirstTerm>normalization</FirstTerm>.
Normalization is done in steps. We say that a table is in <FirstTerm>first
normal form</FirstTerm> (1NF) if all of the data values are atomic
values. Achieving 1NF is mainly common sense, and modern RDBMS make
it difficult to create tables that are not in 1NF <XRef
    Linkend = "BABDEEDE" xlink:type="simple"
    xlink:href="#BABDEEDE"  >[Whi99]</XRef>.</Para>
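<Para>As a small illustration of atomic values, the sketch below (plain Python, with hypothetical data) turns a field that holds a comma-separated list of products into one row per product.</Para>
<ProgramListing Format = "linespecific">
```python
# Hypothetical rows that violate 1NF: one field holds a list of products.
unnormalized = [
    ("AB Spik", "Bolts, Nuts"),
    ("Berg", "Bolts"),
]

# 1NF version: exactly one atomic product name per row.
first_normal_form = [
    (company, product.strip())
    for company, products in unnormalized
    for product in products.split(",")
]

# With atomic values, "who makes Nuts?" is a plain equality test
# instead of a substring search inside a list-valued field.
makers_of_nuts = [c for c, p in first_normal_form if p == "Nuts"]

print(first_normal_form)
# [('AB Spik', 'Bolts'), ('AB Spik', 'Nuts'), ('Berg', 'Bolts')]
print(makers_of_nuts)  # ['AB Spik']
```
</ProgramListing>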
<Para>A table is in <FirstTerm>second normal form</FirstTerm> (2NF)
if it is already in 1NF and all non-key fields (columns) are fully
functionally dependent on the primary key. As an example of functional
dependency, consider the volume of a box, which can be calculated
from its dimensions. If the dimensions are stored in the table,
then the volume should not be stored because it is functionally
dependent on the dimensions. Stepping from a lower normal form to
a higher normal form is carried out by splitting the table.</Para>
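<Para>The box example can be sketched as follows, assuming SQLite through Python's <Literal MoreInfo = "None">sqlite3</Literal> module and a hypothetical <Literal MoreInfo = "None">Box</Literal> table. Instead of storing the volume, a view computes it from the stored dimensions whenever it is queried.</Para>
<ProgramListing Format = "linespecific">
```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Only the independent facts (the dimensions) are stored as columns.
conn.execute(
    "CREATE TABLE Box (id INTEGER PRIMARY KEY, width REAL, height REAL, depth REAL)"
)
conn.executemany(
    "INSERT INTO Box VALUES (?, ?, ?, ?)",
    [(1, 2.0, 3.0, 4.0), (2, 1.0, 1.0, 5.0)],
)
# The dependent value is derived when queried, so it can never
# disagree with the dimensions it depends on.
conn.execute(
    "CREATE VIEW BoxWithVolume AS "
    "SELECT id, width, height, depth, width * height * depth AS volume FROM Box"
)

# Changing a dimension needs one update; the volume follows automatically.
conn.execute("UPDATE Box SET width = 10.0 WHERE id = 1")
volumes = conn.execute("SELECT id, volume FROM BoxWithVolume ORDER BY id").fetchall()
print(volumes)  # [(1, 120.0), (2, 5.0)]
```
</ProgramListing>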
<Para>The <FirstTerm>third normal form</FirstTerm> (3NF) is achieved
when the table is already in the 2NF and all non-key fields are
non-transitively dependent on the primary key. This simply means
that any non-key field should depend directly on the primary key, not on another non-key field.
The <Literal MoreInfo = "None">Company</Literal> table in <XRef
    Linkend = "CIHDDDJI" xlink:type="simple" href = "#CIHDDDJI"
    xlink:show = "replace" >Figure 6</XRef> would be an
example of this as long as the <Literal MoreInfo = "None">addressID</Literal> column
is always dependent on the <Literal MoreInfo = "None">address</Literal> column.
There are at least three other levels of normal form, but generally
a database in the 3NF is considered good enough <XRef
    Linkend = "BABDEEDE" xlink:type="simple"
    xlink:href="#BABDEEDE"  >[Whi99]</XRef>.
The database in <XRef Linkend = "CIHDDDJI" xlink:type="simple"
    href = "#CIHDDDJI" xlink:show = "replace" >Figure
6</XRef> is at least in the 3NF because all the conditions are fulfilled.</Para></Sect2>
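<Para>A transitive dependency and the table split that removes it can be sketched as follows. The table, its columns and the postal codes are hypothetical (they are not taken from Figure 6); the sketch assumes SQLite through Python's <Literal MoreInfo = "None">sqlite3</Literal> module.</Para>
<ProgramListing Format = "linespecific">
```python
import sqlite3

# Hypothetical table violating 3NF: city depends on zip, and zip depends
# on the key id, so city depends on the key only transitively.
wide_rows = [
    (1, "Berg", "33100", "Tampere"),
    (2, "AB Spik", "33100", "Tampere"),
    (3, "Acme", "00100", "Helsinki"),
]

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PostalArea (zip TEXT PRIMARY KEY, city TEXT);
CREATE TABLE Company (id INTEGER PRIMARY KEY, name TEXT,
                      zip TEXT REFERENCES PostalArea(zip));
""")
# Splitting the table: each zip/city fact is now stored exactly once.
conn.executemany(
    "INSERT OR IGNORE INTO PostalArea VALUES (?, ?)",
    [(zip_code, city) for _, _, zip_code, city in wide_rows],
)
conn.executemany(
    "INSERT INTO Company VALUES (?, ?, ?)",
    [(cid, name, zip_code) for cid, name, zip_code, _ in wide_rows],
)

# Joining the two tables reproduces the original rows: nothing was lost.
rejoined = conn.execute("""
    SELECT Company.id, Company.name, Company.zip, PostalArea.city
    FROM Company JOIN PostalArea ON Company.zip = PostalArea.zip
    ORDER BY Company.id
""").fetchall()
area_count = conn.execute("SELECT COUNT(*) FROM PostalArea").fetchone()[0]

print(rejoined == wide_rows)  # True
print(area_count)             # 2
```
</ProgramListing>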
<Sect2 Id="qab20"><Title>Queries And Beyond</Title>
<Para>To get data out of multiple tables or restrict what comes
out of a single table, a query is needed. In fact, <Acronym>SQL</Acronym>,
the Structured Query Language <XRef Linkend = "BGBJDDAG"
    xlink:type="simple" xlink:href="#BGBJDDAG"
     >[ISO92a]</XRef>, can also be used to
create and alter the database structure as well as to insert and
modify information. Consider the example in <XRef
    Linkend = "CIHDDDJI" xlink:type="simple" href = "#CIHDDDJI"
    xlink:show = "replace" >Figure 6</XRef>. One might
want to know what products the company <Literal MoreInfo = "None">AB
Spik</Literal> has in its catalog. This could be carried out with
the <Acronym>SQL</Acronym> commands presented in <XRef
    Linkend = "BABGDEII" xlink:type="simple" href = "#BABGDEII"
    xlink:show = "replace" >Example 9</XRef>.</Para>
<Example Id = "BABGDEII"><Title>SQL <Literal MoreInfo = "None">SELECT</Literal> Statement</Title>
<ProgramListing Format = "linespecific">SELECT
  Product.name
FROM
  Company, Manufactures, Product
WHERE
  Company.name = 'AB Spik'
  AND Company.id = Manufactures.fid
  AND Manufactures.pid = Product.id;</ProgramListing></Example>
<Para>In English, the above query says:</Para>
<BlockQuote>This query operates on tables <Literal MoreInfo = "None">Company</Literal>, 
<Literal MoreInfo = "None">Manufactures</Literal> and <Literal
    MoreInfo = "None">Product</Literal>. Find the <Literal
    MoreInfo = "None">id</Literal> of the company whose name is 
<Literal MoreInfo = "None">AB Spik</Literal>. Next, use the found
value to locate the rows in the <Literal MoreInfo = "None">Manufactures</Literal> table
that have that value in the <Literal MoreInfo = "None">fid</Literal> column.
Finally, for all the rows found in <Literal MoreInfo = "None">Manufactures</Literal>,
find the rows in the <Literal MoreInfo = "None">Product</Literal> table
that match the <Literal MoreInfo = "None">id</Literal> column values
with the <Literal MoreInfo = "None">pid</Literal> column values
in the <Literal MoreInfo = "None">Manufactures</Literal> table, and
show their names.</BlockQuote>
<Para>Said aloud, this is quite a mouthful, but the idea really is
simple. Initially, selecting the three tables produces a result set
that contains every possible combination of their rows (the Cartesian product).
The <Literal MoreInfo = "None">WHERE</Literal> rules further restrict
the result set, because the <Literal MoreInfo = "None">AND</Literal> keyword
means that all of the rules must be true simultaneously. From the
result set a single column is selected for output.</Para>
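<Para>The three steps above (combine, restrict, select a column) can be reproduced literally in a short Python sketch and compared with the result SQLite gives for the query of Example 9. The table contents are hypothetical; <Literal MoreInfo = "None">itertools.product</Literal> from the Python standard library builds the full set of combinations.</Para>
<ProgramListing Format = "linespecific">
```python
import itertools
import sqlite3

# Hypothetical contents for the three tables of Figure 6.
company = [(1, "Berg", "Street 1"), (2, "AB Spik", "Road 2")]  # (id, name, address)
manufactures = [(1, 10), (2, 11), (2, 12)]                     # (fid, pid)
product = [(10, "Bolts"), (11, "Nuts"), (12, "Screws")]        # (id, name)

# FROM Company, Manufactures, Product: every combination of rows.
combinations = list(itertools.product(company, manufactures, product))

# WHERE ... AND ... AND ...: keep combinations satisfying all three rules.
matching = [
    (c, m, p)
    for c, m, p in combinations
    if c[1] == "AB Spik" and c[0] == m[0] and m[1] == p[0]
]

# SELECT Product.name: keep a single column.
by_hand = sorted(p[1] for _, _, p in matching)

# The same query executed by SQLite on the same data, for comparison.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Company (id INTEGER, name TEXT, address TEXT)")
conn.execute("CREATE TABLE Manufactures (fid INTEGER, pid INTEGER)")
conn.execute("CREATE TABLE Product (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO Company VALUES (?, ?, ?)", company)
conn.executemany("INSERT INTO Manufactures VALUES (?, ?)", manufactures)
conn.executemany("INSERT INTO Product VALUES (?, ?)", product)
by_sql = sorted(
    name
    for (name,) in conn.execute(
        "SELECT Product.name FROM Company, Manufactures, Product "
        "WHERE Company.name = 'AB Spik' "
        "AND Company.id = Manufactures.fid "
        "AND Manufactures.pid = Product.id"
    )
)

print(len(combinations))  # 18
print(by_hand)            # ['Nuts', 'Screws']
print(by_sql)             # ['Nuts', 'Screws']
```
</ProgramListing>
<Para>Real database systems do not normally build the full set of combinations. The steps describe the logical meaning of the query, which the query optimizer is free to evaluate in a more efficient order.</Para>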
<Para>A nice tutorial on SQL is <XRef Linkend = "BABCCDJH"
    xlink:type="simple" xlink:href="#BABCCDJH"
     >[Hof99]</XRef>. Recommended reading
about relational databases in general is <XRef Linkend = "BABDEEDE"
    xlink:type="simple" xlink:href="#BABDEEDE"
     >[Whi99]</XRef> and <XRef
    Linkend = "BABBJDIE" xlink:type="simple"
    xlink:href="#BABBJDIE"  >[Yar99]</XRef>,
the latter because it has examples using a freeware relational database,
although the accuracy of its information is not always as good as it should be.
Finally, <XRef Linkend = "BABIEGHJ" xlink:type="simple"
    xlink:href="#BABIEGHJ"  >[Loi99]</XRef> is
a book for the database freak. It goes into the mathematics of relational
databases.</Para></Sect2></Sect1>
<Sect1 Id="odm21"><Title>Other Database Models</Title>
<Para>Relational databases are not the only databases on the market.
Object-oriented databases are steadily gaining ground. Evolution
has nearly discarded some database technologies, while research
for new and better technologies goes on.</Para>
<Sect2 Id="ood22"><Title>Object Oriented Databases</Title>
<Para>Objects in an object-oriented (OO) database may have data
attributes and other objects. In fact, one definition of an object
oriented database is a <Quote>collection of persistent objects</Quote> (that
is, objects that live between invocations of a program) <XRef
    Linkend = "BABJIHID" xlink:type="simple"
    xlink:href="#BABJIHID"  >[Eck95]</XRef>.
This could be depicted as shown in <XRef Linkend = "CIHIGBJH"
    xlink:type="simple" href = "#CIHIGBJH" xlink:show = "replace"
    >Figure 7</XRef>. The outer ellipse describes
the database. The database holds objects like a circle and a box.
There are some objects that in turn consist of other objects, like
the ellipse on the left that consists of two sub-objects. The figure
is just an abstract visualization of an OO database and does not
imply that data is stored as images.</Para>
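<Para>The idea of objects that outlive a program run can be sketched with Python's standard <Literal MoreInfo = "None">shelve</Literal> module, which stores pickled objects under string keys. This is only a rough stand-in for an OO database (it has no query language, for example). The <Literal MoreInfo = "None">Box</Literal> and <Literal MoreInfo = "None">Group</Literal> classes are hypothetical; <Literal MoreInfo = "None">Group</Literal> plays the role of the composite ellipse in Figure 7.</Para>
<ProgramListing Format = "linespecific">
```python
import os
import shelve
import tempfile
from dataclasses import dataclass, field


@dataclass
class Box:
    width: float
    height: float


@dataclass
class Group:
    """A composite object made of other objects."""
    name: str
    parts: list = field(default_factory=list)


path = os.path.join(tempfile.mkdtemp(), "objects")

# First program run: store objects under string keys.
with shelve.open(path) as db:
    db["box"] = Box(2.0, 3.0)
    db["group"] = Group("pair", [Box(1.0, 1.0), Box(4.0, 5.0)])

# A later run (simulated here by reopening): the objects are still there.
with shelve.open(path) as db:
    restored = db["group"]

print(restored.name, len(restored.parts))  # pair 2
print(restored.parts[1])                   # Box(width=4.0, height=5.0)
```
</ProgramListing>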
<Para>Contrary to popular belief, there are excellent object-oriented
databases available. OO databases are being used, and they can offer
easier handling of data and better performance, to name just a couple
of benefits over relational databases <XRef Linkend = "BABHJDBB"
    xlink:type="simple" xlink:href="#BABHJDBB"
     >[Wak99]</XRef>. OO databases are
naturally suitable for storing structured documents (<XRef
    Linkend = "BABGEADG" xlink:type="simple"
    xlink:href="#BABGEADG"  >[Paq92]</XRef>, <XRef
    Linkend = "BABIBHHA" xlink:type="simple"
    xlink:href="#BABIBHHA"  >[Böh94]</XRef> and <XRef
    Linkend = "BABJBIAH" xlink:type="simple"
    xlink:href="#BABJBIAH"  >[Bal97]</XRef>),
as is also shown by the available <Acronym>SGML</Acronym>/<Acronym>XML</Acronym> databases
using OO solutions (for example, Astoria<Footnote><Para>Astoria
is a product of Chrystal Software, a division of Xerox.</Para></Footnote> and
POET Content Management Suite<Footnote><Para>POET Content Management
Suite is a product of POET Software.</Para></Footnote>).</Para>
<Figure Float = "0"><html:img src="Gradu-7.gif"/>
<Title Id = "CIHIGBJH">Object Oriented Database</Title></Figure>
<Para>OO databases also have the benefit that developers are not
faced with "impedance mismatch". Impedance mismatch is a term
used to describe the problems inherent in using SQL from
within an OO language like C++ and binding SQL's structural constructs
to classes and objects. Using objects stored in an OO database within
an OO programming language can work with exactly the same rules
and syntax as working with any other object.</Para>
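<Para>The mismatch can be made concrete with a minimal sketch. It uses Python instead of C++, SQLite through the <Literal MoreInfo = "None">sqlite3</Literal> module, and a hypothetical <Literal MoreInfo = "None">Product</Literal> class. The point is the hand-written translation between rows and objects in both directions, which an OO database would make unnecessary.</Para>
<ProgramListing Format = "linespecific">
```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Product:
    id: int
    name: str


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Product (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO Product VALUES (?, ?)", [(10, "Bolts"), (11, "Nuts")])


def load_products(connection):
    # The query speaks in tables and tuples, so every row
    # has to be translated into an object by hand...
    rows = connection.execute("SELECT id, name FROM Product ORDER BY id")
    return [Product(id=row[0], name=row[1]) for row in rows]


def save_product(connection, product):
    # ...and every changed object has to be translated back into SQL.
    with connection:
        connection.execute(
            "UPDATE Product SET name = ? WHERE id = ?", (product.name, product.id)
        )


products = load_products(conn)
products[0].name = "Hex bolts"   # ordinary object manipulation
save_product(conn, products[0])  # explicit write-back through SQL
reloaded = load_products(conn)
print(reloaded)
# [Product(id=10, name='Hex bolts'), Product(id=11, name='Nuts')]
```
</ProgramListing>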
<Para>One serious handicap with OO databases is that they are not
based on such a rich mathematical foundation as the relational model.
This makes it more difficult to optimize them and present standardized
solutions. It remains to be seen if pure OO databases will enter
the mainstream. One way to get there might be to build an OO database
on top of a relational database. This has been discussed in <XRef
    Linkend = "BABFCHCD" xlink:type="simple"
    xlink:href="#BABFCHCD"  >[Loo98]</XRef> and <XRef
    Linkend = "BABIEGHJ" xlink:type="simple"
    xlink:href="#BABIEGHJ"  >[Loi99]</XRef>.</Para></Sect2>
<Sect2 Id="medm23"><Title>More Exotic Database Models</Title>
<Para>Relational and object-oriented databases are certainly not
the only database technologies out there. Other technologies were
in use before them, and new technologies are being invented.</Para>
<Para>Hybrid databases were mentioned earlier. A hybrid database
combines relational and object-oriented techniques. To
make it clear that the techniques involved are relational and object-oriented,
these databases are often referred to as object-relational databases.
It has been proposed that SQL be further developed to be more compatible
with object-relational databases, since it has been realized that pure
relational databases are lacking in several important areas, for
example in handling structural data <XRef Linkend = "BABGCJCI"
    xlink:type="simple" xlink:href="#BABGCJCI"
     >[Rei98]</XRef>.</Para>
<Para>Hierarchical database technology (see <XRef Linkend = "CIHCBEAJ"
    xlink:type="simple" href = "#CIHCBEAJ" xlink:show = "replace"
    >Figure 8</XRef> for an illustration) has lost
the technology war to the newer technologies. On the other hand,
recent discussion in some XML forums indicates that it might again
prove to be a viable solution in some specific cases. The network
database model (see <XRef Linkend = "CIHFGHFB" xlink:type="simple"
    href = "#CIHFGHFB" xlink:show = "replace" >Figure
9</XRef>) is almost unheard of outside old mainframe systems.
References <XRef Linkend = "BABIDJEF" xlink:type="simple"
    xlink:href="#BABIDJEF"  >[Sun81]</XRef> and <XRef
    Linkend = "BABIEGHJ" xlink:type="simple"
    xlink:href="#BABIEGHJ"  >[Loi99]</XRef> discuss
hierarchical and network models in more detail.</Para>
<Para>A brief visualization of the differences between relational,
hierarchical and network models can be seen in <XRef
    Linkend = "CIHDDDJI" xlink:type="simple" href = "#CIHDDDJI"
    xlink:show = "replace" >Figure 6</XRef>, <XRef
    Linkend = "CIHCBEAJ" xlink:type="simple" href = "#CIHCBEAJ"
    xlink:show = "replace" >Figure 8</XRef> and <XRef
    Linkend = "CIHFGHFB" xlink:type="simple" href = "#CIHFGHFB"
    xlink:show = "replace" >Figure 9</XRef> (modified
samples from <XRef Linkend = "BABIDJEF" xlink:type="simple"
    xlink:href="#BABIDJEF"  >[Sun81]</XRef>).
The figures themselves do not explain the different database models;
they just show that different ways to represent the same information
exist. The sample database has information about companies and products, and
describes which company manufactures a given product.</Para>
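<Para>The same point can be sketched in Python with hypothetical data. A single question, "who makes Bolts?", is answered from a relational, a hierarchical and a network representation of the same facts. The structures are simplified illustrations and do not reproduce Figures 6, 8 and 9 exactly.</Para>
<ProgramListing Format = "linespecific">
```python
# Hypothetical sample data: which companies manufacture which products.

# Relational: flat tables; relationships are matching key values.
company = {1: "Berg", 2: "AB Spik"}
product = {10: "Bolts", 11: "Nuts"}
manufactures = [(1, 10), (2, 10), (2, 11)]  # (company id, product id)

# Hierarchical: a single fixed tree with products as children of
# companies, so a product made by two companies appears twice.
hierarchy = {
    "Berg": ["Bolts"],
    "AB Spik": ["Bolts", "Nuts"],
}


# Network: each record is stored once and connected by explicit links
# (here, object references) that can be followed in both directions.
class Record:
    def __init__(self, name):
        self.name = name
        self.links = []


def link(a, b):
    a.links.append(b)
    b.links.append(a)


berg, spik = Record("Berg"), Record("AB Spik")
bolts, nuts = Record("Bolts"), Record("Nuts")
link(berg, bolts)
link(spik, bolts)
link(spik, nuts)

# The same question asked of each representation.
relational = sorted(company[c] for c, p in manufactures if product[p] == "Bolts")
hierarchical = sorted(c for c, items in hierarchy.items() if "Bolts" in items)
network = sorted(r.name for r in bolts.links)

print(relational)    # ['AB Spik', 'Berg']
print(hierarchical)  # ['AB Spik', 'Berg']
print(network)       # ['AB Spik', 'Berg']
```
</ProgramListing>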
<Figure Float = "0"><html:img src="Gradu-8.gif"/>
<Title Id = "CIHCBEAJ">Hierarchical Database Model</Title></Figure>
<Figure Float = "0"><html:img src="Gradu-9.gif"/>
<Title Id = "CIHFGHFB">Network Database Model</Title></Figure></Sect2></Sect1></Chapter>
<Chapter Id="mdwd24"><Title>Managing Documents with Databases</Title>
<BlockQuote><Para>You will learn the integrated communication methods
as you complete the next step in your mentat education. This is
a gestalten function which will overlay data paths in your awareness,
resolving complexities and masses of input from the mentat index-catalogue
techniques which you have already mastered. Your initial problem
will be the breaking tension arising from the divergent assembly
of minutiae/data on specialized subjects. Be warned. Without mentat
overlay integration, you can be immersed in the Babel Problem, which
is the label we give to the omnipresent dangers of achieving wrong
combinations from accurate information.</Para>
<Para><Author><Firstname>Frank</Firstname><Surname>Herbert</Surname></Author><CiteTitle>Children
of Dune</CiteTitle></Para></BlockQuote>
<Para>There are several ways to manage documents with databases.
The most straightforward way is to store the documents themselves
in databases. That can be problematic, though, if the documents are
large and the system cannot split the documents into smaller parts.
The other easy approach is to store documents as ordinary files but have
databases manage references to those documents. This thesis is mostly
concerned with the latter method. The primary sources for this chapter are <XRef
    Linkend = "Bib-tra.xref" xlink:type="simple"
    xlink:href="#Bib-tra.xref" 
    >[Tra95]</XRef>, <XRef Linkend = "BGBGCJCF"
    xlink:type="simple" xlink:href="#BGBGCJCF"
     >[Pel97a]</XRef>, <XRef
    Linkend = "BGBHGHBH" xlink:type="simple"
    xlink:href="#BGBHGHBH"  >[Ryt97]</XRef> and <XRef
    Linkend = "BGBIHBEG" xlink:type="simple"
    xlink:href="#BGBIHBEG"  >[Som96]</XRef>.</Para>
<Sect1 Id="dasbdasd26"><Title>Differences And Similarities Between Databases And
Structured Documents</Title>
<Para>Documents contain information. It is important to be able
to effectively manage that information. Databases were created to
effectively manage small information blocks, like numbers, but they can
also be used to manage whole documents or parts of documents. Depending
on the database technology, the actual way of managing data differs,
but the expected functionality is pretty much the same. It must
be easy to find the information, restrict access to it, enable information
reuse, keep track of changes and so on.</Para>
<Sect2 Id="sarcsdactd27"><Title>Storing And Retrieving Complete SGML Documents - A
Challenge to Databases</Title>
<Para>It is possible to break an <Acronym>SGML</Acronym> document
into relations and tables, and therefore it is possible to "insert" <Acronym>SGML</Acronym> documents
into relational databases. However, the result is probably not what
one usually calls a document. With special programs it is of course
possible to view the relational representation as a standard <Acronym>SGML</Acronym>
document. This approach is sometimes used with very large <Acronym>SGML</Acronym> documents
that have to be managed effectively.</Para>
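<Para>One simple way to break a document into a table is to store one row per element, with the parent row and the position among siblings preserving the tree. The sketch below assumes Python's <Literal MoreInfo = "None">xml.etree.ElementTree</Literal> and <Literal MoreInfo = "None">sqlite3</Literal> modules. The <Literal MoreInfo = "None">Node</Literal> table layout and the element names are hypothetical, and for brevity the sketch ignores mixed content (text that follows a child element).</Para>
<ProgramListing Format = "linespecific">
```python
import json
import sqlite3
import xml.etree.ElementTree as ET

# A tiny document, built in code; the element names are hypothetical.
doc = ET.Element("chapter", id="c1")
ET.SubElement(doc, "title").text = "Products"
ET.SubElement(doc, "para").text = "Bolts and nuts."

# One row per element; parent and position keep the tree structure.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Node (
    id INTEGER PRIMARY KEY, parent INTEGER, position INTEGER,
    tag TEXT, attrs TEXT, content TEXT)""")


def shred(elem, parent=None, position=0):
    """Insert elem and its descendants; return elem's row id."""
    node_id = conn.execute(
        "INSERT INTO Node (parent, position, tag, attrs, content) "
        "VALUES (?, ?, ?, ?, ?)",
        (parent, position, elem.tag, json.dumps(elem.attrib), elem.text),
    ).lastrowid
    for i, child in enumerate(elem):
        shred(child, node_id, i)
    return node_id


def rebuild(node_id):
    """Reassemble the element tree rooted at node_id from the table."""
    tag, attrs, content = conn.execute(
        "SELECT tag, attrs, content FROM Node WHERE id = ?", (node_id,)
    ).fetchone()
    elem = ET.Element(tag, json.loads(attrs))
    elem.text = content
    children = conn.execute(
        "SELECT id FROM Node WHERE parent = ? ORDER BY position", (node_id,)
    ).fetchall()
    for (child_id,) in children:
        elem.append(rebuild(child_id))
    return elem


root_id = shred(doc)
node_count = conn.execute("SELECT COUNT(*) FROM Node").fetchone()[0]
round_trip_ok = ET.tostring(rebuild(root_id)) == ET.tostring(doc)
print(node_count)     # 3
print(round_trip_ok)  # True
```
</ProgramListing>
<Para>Rebuilding the document takes one query per element, which illustrates why special programs are needed to view the relational representation as a document.</Para>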
<Para>It is relatively easy to understand how <Acronym>SGML</Acronym> documents
can be saved in <Acronym>OO</Acronym> databases. Each element in
an <Acronym>SGML</Acronym> document is an object. Container objects
are formed of several smaller objects. It is no wonder then that
there are several <Acronym>OO</Acronym> databases specialized in <Acronym>SGML</Acronym>,
for example Astoria and POET.</Para>
<Para>Databases are very different from <Acronym>SGML</Acronym> documents.
Whereas data in an <Acronym>SGML</Acronym> file resides in sequential
order as <Acronym>ASCII</Acronym> characters, databases use several
different methods for storing the data (see <XRef Linkend = "CHDCCFFF"
    xlink:type="simple" xlink:href="#CHDCCFFF" 
    >Section 4.2.1</XRef>).</Para>
<Para>Although it would be possible to describe a database's structure
with a <Acronym>DTD</Acronym> and to display the data as a document
instance, this is not generally what is wanted. That kind of report
can be easily obtained from the database itself, even though it
is not in SGML. </Para>
<Para>With relational databases, showing the contents of the database
as SGML tables, possibly with links, would require a considerable
amount of work from the user in order to find the answers to questions
he or she might have in mind. On the other hand, queries can be
constructed so that they answer specific questions (see <XRef
    Linkend = "BABEEFEB" xlink:type="simple"
    xlink:href="#BABEEFEB"  >Section
4.2</XRef> and <XRef Linkend = "BABGDEII" xlink:type="simple"
    xlink:href="#BABGDEII"  >Example
9</XRef>). Depending on what the retrieved data is, it can be given
a logical structure with a <Acronym>DTD</Acronym>.</Para>
<Para>Also, because databases are often huge, building a
static <Acronym>SGML</Acronym> instance of one would take a lot of
time. Needless to say, complete databases are generally too big
to fit into <Acronym>RAM</Acronym> anyway. The way to handle this
is to build the instance as a user is browsing it, thus possibly
eliminating many needless queries. Another problem is that the data
in a database may change in rapid succession, while an <Acronym>SGML</Acronym> file
normally is supposed to stay valid for some time, even decades.
So to really make an <Acronym>SGML</Acronym> instance of a database
one should ideally fetch only the information the user wants to
see and update the document continuously to reflect the changes
in the database.</Para></Sect2>
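<Para>Building the instance while the user browses can be sketched with a Python generator that queries a page of rows only when the reader asks for it. The page size, the element names and the <Literal MoreInfo = "None">Product</Literal> table are hypothetical; the sketch assumes SQLite through Python's <Literal MoreInfo = "None">sqlite3</Literal> module.</Para>
<ProgramListing Format = "linespecific">
```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical product table with 1000 rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Product (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO Product VALUES (?, ?)",
    [(i, f"Product {i}") for i in range(1, 1001)],
)

stats = {"queries": 0}


def pages(page_size=20):
    """Yield the document one page element at a time.

    Each page is queried only when the reader asks for it, so it also
    reflects the database contents at that moment.
    """
    offset = 0
    while True:
        stats["queries"] += 1
        rows = conn.execute(
            "SELECT id, name FROM Product ORDER BY id LIMIT ? OFFSET ?",
            (page_size, offset),
        ).fetchall()
        if not rows:
            return
        page = ET.Element("page", start=str(offset + 1))
        for pid, name in rows:
            ET.SubElement(page, "product", id=str(pid)).text = name
        yield page
        offset += page_size


browser = pages()
first = next(browser)   # the user opens the document...
second = next(browser)  # ...and turns one page
print(stats["queries"])                  # 2
print(second[0].get("id"), len(second))  # 21 20
```
</ProgramListing>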
<Sect2 Id = "CHDDGDCF"><Title>Extracting Parts of Documents from
Databases</Title>
<Para>There are two basic syntactic approaches to indicate in an <Acronym>SGML</Acronym> document
instance that certain parts of the document should be retrieved
from a database. First, an entity's system identifier (see <XRef
    Linkend = "CIHGBIHD" xlink:type="simple"
    xlink:href="#CIHGBIHD"  >Section
3.3.4</XRef>) can be a database query instead of a file name. The
entity manager must then be customized to retrieve the entity's
content from a database. For example, the system identifier of the <Literal
    MoreInfo = "None">chap1</Literal> entity in <XRef
    Linkend = "CIHCBCCA" xlink:type="simple"
    xlink:href="#CIHCBCCA"  >Example
5</XRef> could be the <Acronym>SQL</Acronym> statement in <XRef
    Linkend = "BABGDEII" xlink:type="simple"
    xlink:href="#BABGDEII"  >Example
9</XRef>. The resulting generated chapter could be similar to <XRef
    Linkend = "CIHCJFCC" xlink:type="simple" href = "#CIHCJFCC"
    xlink:show = "replace" >Example 10</XRef>.</Para>
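<Para>A customized entity manager of this kind might look roughly like the following Python sketch. It is not taken from any particular SGML system: the <Literal MoreInfo = "None">sql:</Literal> prefix that marks a system identifier as a query, the function name <Literal MoreInfo = "None">resolve_entity</Literal> and the one-paragraph-per-row rendering are all assumptions, and an in-memory SQLite database stands in for the product database.</Para>
<ProgramListing Format = "linespecific"><![CDATA[
```python
import sqlite3
from xml.sax.saxutils import escape

SQL_PREFIX = "sql:"  # hypothetical marker for query-valued system identifiers


def resolve_entity(system_id, conn):
    """Return the replacement text of an external entity.

    System identifiers that start with SQL_PREFIX are run as database
    queries; anything else is read as an ordinary file name.
    """
    if system_id.startswith(SQL_PREFIX):
        rows = conn.execute(system_id[len(SQL_PREFIX):]).fetchall()
        # Each result row becomes one paragraph of the generated chapter;
        # markup-significant characters in the data are escaped.
        return "\n".join(
            "<para>" + " ".join(escape(str(value)) for value in row) + "</para>"
            for row in rows
        )
    with open(system_id, encoding="utf-8") as f:
        return f.read()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (author TEXT, title TEXT)")
conn.executemany("INSERT INTO book VALUES (?, ?)",
                 [("Frank Herbert", "Dune"), ("Piers Anthony", "On a Pale Horse")])
print(resolve_entity("sql:SELECT author, title FROM book ORDER BY author", conn))
```
]]></ProgramListing>
<Para>In a real system the function would be hooked into the parser's own entity resolution mechanism, which differs from one SGML tool to another and is not shown here.</Para>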
<Para>Another option is to attach a query to an element. The query
is executed to obtain the element's content. Database queries can
be stored in attributes, or they can be stored elsewhere and referred
to by HyTime links. The application processing an element then uses
the query to retrieve the element's content.
This approach is explored further in the <Acronym>DTD</Acronym> presented in <XRef
    Linkend = "Database-dtd2.xref" xlink:type="simple"
    xlink:href="#Database-dtd2.xref" 
    >Appendix B</XRef>.</Para>
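<Para>The attribute-based variant can be sketched in Python as follows, assuming a hypothetical <Literal MoreInfo = "None">query</Literal> attribute whose value is an <Acronym>SQL</Acronym> statement and a hypothetical <Literal MoreInfo = "None">item</Literal> element for each returned value. ElementTree and SQLite are used only to keep the example self-contained; they stand in for an SGML parser and the production database.</Para>
<ProgramListing Format = "linespecific"><![CDATA[
```python
import sqlite3
import xml.etree.ElementTree as ET


def expand_queries(root, conn):
    """Fill each element that has a (hypothetical) query attribute with
    one item element per row returned by that query."""
    for elem in list(root.iter()):  # copy first: the tree is modified below
        query = elem.get("query")
        if query is None:
            continue
        for row in conn.execute(query):
            ET.SubElement(elem, "item").text = str(row[0])  # first column only
    return root


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE part (name TEXT)")
conn.executemany("INSERT INTO part VALUES (?)", [("Screw",), ("Bolt",)])

doc = ET.fromstring(
    '<manual><partlist query="SELECT name FROM part ORDER BY name"/></manual>')
expand_queries(doc, conn)
print(ET.tostring(doc, encoding="unicode"))
```
]]></ProgramListing>
<Para>Because the query travels inside the document, anyone who can edit the document can run arbitrary SQL against the database; a production system would at least restrict such queries to read-only access.</Para>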
<Para>The second method looks more appealing, especially with the
Second Edition of HyTime. The simplest approach here is to have an
attribute that contains the query to execute. This can be extended
to give the attribute a defined meaning, that is, to tell the processing
application what the attribute value is and to make it clear that
the application must process the attribute value to obtain the element's
content. All of this can be encoded in a standard way. The approach
can be made even more robust if the queries themselves are part
of the document. A link could then point to a query wherever
it is needed, so that queries can be reused and even managed
effectively in a central repository.</Para>
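<Para>When the queries themselves are part of the document, an element only needs to point at the query it uses. The following Python sketch assumes hypothetical <Literal MoreInfo = "None">query</Literal> elements identified by <Literal MoreInfo = "None">id</Literal>, and elements that refer to them through a <Literal MoreInfo = "None">queryref</Literal> attribute and pass one parameter in an <Literal MoreInfo = "None">arg</Literal> attribute; a simple ID reference stands in for a HyTime link.</Para>
<ProgramListing Format = "linespecific"><![CDATA[
```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical markup: <query> elements hold reusable SQL with a "?"
# placeholder; other elements point at a query by id and supply its argument.
doc = ET.fromstring(
    "<manual>"
    "<query id='parts'>SELECT part FROM bom WHERE product = ? ORDER BY part</query>"
    "<partlist queryref='parts' arg='Engine'/>"
    "<partlist queryref='parts' arg='Pump'/>"
    "</manual>"
)


def expand_references(root, conn):
    """Run the query each queryref points to and fill in the results."""
    queries = {q.get("id"): q.text for q in root.iter("query")}
    for elem in list(root.iter()):
        ref = elem.get("queryref")
        if ref is None:
            continue
        # A KeyError here means the link points at a query that does not exist.
        for (part,) in conn.execute(queries[ref], (elem.get("arg"),)):
            ET.SubElement(elem, "item").text = part
    return root


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bom (product TEXT, part TEXT)")
conn.executemany("INSERT INTO bom VALUES (?, ?)",
                 [("Engine", "Piston"), ("Engine", "Crankshaft"), ("Pump", "Impeller")])
expand_references(doc, conn)
```
]]></ProgramListing>
<Para>Both part lists share a single query, so correcting that query in one place corrects every element that refers to it.</Para>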
<Para>Another way to classify how the database structure can
be mapped to document structure is through the concepts of template-driven
and model-driven mappings. In a <FirstTerm>template-driven</FirstTerm> mapping
the query is embedded somewhere in the document. When the query is executed,
its result replaces part of the document with a template element structure in which
certain variables are filled in with database values. In a <FirstTerm>model-driven</FirstTerm> mapping,
a fixed document structure is mapped to the database structures.
For example, there could be a <Literal MoreInfo = "None">table</Literal> element
that must be mapped to a table in a relational database. The template-driven
and model-driven mappings are explained in more detail in <XRef
    Linkend = "BABEADBE" xlink:type="simple"
    xlink:href="#BABEADBE"  >[Bou99]</XRef>.</Para>
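<Para>A model-driven mapping can be illustrated with a short Python sketch in which a fixed <Literal MoreInfo = "None">table</Literal>/<Literal MoreInfo = "None">row</Literal>/<Literal MoreInfo = "None">cell</Literal> element structure mirrors one relational table. The element and attribute names are assumptions made for this example, not part of any DTD discussed in this thesis. Whatever the table contains, the generated structure stays the same, which is what sets this approach apart from a template-driven one.</Para>
<ProgramListing Format = "linespecific"><![CDATA[
```python
import sqlite3
import xml.etree.ElementTree as ET


def table_to_element(conn, table):
    """Model-driven mapping: a fixed table/row/cell element structure
    mirrors one relational table, whatever columns it happens to have.

    The table name is inserted into the SQL text, so it must come from the
    mapping definition, never from untrusted input.
    """
    cursor = conn.execute(f'SELECT * FROM "{table}"')
    columns = [d[0] for d in cursor.description]
    root = ET.Element("table", name=table)
    for values in cursor:
        row = ET.SubElement(root, "row")
        for column, value in zip(columns, values):
            ET.SubElement(row, "cell", column=column).text = str(value)
    return root


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO product VALUES (?, ?)", [(1, "Screw"), (2, "Bolt")])
print(ET.tostring(table_to_element(conn, "product"), encoding="unicode"))
```
]]></ProgramListing>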
<Para>As pointed out earlier, blind mapping of the tables
and columns of a relational database to a plain table/column <Acronym>DTD</Acronym> would
not result in a meaningful document. A container element should be
mapped to a query, and the query output columns to some display
elements in the <Acronym>SGML</Acronym> <Acronym>DTD</Acronym>. But
what to do when a query returns multiple rows, as is usually the
case?</Para>
<Para>A generic way would be to define a row model so that each
row in a query's result set would generate a small piece of <Acronym>SGML</Acronym> tagged
text, each row generating the same <Acronym>SGML</Acronym> structure
but different element content. Let us take, for example, a query
that returns results in two columns, say <Literal MoreInfo = "None">Author</Literal> and <Literal
    MoreInfo = "None">Book Name</Literal>. The result set contains
two rows where the first row would have values <Literal
    MoreInfo = "None">Frank Herbert</Literal> and <Literal
    MoreInfo = "None">Dune</Literal> while the second row would
have values <Literal MoreInfo = "None">Piers Anthony</Literal> and <Literal
    MoreInfo = "None">On a Pale Horse</Literal>. It might be desirable
to output the result in <Acronym>SGML</Acronym> as in <XRef
    Linkend = "CIHCJFCC" xlink:type="simple" href = "#CIHCJFCC"
    xlink:show = "replace" >Example 10</XRef>.
</Para>
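<Para>The row model can be sketched in Python as follows. The element names and the <Literal MoreInfo = "None">column</Literal> attribute follow Example 10; the table name <Literal MoreInfo = "None">books</Literal>, the mapping dictionary and the use of SQLite and ElementTree are assumptions made to keep the sketch self-contained. Apart from whitespace, the output matches Example 10.</Para>
<ProgramListing Format = "linespecific"><![CDATA[
```python
import sqlite3
import xml.etree.ElementTree as ET

# Row model: the element generated for every result row, and the element
# each query output column is mapped to (names follow Example 10).
ROW_ELEMENT = "book"
COLUMN_ELEMENTS = {"Author": "author", "Book Name": "title"}


def rows_to_markup(cursor):
    """Yield one identically structured fragment per result row."""
    columns = [d[0] for d in cursor.description]
    for values in cursor:
        row = ET.Element(ROW_ELEMENT)
        for column, value in zip(columns, values):
            child = ET.SubElement(row, COLUMN_ELEMENTS[column], column=column)
            child.text = str(value)
        yield ET.tostring(row, encoding="unicode")


conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE books ("Author" TEXT, "Book Name" TEXT)')
conn.executemany("INSERT INTO books VALUES (?, ?)",
                 [("Frank Herbert", "Dune"), ("Piers Anthony", "On a Pale Horse")])
query = 'SELECT "Author", "Book Name" FROM books ORDER BY "Author"'
for fragment in rows_to_markup(conn.execute(query)):
    print(fragment)
```
]]></ProgramListing>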
<Example Id = "CIHCJFCC"><Title>Markup Generated from Database</Title>
<ProgramListing Format = "linespecific">&lt;book&gt;
&lt;author column=&quot;Author&quot;&gt;Frank Herbert&lt;/author&gt; 
&lt;title column=&quot;Book Name&quot;&gt;Dune&lt;/title&gt; 
&lt;/book&gt; 
&lt;book&gt; 
&lt;author column=&quot;Author&quot;&gt;Piers Anthony&lt;/author&gt; 
&lt;title column=&quot;Book Name&quot;&gt;On a Pale Horse&lt;/title&gt; 
&lt;/book&gt;</ProgramListing></Example></Sect2></Sect1>
<Sect1 Id="gpd27"><Title>General Purpose Databases</Title>
<Para>Here, a general purpose database means a database that is not designed
specifically to store and manage documents. A wide selection
of these off-the-shelf databases is available.</Para>
<Para>General purpose databases cannot do much with a large document:
they must simply store it as a single large object, if they can store
it at all. In some aircraft manuals, the text of a single document can
exceed 50 MB, and not all databases can handle objects this big.</Para>
<Para>If the database is not required to store the whole document, or cannot,
another approach is to save the document in an ordinary file
system and store only a reference to it in the database. This is
easy from the database's point of view, but it is not foolproof. Someone
could change the entry in the database without moving the
file in the file system, or vice versa. An additional tool to manage both the
reference in the database and the physical location of the file
on the file system would be a good idea. If the physical location
of the file were hidden from, or inaccessible to, users without the additional
tool, the system could be made quite safe.</Para>
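<Para>The additional tool suggested above can be sketched in Python with SQLite. The class <Literal MoreInfo = "None">DocumentStore</Literal>, its table layout and the content-hash file names are hypothetical. Hiding the storage directory from users would be a matter of operating system permissions, which the sketch does not show; it does show how the tool changes the entry and the file together and how it can report references whose files have disappeared.</Para>
<ProgramListing Format = "linespecific"><![CDATA[
```python
import hashlib
import os
import sqlite3
import tempfile


class DocumentStore:
    """Hypothetical managing tool for the reference approach.

    Files are kept in a private directory and the database holds only
    references to them. Because users go through this class rather than
    the file system, the entry and the file are changed together.
    """

    def __init__(self, conn, root):
        self.conn, self.root = conn, root
        conn.execute("CREATE TABLE IF NOT EXISTS document "
                     "(name TEXT PRIMARY KEY, path TEXT NOT NULL)")

    def check_in(self, name, source_path):
        with open(source_path, "rb") as f:
            data = f.read()
        # Store the copy under its content hash inside the private directory.
        target = os.path.join(self.root, hashlib.sha256(data).hexdigest())
        with open(target, "wb") as f:
            f.write(data)
        with self.conn:  # commits the reference, or rolls back on error
            self.conn.execute("INSERT OR REPLACE INTO document VALUES (?, ?)",
                              (name, target))

    def read(self, name):
        (path,) = self.conn.execute(
            "SELECT path FROM document WHERE name = ?", (name,)).fetchone()
        with open(path, encoding="utf-8") as f:
            return f.read()

    def dangling(self):
        """Names whose referenced file has disappeared from the file system."""
        rows = self.conn.execute("SELECT name, path FROM document").fetchall()
        return [name for name, path in rows if not os.path.exists(path)]


store = DocumentStore(sqlite3.connect(":memory:"), tempfile.mkdtemp())
source = os.path.join(tempfile.mkdtemp(), "screw_manual.sgm")
with open(source, "w", encoding="utf-8") as f:
    f.write("<manual></manual>")
store.check_in("screw-manual", source)
print(store.read("screw-manual"))
print(store.dangling())  # []
```
]]></ProgramListing>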
<Para>If an organization is already using a relational database
to manage product data (but not product documents), it may be quite
simple to modify the database so that it can also serve as a document management
system. For example, let us look at <XRef Linkend = "CIHDDDJI"
    xlink:type="simple" xlink:href="#CIHDDDJI" 
    >Figure 6</XRef>. If there is documentation
for each product, and we would like to add information about the
documents into the database, we would only need to add one new table,
called, in the example, <Literal MoreInfo = "None">Document</Literal>.
Let us assume that we only need to know where the document is located.
In that case we would need a filename column in the table. <XRef
    Linkend = "CACGADIF" xlink:type="simple" href = "#CACGADIF"
    xlink:show = "replace" >Figure 10</XRef> shows how
the database looks after this addition. <Literal MoreInfo = "None">Screw</Literal> now
has two documents associated with it, and <Literal MoreInfo = "None">Bolt</Literal> has
a drawing. This "reference approach" is used in the system implemented
at Wärtsilä NSD, described in the next chapter.</Para>
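<Para>The addition can be expressed in <Acronym>SQL</Acronym>, here run through Python's SQLite module. The exact tables of Figure 6 are not reproduced: the sketch assumes a minimal <Literal MoreInfo = "None">product</Literal> table with an identifier and a name, and the file names are invented for the example.</Para>
<ProgramListing Format = "linespecific"><![CDATA[
```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
-- The only addition: one row per document, pointing to its product
-- and recording where the document file is located.
CREATE TABLE document (
    id         INTEGER PRIMARY KEY,
    product_id INTEGER NOT NULL REFERENCES product(id),
    filename   TEXT NOT NULL
);
INSERT INTO product VALUES (1, 'Screw'), (2, 'Bolt');
INSERT INTO document (product_id, filename) VALUES
    (1, 'screw_manual.sgm'), (1, 'screw_spec.sgm'), (2, 'bolt_drawing.tif');
""")

# List every product together with the documents attached to it.
for name, filename in conn.execute(
        "SELECT p.name, d.filename FROM product p "
        "JOIN document d ON d.product_id = p.id ORDER BY p.name, d.filename"):
    print(name, filename)
```
]]></ProgramListing>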
<Figure Float = "0"><html:img src="Gradu-10.gif"/>
<Title Id = "CACGADIF"><Literal MoreInfo = "None">Document</Literal> Table
in Relational Database</Title></Figure></Sect1>
<Sect1 Id="sd28"><Title>Specialized Databases</Title>
<Para>In this context, a specialized database means a database that
is designed and implemented to store and manage documents. Such databases
can be further divided into those that are designed specifically
to manage structured documents and those that are not.</Para>
<Para>Ordinary document management systems are not aware that documents
can have internal structure; they simply store them as blobs
of data. It may be possible to specify that certain documents belong
together in a certain order, and to record metadata such as who created
them and when, and when they were last accessed.</Para>
<Para>Version management systems commonly used in the software industry
also resemble document management systems. They are in some
sense aware of the contents of the files given to them, because
programmers like to see the differences between versions
of a file. This awareness is often exploited to save space: the
system need only store the original version of a file and the
subsequent changes. This primarily makes sense with text files. Of course,
products sold as document management systems can do things like
this as well.</Para>
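<Para>The idea of storing only the original and the changes can be sketched with Python's <Literal MoreInfo = "None">difflib</Literal> module. The delta format, a list of changed line ranges with their replacement lines, and the <Literal MoreInfo = "None">History</Literal> class are assumptions made for the example; real version management tools use their own formats, and RCS, for instance, keeps the newest revision in full and stores reverse deltas for older ones.</Para>
<ProgramListing Format = "linespecific"><![CDATA[
```python
import difflib


def make_delta(old, new):
    """Record only the changed line ranges as (start, end, new_lines)."""
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    return [(i1, i2, new[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes() if tag != "equal"]


def apply_delta(old, delta):
    """Rebuild the newer version from the older one and a stored delta."""
    result, pos = [], 0
    for start, end, new_lines in delta:
        result.extend(old[pos:start])  # unchanged lines come from the base
        result.extend(new_lines)
        pos = end
    result.extend(old[pos:])
    return result


class History:
    """Keeps the original version in full and every later one as a delta."""

    def __init__(self, original):
        self.original, self.deltas = list(original), []

    def version(self, n):
        """Version 0 is the original; version n applies the first n deltas."""
        text = self.original
        for delta in self.deltas[:n]:
            text = apply_delta(text, delta)
        return text

    def commit(self, new_version):
        latest = self.version(len(self.deltas))
        self.deltas.append(make_delta(latest, list(new_version)))


v0 = ["Tighten the bolt.\n", "Check the seal.\n", "Start the engine.\n"]
v1 = ["Tighten the bolt to 40 Nm.\n", "Check the seal.\n", "Start the engine.\n"]
h = History(v0)
h.commit(v1)
print(h.deltas)
```
]]></ProgramListing>
<Para>Only the one changed line is stored for the second version; the two unchanged lines are taken from the original when the version is rebuilt.</Para>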
<Para>The most interesting document databases are the ones that
understand that documents can have structure. An SGML document can
be parsed according to its DTD and split into small objects stored efficiently
in the database. The document can be reassembled into its textual
representation when it is checked out.</Para>
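<Para>The following Python sketch shows the principle of splitting a parsed document into one database row per element and reassembling any element, with its subtree, on check-out. The table layout and function names are hypothetical, ElementTree stands in for a DTD-aware SGML parser, and the <Literal MoreInfo = "None">content</Literal> and <Literal MoreInfo = "None">tail</Literal> columns keep mixed content intact.</Para>
<ProgramListing Format = "linespecific"><![CDATA[
```python
import json
import sqlite3
import xml.etree.ElementTree as ET


def store_document(conn, root):
    """Split a parsed document into one database row per element."""
    conn.execute("CREATE TABLE IF NOT EXISTS node (id INTEGER PRIMARY KEY,"
                 " parent INTEGER, pos INTEGER, tag TEXT, attrs TEXT,"
                 " content TEXT, tail TEXT)")

    def visit(elem, parent, pos):
        cur = conn.execute(
            "INSERT INTO node (parent, pos, tag, attrs, content, tail)"
            " VALUES (?, ?, ?, ?, ?, ?)",
            (parent, pos, elem.tag, json.dumps(elem.attrib), elem.text, elem.tail))
        node_id = cur.lastrowid
        for i, child in enumerate(elem):
            visit(child, node_id, i)
        return node_id

    return visit(root, None, 0)


def load_element(conn, node_id):
    """Reassemble the element stored as node_id together with its subtree."""
    tag, attrs, content, tail = conn.execute(
        "SELECT tag, attrs, content, tail FROM node WHERE id = ?",
        (node_id,)).fetchone()
    elem = ET.Element(tag, json.loads(attrs))
    elem.text, elem.tail = content, tail
    children = conn.execute(
        "SELECT id FROM node WHERE parent = ? ORDER BY pos", (node_id,)).fetchall()
    for (child_id,) in children:
        elem.append(load_element(conn, child_id))
    return elem


source = ("<manual><sect id='s1'><title>Safety</title>"
          "<para>Wear <emph>gloves</emph>.</para></sect></manual>")
conn = sqlite3.connect(":memory:")
root_id = store_document(conn, ET.fromstring(source))
rebuilt = load_element(conn, root_id)
print(ET.tostring(rebuilt, encoding="unicode"))
```
]]></ProgramListing>
<Para>Because every element has its own row, a single paragraph can be loaded on its own, which is what makes fine-grained check-out possible.</Para>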
<Para>A structured document database need not place any limit
on how fine-grained its logical objects can be. The
pure SGML way of achieving document reuse is to use entities, but
that presents some problems, as was discussed earlier (see <XRef
    Linkend = "BABIDIHJ" xlink:type="simple"
    xlink:href="#BABIDIHJ"  >Section
3.3.3</XRef>). With a good database a user can lock a single paragraph
from a monstrously large manual, check it out, edit it, and check
it in without stopping other authors working on the same document.</Para>
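<Para>Element-level check-out can be sketched as a table of locks keyed by the stored element. The class <Literal MoreInfo = "None">CheckoutManager</Literal> and its table are hypothetical, and node identifiers such as 17 simply stand for stored paragraphs.</Para>
<ProgramListing Format = "linespecific"><![CDATA[
```python
import sqlite3


class CheckoutManager:
    """Hypothetical element-level check-out: an author locks one stored
    element (by node id) instead of the whole document."""

    def __init__(self, conn):
        self.conn = conn
        conn.execute("CREATE TABLE IF NOT EXISTS checkout"
                     " (node_id INTEGER PRIMARY KEY, author TEXT NOT NULL)")

    def check_out(self, node_id, author):
        """Lock the element; returns False if someone already holds it."""
        try:
            with self.conn:
                self.conn.execute("INSERT INTO checkout VALUES (?, ?)",
                                  (node_id, author))
            return True
        except sqlite3.IntegrityError:  # primary key taken: already locked
            return False

    def check_in(self, node_id, author):
        """Release the lock; only the author holding it can do so."""
        with self.conn:
            cur = self.conn.execute(
                "DELETE FROM checkout WHERE node_id = ? AND author = ?",
                (node_id, author))
        return cur.rowcount == 1


locks = CheckoutManager(sqlite3.connect(":memory:"))
print(locks.check_out(17, "anna"))  # True
print(locks.check_out(17, "ben"))   # False: paragraph 17 is taken
print(locks.check_out(18, "ben"))   # True: other paragraphs stay free
```
]]></ProgramListing>
<Para>A fuller implementation would also have to consider locks held on ancestor or descendant elements, so that nobody checks out a whole section while another author is editing one of its paragraphs.</Para>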
<Para>Documents can be constructed from many different objects in
the database, sharing some objects. As with entities, updating a
shared object will immediately update all the documents that use
the shared object (see <XRef Linkend = "CACCECJI" xlink:type="simple"
    href = "#CACCECJI" xlink:show = "replace" >Figure
11</XRef>), but the system can warn the user that this is about
to happen and ask whether it is really wanted. If the user does not
wish to update the referring documents, he can instruct the database
to make a copy of the original content. Each document can then refer
either to the original shared object or to the new, modified copy.</Para>
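<Para>The choice between updating a shared object in place and making a copy can be sketched in Python. The <Literal MoreInfo = "None">Repository</Literal> class and its in-memory dictionaries are hypothetical stand-ins for database tables; <Literal MoreInfo = "None">users_of</Literal> gives the list of documents the user would be warned about.</Para>
<ProgramListing Format = "linespecific"><![CDATA[
```python
class Repository:
    """Hypothetical store of shared content objects referenced by documents."""

    def __init__(self):
        self.objects = {}    # object id -> content
        self.documents = {}  # document name -> list of object ids, in order
        self._next_id = 0

    def add_object(self, content):
        self._next_id += 1
        self.objects[self._next_id] = content
        return self._next_id

    def users_of(self, obj_id):
        """Documents that would change if obj_id were updated in place."""
        return [name for name, refs in self.documents.items() if obj_id in refs]

    def update(self, obj_id, content, editing_doc, propagate):
        """Update a shared object.

        propagate=True changes the object itself, so every document using it
        changes. propagate=False gives only editing_doc a modified copy; the
        other documents keep referring to the original object.
        """
        if propagate:
            self.objects[obj_id] = content
            return obj_id
        copy_id = self.add_object(content)
        refs = self.documents[editing_doc]
        refs[refs.index(obj_id)] = copy_id  # first occurrence only
        return copy_id


repo = Repository()
warning = repo.add_object("<warning>Stop the engine first.</warning>")
repo.documents = {"engine-manual": [warning], "pump-manual": [warning]}
print(repo.users_of(warning))  # ['engine-manual', 'pump-manual']
copy_id = repo.update(warning, "<warning>Stop and cool the engine.</warning>",
                      "engine-manual", propagate=False)
```
]]></ProgramListing>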
<Para>Of course, structured document databases can also store metadata
about documents, such as who created what and when. Beyond this,
they can record such metadata for the smallest logical
element in the document as well. Some specialized databases also
include other components, such as workflow management systems, which
can automatically move a job from one person to the next as soon
as a work phase is completed<Footnote><Para>For example, Information
Manager from Interleaf.</Para></Footnote>. All this makes special
purpose, structure-aware databases technically superior to the other solutions.
However, their price may also far exceed that of the less capable techniques,
so the cost balance can still make less comprehensive solutions
attractive.</Para>
<Figure Float = "0"><html:img src="Gradu-11.gif"/>
<Title Id = "CACCECJI">SGML Database</Title></Figure></Sect1>
<Sect1 Id="wmd29"><Title>Writing Modular Documents</Title>
<Para>To get the best out of a document management system where
multiple authors are editing the same document, or many documents
have a lot in common with each other, it is best if parts of documents
can be developed in relative isolation. This is common enough in
software development, for example, where a problem is analyzed and
different modules solve different parts of the whole problem. Modules
usually have well-defined interfaces to other modules, which makes
it possible to develop the internals of one module without knowledge
of the internals of the others. It would obviously be cost-effective
if traditional documents could be written this way as well, but,
unfortunately, manuals are not computer programs. Differences in
writing style, old habits, and so on make this very difficult.</Para>
<Para>Authors who have spent their whole career writing documents
from start to finish by themselves face a difficult
transition to writing small document fragments, or micro-documents. The
term micro-document can be a bit misleading, because it refers to scope
rather than size: each document fragment describes a single component of some system,
or a single procedure. A micro-document for a bolt could be very simple,
while a micro-document for a fuel system could be very large.</Para>
<Para>A micro-document should be usable on its own, and it
should also be possible to combine several of them into larger publications.
This is difficult, because conventionally written text cannot be detached from its context.
Each chapter, for example, begins with some introductory material
and ends with a lead-in to the next chapter. This applies to the
finer structural elements as well: each paragraph starts with an
introduction to its topic and ends so that the next paragraph
logically falls into place. If an automated document assembly process
picks a piece from here and another from there, this old way of
writing documents simply does not work.</Para>
<Para>Reality imposes some constraints on how document fragments
are assembled together, of course, so writers are not left with
an impossible task. For example, it could be known that every user manual
will always have a safety section near the beginning of the document
which is followed by a tools section. Also, it could be known that
the documentation for some piece of equipment is only used in a
certain product and not in any others.</Para>
<Para>Practical considerations matter as well. A document must
naturally be readable. If the most modular solution would produce unreadable
documents, larger blocks must be written. It may also be as simple
as deciding that some documents will not be assembled automatically.</Para>
<Para>Structured documents are the natural way to write modular
documents. When an organization decides to move into the structured
document domain, there is usually a need to transform at least some
of the old documents into the new format. Besides being a rather expensive
and error-prone operation, it may not be possible to split old documents
into independent micro-documents at all. The transformations involved
are outside the scope of this thesis, but they are discussed in <XRef
    Linkend = "Bib-tra.xref" xlink:type="simple"
    xlink:href="#Bib-tra.xref" 
    >[Tra95]</XRef>, for example.</Para></Sect1>
<Sect1 Id="aer30"><Title>Addressing External Resources</Title>
<Para>Managing blocks of documentation in <Acronym>SGML</Acronym> format
is generally not enough. Most documents contain figures, some of
which may need to be generated automatically at the time of printing. Static
images (as well as video and audio) can be managed almost exactly like
text documents. The difference is that they are generally saved
as large blobs of data and not broken down into smaller parts. Because
it is more difficult to search images for certain topics, keywords
and other metadata are often attached to binary
files when they are saved into a repository. For dynamic figures
and tables, there must be utility processes that can generate the
data on demand.</Para>
<Para>Most technical documents include cross-references. <Acronym>SGML</Acronym>'s
limited linking abilities can be extended with other standards such as HyTime,
but that does not help in the management of cross-references. For
example, how is an author supposed to make a cross-reference to
a section that is the responsibility of another writer who has not
yet begun his work? And what happens if a document assembly process
selects a piece of text for insertion into a publication and there
is a cross-reference to another piece of text that is not included
in the publication? Is it possible to automatically check that all
cross-references point to valid targets?</Para>
<Para>The problems of link management warrant a thesis of their
own, and the sad answer is that there is no perfect solution. Some
rather simple techniques can be used to automate some tasks. Let us take
a closer look at the three questions posed above.</Para>
<Para>It turns out that the easiest problem to solve is linking to content
that does not yet exist. This can be accomplished by creating an intermediate
document that contains all the targets of links from the main document.
These targets in turn redirect each link to its actual target
(see <XRef Linkend = "CACDAJDH" xlink:type="simple"
    href = "#CACDAJDH" xlink:show = "replace" >Figure
12</XRef>). This makes it possible for the first author to finish
his part of the document before other parts even exist. Of course,
there must be some kind of overview of the completed document
so that it is possible to insert cross-references pointing to unfinished
sections. During authoring, it does not really matter if the actual
target does not yet exist, because the intermediate document records
that some parts are missing and must be created
before publication.</Para>
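<Para>Resolution through an intermediate link document can be sketched in Python with ElementTree. The <Literal MoreInfo = "None">linktargets</Literal> and <Literal MoreInfo = "None">target</Literal> elements and their <Literal MoreInfo = "None">name</Literal> and <Literal MoreInfo = "None">ref</Literal> attributes are assumptions; a target without a <Literal MoreInfo = "None">ref</Literal> marks a part that has been planned but not yet written.</Para>
<ProgramListing Format = "linespecific"><![CDATA[
```python
import xml.etree.ElementTree as ET

# Hypothetical intermediate link document: authors link to stable logical
# names, and each name is redirected to the real target once it exists.
INTERMEDIATE = ET.fromstring(
    "<linktargets>"
    "<target name='fuel-system' ref='chap4-sect2'/>"
    "<target name='safety'/>"  # planned, but not yet written
    "</linktargets>"
)


def resolve_links(document, intermediate):
    """Point each xref at its actual target and return the logical names
    that still have no target (parts to be written before publication)."""
    redirect = {t.get("name"): t.get("ref") for t in intermediate.iter("target")}
    missing = []
    for xref in document.iter("xref"):
        # KeyError: the author linked to a name the intermediate document lacks.
        actual = redirect[xref.get("linkend")]
        if actual is None:
            missing.append(xref.get("linkend"))
        else:
            xref.set("linkend", actual)
    return missing


chapter = ET.fromstring(
    "<chapter><para>See <xref linkend='fuel-system'/> and"
    " <xref linkend='safety'/>.</para></chapter>")
print(resolve_links(chapter, INTERMEDIATE))  # ['safety']
```
]]></ProgramListing>
<Para>A link to a name that does not appear in the intermediate document raises an error, which enforces the rule that every link goes through the intermediate document.</Para>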
<Figure Float = "0"><html:img src="Gradu-12.gif"/>
<Title Id = "CACDAJDH">Authoring with an Intermediate Link Document</Title></Figure>
<Para>The next problem is more difficult. If an automated document
assembly process leaves cross-reference targets out of the publication,
something must be done with the links, or the document could become
invalid or unusable. The simple approach is to use the intermediate
link target document in this case as well. Links whose targets are missing
are redirected to information that explains that the link
target is missing and where the missing part might be found. If
the target is known to be in some other publication, the link can
be replaced with a bibliographic reference to the other publication.
In some simple cases cross-references can be deleted.</Para>
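<Para>The following Python sketch applies two of these remedies after assembly: links whose targets are known to be in another publication become citations, and links whose targets are found nowhere are deleted while the surrounding text is kept. The <Literal MoreInfo = "None">citetitle</Literal> element name and the lookup dictionary are assumptions made for the example.</Para>
<ProgramListing Format = "linespecific"><![CDATA[
```python
import xml.etree.ElementTree as ET


def fix_dangling_links(publication, elsewhere):
    """Repair xrefs whose targets were left out by document assembly.

    elsewhere maps a target id to the title of another publication that
    contains it (a hypothetical lookup). Such links become citations; links
    to targets found nowhere are deleted, keeping the surrounding text.
    """
    present = {e.get("id") for e in publication.iter() if e.get("id")}
    parent_of = {child: parent for parent in publication.iter() for child in parent}
    for xref in list(publication.iter("xref")):
        target = xref.get("linkend")
        if target in present:
            continue
        if target in elsewhere:
            xref.tag, xref.text = "citetitle", elsewhere[target]
            xref.attrib.clear()
            continue
        # Delete the link but keep the text that followed it (its tail).
        parent = parent_of[xref]
        index = list(parent).index(xref)
        tail = xref.tail or ""
        if index == 0:
            parent.text = (parent.text or "") + tail
        else:
            previous = parent[index - 1]
            previous.tail = (previous.tail or "") + tail
        parent.remove(xref)
    return publication


pub = ET.fromstring(
    "<manual><sect id='s1'><para>See <xref linkend='s1'/> and"
    " <xref linkend='s9'/>.</para>"
    "<para>Tighten the bolt<xref linkend='s7'/>.</para></sect></manual>")
fix_dangling_links(pub, {"s9": "Fuel System Manual"})
print(ET.tostring(pub, encoding="unicode"))
```
]]></ProgramListing>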
<Para>The most difficult problem is making sure that links point
to where they are supposed to point. Links that address relative
locations can be broken by changes in the document structure. For example,
if a link points to the third paragraph of this section, deleting
the current first paragraph changes the link target. The link might
still be valid, or it might not. Although automatic processes can
be created to check that every link points to something, a human
is needed to make sure that the relationship is real and correct.
No artificial intelligence system yet exists that could follow
links, read or otherwise experience what is at the other end,
and reason out whether the link is valid. It seems absurd that highly
paid professionals sit for days at computer terminals clicking links
and seeing where they lead, but there is simply no better way.</Para>
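<Para>The following Python sketch illustrates both points with a hypothetical four-paragraph section: a positional address drifts when the first paragraph is deleted, while an ID-based address does not, and an automatic check can only confirm that each link target exists. The function names are assumptions made for the example.</Para>
<ProgramListing Format = "linespecific"><![CDATA[
```python
import xml.etree.ElementTree as ET

section = ET.fromstring(
    "<sect><para id='p1'>One</para><para id='p2'>Two</para>"
    "<para id='p3'>Three</para><para id='p4'>Four</para></sect>")


def by_position(sect, n):
    """Relative address: the n-th paragraph (1-based) of the section."""
    return sect.findall("para")[n - 1]


def by_id(root, target_id):
    """Stable address: the element whose id attribute is target_id."""
    return root.find(f".//*[@id='{target_id}']")


def broken_links(root):
    """Linkends that match no id. This only proves that a target exists,
    not that it is the right one."""
    ids = {e.get("id") for e in root.iter() if e.get("id") is not None}
    return [x.get("linkend") for x in root.iter("xref")
            if x.get("linkend") not in ids]


before = by_position(section, 3).text   # 'Three'
section.remove(section[0])              # delete the first paragraph
after = by_position(section, 3).text    # 'Four': the relative link moved
stable = by_id(section, "p3").text      # 'Three': the id still matches
```
]]></ProgramListing>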
<Para>Problems and solutions in linking have been explored in <XRef
    Linkend = "BABIEGCG" xlink:type="simple"
    xlink:href="#BABIEGCG"  >[Kim98]</XRef> and
in summary format in <XRef Linkend = "BABIFBAG" xlink:type="simple"
    xlink:href="#BABIFBAG"  >[Ang97]</XRef>.</Para></Sect1>
<Sect1 Id="csa31"><Title>Client-Server Architecture</Title>
<Para>In a document management system as described in this thesis,
there is a database and possibly some kind of file server. These
are server-level components. The clients in a typical document management
system include editors or other data producers, as well as viewing and
publishing applications. A sample architecture is presented in <XRef
    Linkend = "CACGEICE" xlink:type="simple" href = "#CACGEICE"
    xlink:show = "replace" >Figure 13</XRef>.</Para>
<Para>The client-server architecture model is a very common way to
design system architectures. It is a distributed system model that
scales well from the needs of a single user to gigantic proportions. For
example, the World Wide Web is based on the client-server model,
where clients (browsers) talk to web servers.</Para>
<Para>The architecture consists of three major components: server,
client and network. The network component is optional, because the server
and client can also run on the same computer. The server
and client need to communicate with a predefined protocol, and changing
the protocol at one end requires changes at the other end. Clients
also need to find the servers they are interested in.</Para>
<Para>The processing of information can occur centrally at the server,
or the work can be divided between the server and clients. Generally
it is better to have the server do the bulk of the work, because
it is easier to maintain a few servers than many clients. However,
as the number of clients increases, the server capacity and, in
all likelihood, the network capacity will need to be upgraded. The
capacity of a single server can be improved by replacing it with
a cluster of servers. The client continues to communicate with the
cluster as though it were a single server, while inside the cluster the
workload is balanced between different computers. This also improves
fault tolerance, because the failure of one computer does not render
the cluster unusable. Using multiple servers, either in a cluster
or more independently, presents the problem that whenever data changes
at one server, the change may need to be passed along to the
other servers.</Para>
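<Para>How a cluster can hide several servers behind one entry point is sketched below in Python, using round-robin distribution with failover. The servers are plain functions standing in for network calls and all names are hypothetical; the replication of changes between servers mentioned above is not shown.</Para>
<ProgramListing Format = "linespecific"><![CDATA[
```python
import itertools


class Cluster:
    """Makes several servers look like one: requests are spread round-robin,
    and a server that fails is skipped in favor of the next one."""

    def __init__(self, servers):
        self.servers = list(servers)
        self._turn = itertools.cycle(range(len(self.servers)))

    def request(self, query):
        for _ in range(len(self.servers)):
            server = self.servers[next(self._turn)]
            try:
                return server(query)
            except ConnectionError:
                continue  # this computer is down; try the next one
        raise ConnectionError("no server in the cluster is available")


def make_server(name):
    return lambda query: f"{name} answered {query}"


def failed_server(query):
    raise ConnectionError("server down")


cluster = Cluster([make_server("db1"), failed_server, make_server("db3")])
results = [cluster.request("q") for _ in range(3)]
print(results)  # ['db1 answered q', 'db3 answered q', 'db1 answered q']
```
]]></ProgramListing>
<Para>The client only ever calls <Literal MoreInfo = "None">request</Literal>, so a failed computer is invisible to it as long as at least one server in the cluster still answers.</Para>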
<Figure Float = "0"><html:img src="Gradu-13.gif"/>
<Title Id = "CACGEICE">Client-Server Architecture for Document Management
System</Title></Figure></Sect1>
<Sect1 Id="bs32"><Title>Background Summary</Title>
<Para>This and the previous chapters have built the background knowledge
needed to understand the practical part of this thesis. Product
management, structured documents and databases have been explained
with their strengths, weaknesses and practical considerations. This
chapter in particular has shown what is involved in product document
management when it is done with structured documents and databases,
with an emphasis on relational database technology.</Para>
<Para>The following chapters introduce a real-life product document
management system using relational databases and SGML to store and
manage product documentation. The system was implemented at Wärtsilä
NSD by Citec Engineering Oy. The implementation does not use all
of the features presented in this and previous chapters, but all
of the features have been considered at some point. Other, similar
systems have been described in <XRef Linkend = "Bib-tra.xref"
    xlink:type="simple" xlink:href="#Bib-tra.xref"
     >[Tra95]</XRef> and <XRef
    Linkend = "BABFCHCD" xlink:type="simple"
    xlink:href="#BABFCHCD"  >[Loo98]</XRef>.
The former briefly describes an SGML document management system
using relational databases and the latter an image database implemented
with the help of object-oriented databases.</Para></Sect1></Chapter>
<Chapter Id = "CACDCHGA"><Title>Product Information Management Project at Wärtsilä NSD Power
Plants</Title>
<BlockQuote><Para>The Duncans sometimes ask if I understand the
exotic ideas of our past? And if I understand them, why can't I
explain them? Knowledge, the Duncans believe, resides only in particulars.
I try to tell them that all words are plastic. Word images begin
to distort in the instant of utterance. Ideas embedded in a language
require that particular language for expression. This is the very
essence of the meaning within the word exotic. See how it begins
to distort? Translation squirms in the presence of the exotic. The
Galach which I speak here imposes itself. It is an outside frame
of reference, a particular system. Dangers lurk in all systems.
Systems incorporate the unexamined beliefs of their creators. Adopt
a system, accept its beliefs, and you help strengthen the resistance
to change. Does it serve any purpose for me to tell the Duncans
that there are no languages for some things? Ahhhh! But the Duncans
believe that all languages are mine.</Para>
<Para><Author><Firstname>Frank</Firstname><Surname>Herbert</Surname></Author><CiteTitle xlink:type="simple" xlink:href="http://www.amazon.com/exec/obidos/ASIN/0441294677/ref=sim_books/103-0724774-8699018">God
Emperor of Dune</CiteTitle></Para></BlockQuote>
<Para>Wärtsilä NSD (formerly Wärtsilä Diesel) is a Finnish engineering
group with global operations. It is the leading supplier of power
solutions for both land and sea. Its gas- and oil-fired power plant solutions
range from 1 MW to 400 MW and are used for base load, co-generation,
load management and gas compressor applications. The deliveries
include turnkey construction and long term maintenance and operation.</Para>
<Para>Wärtsilä NSD Power Plants has about 300 active subcontractors,
and has had over 8,000 different suppliers since 1982 <XRef
    Linkend = "BGBGCJCF" xlink:type="simple"
    xlink:href="#BGBGCJCF"  >[Pel97a]</XRef>.
Those active subcontractors are involved in over 100 power plant projects
a year. These subcontractors are required to deliver documentation to Wärtsilä
along with their products. The many different systems
used by the subcontractors led to problems at Wärtsilä, and
it was determined that imposing documentation standards on subcontractors
would alleviate those problems. The Product Information Management
(PIM) project was started to sort out the different problems and
implement a solution. PIM was started in 1995, while most of
the work in the project was done during 1996 and 1997.</Para>
<Para>This chapter describes the overall PIM project and the project
phases. The next two chapters describe two software products developed
as part of the project in more detail. The author developed the
second tool (see <XRef Linkend = "CEGHDCDG" xlink:type="simple"
    xlink:href="#CEGHDCDG"  >Chapter
8</XRef>) as the practical programming work in this thesis.</Para>
<Sect1 Id="aptsard32"><Title>Analysis Pointed to SGML And Relational Databases</Title>
<Para>It was determined that the different file format problems
could be solved with <Acronym>SGML</Acronym>, so it was decided
that all the subcontractors should supply the textual information
in <Acronym>SGML</Acronym>. At that time SGML was still new in Finland,
which made this a rather radical solution. The annual SGML Finland
conferences had not even started yet; the first conference was
held in 1996.</Para>
<Para>Wärtsilä was using Oracle relational databases internally,
as were many of its larger subcontractors. Microsoft's Open Database
Connectivity (ODBC) <XRef Linkend = "BGBEAJGD" xlink:type="simple"
    xlink:href="#BGBEAJGD"  >[Mic92]</XRef> technology
provided a strong reason to continue to favor relational technology
as it allows applications to communicate with any ODBC-enabled database.
SGML databases at the time were very expensive and not suitable for
Wärtsilä's needs because the subcontractors were creating the
documentation and they needed the document management system as
well. Moreover, the PIM system was to store traditional product
data in addition to serving as a document management system. It was recognized
that a new database schema would be needed to make the best use of the system.</Para>
<Para>Content production was the next challenge, because most subcontractors
were not using SGML internally. A survey was conducted among the
subcontractors, which showed that the majority were using Microsoft
products. To reduce the costs for subcontractors, and to make it
easier for them to accept the movement to structured documents,
a custom <Acronym>SGML</Acronym> authoring tool built around the
Microsoft Word program was seen as the solution. The users would
still be using the familiar Word program, with only some new
buttons and menu items added. The authoring tool could be used to write
SGML documents and, at the same time, keep the document database
up to date and edit some database fields directly.</Para>
<Para>The final pieces were the viewing and publishing tools. No
good and inexpensive SGML viewers, let alone specialized
publishing packages, were available, which meant that these tools would need
to be created. In addition to being able to view and format SGML
documents for display and printing, the programs would need to be
able to communicate with the document repository to assemble larger
works from the stored micro-documents.</Para>
<Sect1 Id="ras33"><Title>Requirements And Specification</Title>
<Para>The PIM system requirements were loosely defined. Because
the basic problem was that documents could not be produced in time,
the main goal was to speed up the documentation processes. The use
of SGML was seen as critical to achieving this. For example,
the authors would not need to spend time specifying styles, all
necessary parts of documents would be created in the expected order
and document assembly could be largely automated. It would also
be easier to reuse information. The added benefit would, of course,
be that document quality would improve.</Para>
<Para>Documentation is created at the same time as the new equipment
it documents is manufactured. Occasionally snapshots of the current
documentation are requested. It was expected that it would be easier
to provide snapshots with the new system, and to locate pieces of documentation
that were not yet finished.</Para>
<Para>When Wärtsilä delivers a power plant, dozens, perhaps hundreds,
of thick binder manuals are shipped to the customer. They take
a lot of space, are difficult to transport and are often difficult
to get through customs procedures in several countries. Changes
to documents can also take a long time to actually reach
the customer's site. It was hoped that paper manuals
could eventually be abandoned. Only CD-ROMs carrying the SGML files would have
to be shipped to customers, and changes could be delivered either
on additional CD-ROMs or by email.</Para>
<Para>SGML is a neutral data format: it does not specify
how the data should be formatted, or even on what display system it should
be displayed. It does not even need to be displayed at all, but
could be read or merely manipulated by computer programs. This neutrality
was also seen as important by Wärtsilä, because it could then produce
information in various formats generated from a central SGML source.</Para>
<Para>Software specifications were written first for the editor
and database parts of the system, as well as for the plain SGML
viewer program that would be used as the basis for the publishing
tool. It could be argued that the specifications were no more than
software definitions because they did not go into great detail about
how the system should be implemented. For example, the documents
stated that relational databases were to be used with the Open Database
Connectivity interface, but the documents did not describe many
of the dialogs that would be presented to the user nor were there
detailed speed or memory usage requirements. The documents were
mostly functional descriptions approved by the customer (Wärtsilä).</Para>
<Para>The natural development process for this project turned out
to be evolutionary development. It was exploratory programming in
the sense that the developers had to work with Wärtsilä to find out
what the final system should be because of the loose definitions.
It was exploratory programming also because the developers had to
learn how to use some of the system components.</Para></Sect1>
<Sect1 Id = "CACFDIDA"><Title>Design and Architecture</Title>
<Para>The analysis phase had identified the key components in the
PIM system. Thus the design of the overall architecture
and of the system's data structures was relatively straightforward.</Para>
<Sect2 Id="ad35"><Title>Architectural Design</Title>
<Para>The client-server architecture was the natural choice.
There is a server that hosts the relational database. The server
also acts as a file server. The WNS Author Tool (described in <XRef
    Linkend = "CEGJCJIG" xlink:type="simple"
    xlink:href="#CEGJCJIG"  >Chapter 7</XRef>)
is used to create content and is one kind of client application.
The publishing tool (see <XRef Linkend = "CEGHDCDG" xlink:type="simple"
    xlink:href="#CEGHDCDG"  >Chapter 8</XRef>)
is another client application. It is used to publish different information
products (and "assemble products") from the data repository
managed at the server. The server and different clients could be
located on the same computer. In fact, subcontractors would almost
certainly have a mini-server running on their PCs. The mini-server
would hold only part of the information contained in the main
server located at Wärtsilä.</Para>
<Para>The overall system architecture can be seen in <XRef
    Linkend = "CACHGFAA" xlink:type="simple" href = "#CACHGFAA"
    xlink:show = "replace" >Figure 15</XRef> (see a reduced
view in <XRef Linkend = "CACBGHAH" xlink:type="simple"
    href = "#CACBGHAH" xlink:show = "replace" >Figure
14</XRef>). In the figures, Creation of Content represents the WNS Author Tool,
Document Management System is the server, and Product Assembly and
Formatting Application is the publishing tool (Multidoc Pro Database Publisher),
which the author developed as the practical work for this
thesis.</Para>
<Para>There was never any question as to which database technology
to use. Wärtsilä had all the product data in an Oracle (relational)
database, and the system was to be built around it. Because the
relational database stores only references to files, however, a reference
could just as well point to an SGML document held in a special-purpose SGML database.</Para>
<Para>The subcontractors were to transfer the files to Wärtsilä using
<Acronym>FTP</Acronym>, including their light version of the <Acronym>LSAR</Acronym> database (see <XRef
    Linkend = "CHDBECEA" xlink:type="simple" href = "#CHDBECEA"
    xlink:show = "replace" >Section 6.3.3</XRef>). A workflow
system would initiate a workflow upon delivery of the files to the
Wärtsilä server. The files were to be decrypted and scanned for
viruses, after which they would go through the approval process
at Wärtsilä. Documents that were not approved would be returned
to subcontractors for more work, while approved information would
be saved to the main data repository at Wärtsilä. Tight integration
of the database and other system components was part of the overall
vision, but it was not planned for the initial implementation.</Para>
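<Para>The delivery workflow described above is essentially a small state machine. The following Python sketch models it under stated assumptions: the state names, the Delivery class and the rule that a returned delivery re-enters the workflow when it is delivered again are illustrative, not taken from the actual Wärtsilä workflow system, which the text does not specify in detail.</Para>
<ProgramListing><![CDATA[
```python
from enum import Enum, auto


class State(Enum):
    DELIVERED = auto()    # files received from a subcontractor via FTP
    DECRYPTED = auto()
    SCANNED = auto()      # virus scan completed
    IN_APPROVAL = auto()
    APPROVED = auto()     # saved to the main data repository
    RETURNED = auto()     # sent back to the subcontractor for more work


# Allowed transitions, in the order described in the text. The
# RETURNED -> DELIVERED edge (redelivery after rework) is an assumption.
TRANSITIONS = {
    State.DELIVERED: {State.DECRYPTED},
    State.DECRYPTED: {State.SCANNED},
    State.SCANNED: {State.IN_APPROVAL},
    State.IN_APPROVAL: {State.APPROVED, State.RETURNED},
    State.RETURNED: {State.DELIVERED},
    State.APPROVED: set(),
}


class Delivery:
    """Tracks one hypothetical delivery of files through the workflow."""

    def __init__(self, name):
        self.name = name
        self.state = State.DELIVERED
        self.history = [self.state]

    def advance(self, new_state):
        # Reject any step that skips decryption, scanning or approval.
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(
                f"{self.name}: cannot go from {self.state.name} to {new_state.name}"
            )
        self.state = new_state
        self.history.append(new_state)


delivery = Delivery("pump_docs")
for step in (State.DECRYPTED, State.SCANNED, State.IN_APPROVAL, State.APPROVED):
    delivery.advance(step)
print(delivery.state.name)
```
]]></ProgramListing>
<Para>Keeping the allowed transitions in one table, rather than in scattered conditionals, makes it easy to check that no file reaches the main repository without passing through decryption, virus scanning and approval.</Para>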
<Figure Float = "0"><html:img src="Gradu-14.gif"/>
<Title Id = "CACBGHAH">Simplified PIM System Architecture</Title></Figure>
<Figure Float = "0"><html:img src="assembly.gif"/>
<Title Id = "CACHGFAA">System Architecture</Title></Figure></Sect2>
<Sect2 Id="dd36"><Title>DTD Design</Title>
<Para>System-level data structure design involved designing SGML
DTDs and the database schema. Citec<Footnote><Para>Citec is the
largest SGML service provider in Finland. The company homepage is <Literal
    MoreInfo = "None">http://www.citec.fi</Literal>. The author
was hired by Citec in the summer of 1996.</Para></Footnote>, subcontracted
by Wärtsilä to design and implement the whole system, developed
eight different document types for the documents needed at power
plants. The FMV DTD (designed for and used by the Swedish military
and its subcontractors) was used as the model for these DTDs <XRef
    Linkend = "BABCAHEE" xlink:type="simple"
    xlink:href="#BABCAHEE"  >[FMV95]</XRef>.
FMV is content-oriented (i.e., it uses logical element names that
describe what the data is as opposed to structural names like chapter
and section), which was exactly what Wärtsilä wanted. The full
FMV DTD has many more "branches" than the eight that were selected
and modified for Wärtsilä, but it was decided to start simple
and later, if needed, integrate the rest of the FMV DTD.</Para>
<Para>The eight DTDs that were designed are: system, function, operation,
corrective maintenance, periodic maintenance, technical data, fault finding
and spare parts. <XRef Linkend = "CACBBDAI" xlink:type="simple"
    href = "#CACBBDAI" xlink:show = "replace" >Figure
16</XRef> shows one of them, the spare parts DTD, in tree view. The
DTDs are documented in <XRef Linkend = "BABHEEAA" xlink:type="simple"
    xlink:href="#BABHEEAA"  >[CIT97b]</XRef>.</Para>
<Figure Float = "0"><html:img src="sparepart1.gif"/>
<Title Id = "CACBBDAI">Spare Parts DTD <XRef Linkend = "BABHEEAA"
    xlink:type="simple" xlink:href="#BABHEEAA"
     >[CIT97b]</XRef></Title></Figure>
<Para>If and when these DTDs are revised, migrating the existing data
to the new specifications will be much easier than it was in the old
situation, where the information was not in a standardized
format. Still, transformation is not a trivial problem (see for
example <XRef Linkend = "Bib-lin.xref" xlink:type="simple"
    xlink:href="#Bib-lin.xref" 
    >[Lin97]</XRef>).</Para></Sect2>
<Sect2 Id = "CHDBECEA"><Title>Database Design</Title>
<Para>A logistics support database (<Acronym>LSAR</Acronym>),
described in the reference as the Equipment Breakdown Structure (EBS),
holds the logistics information about different components
and their documentation. Images in various predefined formats are
also managed by the database. The <Acronym>LSAR</Acronym> is a normal
relational database. The authoring tool creates <Acronym>SGML</Acronym> files
and keeps the local <Acronym>LSAR</Acronym> database up to date.
The EBS is essentially the logical view of the LSAR database. <XRef
    Linkend = "CACCDIBJ" xlink:type="simple" href = "#CACCDIBJ"
    xlink:show = "replace" >Figure 17</XRef> shows the <Acronym>EBS</Acronym>
view.</Para>
<Para>The final LSAR database schema is shown in <XRef
    Linkend = "CACJDEHF" xlink:type="simple" href = "#CACJDEHF"
    xlink:show = "replace" >Figure 18</XRef>. Because
of the tree-like structure of the EBS, the database also simulates
a tree-like structure. This is achieved with the <Literal
    MoreInfo = "None">Parent ID</Literal> column in the <Literal
    MoreInfo = "None">Structures</Literal> table and the <Literal
    MoreInfo = "None">Parent Code</Literal> column in the <Literal
    MoreInfo = "None">PPS Codes</Literal> table. For example, in
the EBS we can see that a Fuel System consists of at least an Oil Heater
and a Ball Valve. Those components must therefore have Fuel System as their
parent. The components shown in the EBS are generic types of components,
and they are listed in the <Literal MoreInfo = "None">PPS Codes</Literal> table.
This table and the others are explained below.</Para>
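<Para>The parent-pointer technique can be illustrated with a minimal sketch, assuming SQLite (through Python's standard sqlite3 module) in place of Oracle and a hypothetical Structures table reduced to an id, a name and a parent_id column. A recursive common table expression walks the tree from any component down to all of its descendants. This is only one way to traverse such a tree; Oracle offers its own CONNECT BY clause for hierarchical queries, and the original system may have walked the tree in application code instead.</Para>
<ProgramListing><![CDATA[
```python
import sqlite3

# Hypothetical, reduced Structures table: only the columns needed to show
# the self-referencing parent link are included.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE Structures (
           id        INTEGER PRIMARY KEY,
           name      TEXT NOT NULL,
           parent_id INTEGER REFERENCES Structures(id)  -- NULL for the root
       )"""
)
conn.executemany(
    "INSERT INTO Structures (id, name, parent_id) VALUES (?, ?, ?)",
    [
        (1, "Power Plant", None),
        (2, "Fuel System", 1),
        (3, "Oil Heater", 2),
        (4, "Ball Valve", 2),
    ],
)


def subtree(conn, root_id):
    """Return (name, depth) for root_id and all of its descendants.

    The path column (zero-padded ids joined with '/') keeps the result in
    depth-first order, with each parent listed before its children.
    """
    rows = conn.execute(
        """WITH RECURSIVE tree(id, name, depth, path) AS (
               SELECT id, name, 0, printf('%05d', id)
               FROM Structures WHERE id = ?
               UNION ALL
               SELECT s.id, s.name, t.depth + 1,
                      t.path || '/' || printf('%05d', s.id)
               FROM Structures AS s JOIN tree AS t ON s.parent_id = t.id
           )
           SELECT name, depth FROM tree ORDER BY path""",
        (root_id,),
    )
    return rows.fetchall()


for name, depth in subtree(conn, 2):
    print("  " * depth + name)
```
]]></ProgramListing>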
<Para>The <Literal MoreInfo = "None">Structures</Literal> table
holds information about each individual structural component in
a power plant. <Literal MoreInfo = "None">Parts</Literal> contains
information about spare parts, and the <Literal MoreInfo = "None">Documents</Literal> table
stores information about documents. The other tables are mostly
auxiliary tables that were created while optimizing the database.
An earlier version of the database can be seen in <XRef
    Linkend = "CEGCFDEI" xlink:type="simple"
    xlink:href="#CEGCFDEI"  >Figure
32</XRef>.</Para>
<Para>The <Literal MoreInfo = "None">PPS Codes</Literal> table lists
the generic types of components in a power plant. Each generic component
can have documentation associated with it, via the <Literal
    MoreInfo = "None">PPS Documentation</Literal> table. The <Literal
    MoreInfo = "None">Serial Number</Literal> column in the <Literal
    MoreInfo = "None">Structures</Literal> table shows that this table
tracks each individual manufactured component.
Thus there can be different documents for two fuel pumps of the same
kind, for example one installed in a power plant in Beijing and
another in Ankara. The <Literal MoreInfo = "None">Structures</Literal> table
connects to the <Literal MoreInfo = "None">Documents</Literal> table
via the <Literal MoreInfo = "None">Structure Documentation</Literal> table.
There must also be a relationship between the <Literal
    MoreInfo = "None">PPS Codes</Literal> and <Literal MoreInfo = "None">Structures</Literal> tables,
because each individual component is always of some generic
component type.</Para>
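<Para>The two documentation paths can be sketched as follows. This is a hypothetical reduction of the schema in Figure 18: the table and column names (PPSCodes, PPSDocumentation, StructureDocumentation, serial_number, file_ref and so on) are illustrative, and the sample fuel pumps and documents are invented. The query returns the generic documentation for a unit's component type together with the documents attached to that individual unit. The file_ref column reflects the fact that the database stores only references to the SGML files.</Para>
<ProgramListing><![CDATA[
```python
import sqlite3

# Hypothetical, reduced LSAR-style schema. Names are illustrative; the real
# schema has more columns and tables.
SCHEMA = """
CREATE TABLE PPSCodes (code TEXT PRIMARY KEY, description TEXT,
                       parent_code TEXT REFERENCES PPSCodes(code));
CREATE TABLE Structures (id INTEGER PRIMARY KEY,
                         pps_code TEXT NOT NULL REFERENCES PPSCodes(code),
                         serial_number TEXT, site TEXT);
CREATE TABLE Documents (id INTEGER PRIMARY KEY, title TEXT, file_ref TEXT);
-- Junction tables: documentation for a generic type vs. an individual unit.
CREATE TABLE PPSDocumentation (pps_code TEXT REFERENCES PPSCodes(code),
                               doc_id INTEGER REFERENCES Documents(id),
                               PRIMARY KEY (pps_code, doc_id));
CREATE TABLE StructureDocumentation (structure_id INTEGER REFERENCES Structures(id),
                                     doc_id INTEGER REFERENCES Documents(id),
                                     PRIMARY KEY (structure_id, doc_id));
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO PPSCodes VALUES ('FP', 'Fuel pump', NULL)")
conn.executemany("INSERT INTO Structures VALUES (?, ?, ?, ?)",
                 [(1, "FP", "SN-001", "Beijing"), (2, "FP", "SN-002", "Ankara")])
conn.executemany("INSERT INTO Documents VALUES (?, ?, ?)",
                 [(10, "Fuel pump operation", "fp_op.sgm"),
                  (11, "SN-001 modification notes", "sn001.sgm"),
                  (12, "SN-002 modification notes", "sn002.sgm")])
conn.execute("INSERT INTO PPSDocumentation VALUES ('FP', 10)")
conn.executemany("INSERT INTO StructureDocumentation VALUES (?, ?)",
                 [(1, 11), (2, 12)])


def documents_for(conn, serial_number):
    """Generic documents for the unit's type plus documents for that unit only."""
    return [row[0] for row in conn.execute(
        """SELECT d.title FROM Documents d
           JOIN PPSDocumentation pd ON pd.doc_id = d.id
           JOIN Structures s ON s.pps_code = pd.pps_code
           WHERE s.serial_number = :sn
           UNION
           SELECT d.title FROM Documents d
           JOIN StructureDocumentation sd ON sd.doc_id = d.id
           JOIN Structures s ON s.id = sd.structure_id
           WHERE s.serial_number = :sn
           ORDER BY 1""",
        {"sn": serial_number})]


print(documents_for(conn, "SN-001"))
```
]]></ProgramListing>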
<Para>There are many components in a power plant that need maintenance
after a certain amount of time. Obviously there must be documents
that describe how these maintenance operations are to be carried
out. The <Literal MoreInfo = "None">Per Maint Intervals</Literal> table
is a helper table that makes it possible, for example, to search for
documents describing maintenance operations carried out every 500 hours.</Para>
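<Para>A search of the kind mentioned above might look like the following sketch. The column layout (an hours value in the interval table and an interval_id column in the documents table) is an assumption made for illustration; the actual Per Maint Intervals table may be linked to the documents differently.</Para>
<ProgramListing><![CDATA[
```python
import sqlite3

# Hypothetical reduced tables: PerMaintIntervals lists the allowed intervals,
# and each periodic maintenance document refers to one of them.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PerMaintIntervals (id INTEGER PRIMARY KEY, hours INTEGER UNIQUE NOT NULL);
CREATE TABLE Documents (id INTEGER PRIMARY KEY, title TEXT,
                        interval_id INTEGER REFERENCES PerMaintIntervals(id));
INSERT INTO PerMaintIntervals VALUES (1, 500), (2, 2000);
INSERT INTO Documents VALUES
    (1, 'Change fuel filter', 1),
    (2, 'Inspect ball valve', 1),
    (3, 'Overhaul fuel pump', 2),
    (4, 'System description', NULL);
""")


def maintenance_docs(conn, hours):
    """Titles of maintenance documents whose interval is exactly `hours`."""
    return [title for (title,) in conn.execute(
        """SELECT d.title FROM Documents d
           JOIN PerMaintIntervals i ON i.id = d.interval_id
           WHERE i.hours = ? ORDER BY d.title""", (hours,))]


print(maintenance_docs(conn, 500))
```
]]></ProgramListing>
<Para>Keeping the intervals in a separate table turns them into a controlled list of values, so a search for 500 hours matches every document that uses that interval rather than depending on how each author happened to write it.</Para>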
<Para>The LSAR database is at least in third normal form. It
has not been checked whether it would also satisfy higher
normal forms.</Para>
<Figure Float = "0"><html:img src="I_struct.gif"/>
<Title Id = "CACCDIBJ">Power Plant Equipment Breakdown Structure</Title></Figure>
<Figure Float = "0"><html:img src="final-lsar.gif"/>
<Title Id = "CACJDEHF">LSAR Schema</Title></Figure></Sect2></Sect1>
<BeginPage/>
<Sect1 Id="vav37"><Title>Verification and Validation</Title>
<Para>No metrics were designed to measure the success or failure
of the delivered system. Because of the loose definitions it would
have been quite difficult to measure anything. In retrospect, it
can be seen that at least one thing could be clearly measured: what
percentage of documentation projects are completed on time. Additionally,
user satisfaction could be measured with interviews. As the system
has many different user groups, ranging from authors to end product
users, several group satisfaction profiles could be created. In
fact, there was a preliminary plan to observe the system and
its users in a controlled environment. There was an attempt to get
external funding for this testing, but it did not work out and the
plan was abandoned.</Para>
<Para>Although the tools more or less did what they were supposed
to do, the overall project failed at Wärtsilä. There were several
reasons. The transition to SGML was not properly orchestrated. The project
did not gain company-wide acceptance. It might have helped if the
tools had exceeded expectations, but they took longer to
develop than anticipated and were neither as good nor as easy to deploy
as had been hoped.</Para>
<Para>In retrospect it is quite easy to see that the whole project
at Wärtsilä should have been handled differently. Acceptance at all
levels is crucial for a transition to
SGML to succeed, and more people should have been involved. The
SGML project at Wärtsilä did not get any extra resources to carry
out the testing and integration of the new system. The authors and
editors were expected to deliver information products using the
existing system and simultaneously install the new system, evaluate
it, provide feedback to Citec and begin using the new system in
earnest.</Para>
<Para>After the Microsoft Word-based editor had been evaluated, the vision
should have been reconsidered, since the originally chosen solutions had
proved limited. The correct decision would have been to make a fresh
start with a native SGML editor (for more details see <XRef
    Linkend = "CHDDEAHF" xlink:type="simple"
    xlink:href="#CHDDEAHF"  >Section
7.2</XRef>).</Para>
<Para>It was also learned that the subcontractors were not ready
to move into SGML. The jump to structural information was simply
too great, even with all the preparation and tools provided.</Para>
<Para>Wärtsilä and Citec have evaluated the system using information
prepared for the Wärtsilä Pilot Power Plant located in Vaasa.
Wärtsilä personnel have also carried out other small-scale testing and use.
Perhaps the greatest value of the project has been the knowledge gained
about SGML and everything it involves. The current SGML documentation
projects are using the knowledge gathered from this project, and
they seem to be faring better.</Para></Sect1></Chapter>
<Chapter Id = "CEGJCJIG"><Title>Document Authoring</Title>
<BlockQuote><Para>AXIS (Biologic Band 4)&gt; Hello, Roger. I assume
you're still there. This distance is a challenge even for me, based
as I am upon human templates... [politeness algorithm diagnosis
for total mechanic-biologic thinker function V-optimal] most of
the time. I have come within a million kilometers of B-2 mark this
moment 7-23-2043-1205:15. I am preparing my machine and bio memories
for receipt of information from the children, now flying in a perfectly
dispersing cloud toward B-2. Data on B-3 have been relayed. The
planet, you can see, is quite Jovian, very pretty, though tending
towards the greens and yellows rather than reds and browns. I'm
enjoying the extra energy from B's light: it allows me to get
some work done that I've been delaying for some time, opening
up regions of memory and thought I've closed down during the cold
and dark. I've just completed a self analysis; as you doubtless
have discovered by checking my politeness algorithm diagnostic.
I am V-optimal. I am not using the formal "I"; the joke about
self awareness still does not make any sense to me.</Para>
<Para><Author><Firstname>Greg</Firstname><Surname>Bear</Surname></Author><CiteTitle xlink:type="simple" xlink:href="http://www.amazon.com/exec/obidos/ASIN/0446361305/qid=959805270/sr=1-1/103-0724774-8699018">Queen
of Angels</CiteTitle></Para></BlockQuote>
<Para>There are several good SGML editors on the market, for example <ProductName
    Class = "Trade">Adept&#8226;Editor</ProductName>. Most of them are quite
expensive and require users to master operating concepts that have little
in common with other types of software. They are also, by design, general-purpose
tools. The requirements for the Wärtsilä project made
the development of a custom authoring tool necessary. The primary
source of information for this chapter was <XRef Linkend = "BABHEEAA"
    xlink:type="simple" xlink:href="#BABHEEAA"
     >[CIT97b]</XRef>.</Para>
<Sect1 Id="at38"><Title>Authoring Tool</Title>
<Para>The WNS Author Tool created by Citec is used to <Quote>enter
database records and write corresponding SGML documents according
to the WNS Base-DTD</Quote>. Simply put, this means that by selecting a
component from the Equipment Breakdown Structure (EBS) (see <XRef
    Linkend = "CACCDIBJ" xlink:type="simple"
    xlink:href="#CACCDIBJ"  >Figure
17</XRef> in <XRef Linkend = "CACFDIDA" xlink:type="simple"
    xlink:href="#CACFDIDA"  >Section
6.3</XRef>) one can write documentation for that specific component.
The <Acronym>EBS</Acronym> information is in the <Acronym>LSAR</Acronym> database.
Information about the documentation modules is also saved into
the <Acronym>LSAR</Acronym> database.</Para>
<Para>The authoring tool works with Microsoft Word 6. SGML Author
for Word 1.0 is also needed, along with the Microsoft Access (database)
driver.</Para>
<Para>A new document is created by selecting the <Literal MoreInfo = "None">New</Literal> command
from the <Literal MoreInfo = "None">File</Literal> menu and selecting
the <Acronym>LSAR</Acronym> template. The default view is presented
in <XRef Linkend = "CEGIDDHB" xlink:type="simple"
    href = "#CEGIDDHB" xlink:show = "replace" >Figure
19</XRef>. In the figure the area marked with 1 on the left contains
the editing fields. Area 2 contains the number and the type of information modules
contained in the record. Scrolling buttons are located in Area 3,
and Area 4 is the database display field. The next step is connecting
to the <Acronym>LSAR</Acronym> database. This is accomplished simply by
double-clicking the connection button (not visible in the figure),
and selecting the <Acronym>LSAR</Acronym> data source.</Para>
<Para>Changing the context in the <Acronym>EBS</Acronym> is accomplished
via the context selection dialog (see <XRef Linkend = "CEGEHEBJ"
    xlink:type="simple" href = "#CEGEHEBJ" xlink:show = "replace"
    >Figure 20</XRef>). Editing fields are filled
from drop-down lists (see <XRef Linkend = "CEGGCCGI"
    xlink:type="simple" href = "#CEGGCCGI" xlink:show = "replace"
    >Figure 21</XRef>). Information modules are
attached by double-clicking one of the eight module buttons (not
visible in the figures), for example, <Literal MoreInfo = "None">Operation</Literal>.</Para>
<Para>Information modules are written more or less as normal Word documents.
The SGML structure must be generated in the import and
export phases from style information because Word does not handle
it natively. Therefore, it is very important that only the styles
defined in the <Acronym>LSAR</Acronym> and information module templates
be used. A sample information module can be seen in <XRef
    Linkend = "CEGIEADH" xlink:type="simple" href = "#CEGIEADH"
    xlink:show = "replace" >Figure 22</XRef>. The styles
from which the <Acronym>SGML</Acronym> information will be generated
are visible on the left.</Para>
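<Para>The style-based approach can be sketched in a few lines of Python. The style names and element names below are hypothetical, and a real conversion (such as the one performed by SGML Author for Word) also has to build the nested element structure, which this flat sketch ignores. What the sketch does show is why undefined styles are harmful: a paragraph whose style has no mapping cannot be converted to any element.</Para>
<ProgramListing><![CDATA[
```python
from xml.sax.saxutils import escape

# Hypothetical style-to-element table; the real WNS templates defined their
# own style names, and the eight DTDs their own element names.
STYLE_TO_ELEMENT = {
    "Title": "title",
    "Para": "para",
    "Warning": "warning",
    "Step": "step",
}


def paragraphs_to_sgml(paragraphs):
    """Convert (style, text) pairs exported from Word into tagged markup.

    Raises ValueError for any style that is not defined in the template,
    because no element can be inferred for it.
    """
    out = []
    for style, text in paragraphs:
        element = STYLE_TO_ELEMENT.get(style)
        if element is None:
            raise ValueError(f"undefined style {style!r}: cannot map to SGML")
        # Escape markup-significant characters in the paragraph text.
        out.append(f"<{element}>{escape(text)}</{element}>")
    return "\n".join(out)


print(paragraphs_to_sgml([("Title", "Fuel filter"), ("Step", "Close valves V1 & V2")]))
```
]]></ProgramListing>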
<Figure Float = "0"><ScreenShot><html:img src="bild1.gif"/></ScreenShot>
<Title Id = "CEGIDDHB">The LSAR Interface</Title></Figure>
<Figure Float = "0"><ScreenShot><html:img src="figur1.gif"/></ScreenShot>
<Title Id = "CEGEHEBJ">Selecting Context from the Equipment Breakdown
Structure</Title></Figure>
<Figure Float = "0"><ScreenShot><html:img src="bild3.gif"/></ScreenShot>
<Title Id = "CEGGCCGI">System Field Drop Down Menu</Title></Figure>
<Figure Float = "0"><ScreenShot><html:img src="sample2.gif"/></ScreenShot>
<Title Id = "CEGIEADH">Sample Information Module in WNS Author Tool</Title></Figure></Sect1>
<Sect1 Id = "CHDDEAHF"><Title>Implementation of the WNS Authoring Tool</Title>
<Para>The first implementation work in the PIM project began with
the development of the authoring tool. The first specifications
were too optimistic; a functioning editor was expected in less than
six months. In reality, it took well over a year to ship the final
version, although not all of that time was spent on the authoring
tool development alone. One developer was allocated to work on
the authoring tool and the LSAR database schema.</Para>
<Para>The WNS Authoring Tool was implemented in Microsoft Word Basic.
This presented some serious difficulties in programming. For example,
the Word Basic application programming language only allows function
calls to nest five levels deep in other macros.</Para>
<Para>The authoring tool and the LSAR database were tested by Citec
and Wärtsilä by trying to create documentation with them. No
software tools were used in the testing. Apart from test suites that
actually run the program and record inconsistencies between runs, there
was no viable software available that could have been used, and such
suites were expensive and not considered cost-effective.</Para>
<Para>The biggest problem with the Authoring Tool is that it is
limited to a specific version of Microsoft Word and SGML Author
for Word. As newer versions of Word emerged, users of the Authoring Tool
were locked into version 6, because it is, for all practical purposes,
impossible to use multiple versions of Word on the same computer.</Para>
<Para>As soon as this was realized, plans for improvement were prepared.
According to these plans, the code providing database connectivity and
the specialized knowledge about the eight <Acronym>DTD</Acronym>s
would be moved into a generic <Acronym>DLL</Acronym>. After that,
any <Acronym>SGML</Acronym> editor could be customized to write
power plant documentation by coding simple hooks from that editor
into this <Acronym>DLL</Acronym>. The plan was never carried out
because the whole SGML project at Wärtsilä ran into problems.</Para>
<Para>An additional problem with the SGML Author for Word was licensing.
It was not clear who owned the product and who could give licenses.</Para>
<Para>Citec's experience with other customized SGML editors suggests
that it would not be a big risk to choose a native SGML editor like
Grif<Footnote><Para>Grif SGML Editor is a product of Infrastructures
for Information. Citec has created a customized SGML editor from
Grif for the offshore industry (<Literal MoreInfo = "None">http://www.citec.fi/company/products/toolbox.html</Literal>).</Para></Footnote> over
a mixed implementation like SGML Author for Word. The benefits of
native SGML are a more reliable environment and faster
applications. Additionally, native SGML editors often have a rich
API that allows real programming languages to be used to
implement additional packages. These benefits outweigh the initial, minor
difficulties users face while learning a new program.</Para></Sect1></Chapter>
<Chapter Id = "CEGHDCDG"><Title>Document Assembly</Title>
<BlockQuote><Para>Inaccuracy. We did not destroy those portions
of your organic brain. We borrowed/took/expropriated a few grams
of tissue for use in a great goal. Our need was greater than yours'.</Para>
<Para><Author><Firstname>David</Firstname><Surname>Brin</Surname></Author><CiteTitle xlink:type="simple" xlink:href="http://www.amazon.com/exec/obidos/ASIN/0553574736/qid=959805326/sr=1-1/103-0724774-8699018">Heaven's
Reach</CiteTitle></Para></BlockQuote>
<Para>The practical work in this thesis concentrated on the
area of document assembly. See <XRef Linkend = "CACHGFAA"
    xlink:type="simple" xlink:href="#CACHGFAA" 
    >Figure 15</XRef> for how this relates to the
Product Information Management project at Wärtsilä.</Para>
<Para>The work included expanding the functionality of the Multidoc
Pro (<Acronym>MDP</Acronym>) SGML Browser so that it could be used
to connect to relational databases and create compound documents
from several document fragments managed by databases. In the Wärtsilä
project the document fragments were authored with the tool presented
in the previous chapter, but the use of that particular tool is not
a requirement.</Para>
<Para>Additionally, Multidoc Pro was enhanced so that it could view
relational databases as if they were <Acronym>SGML</Acronym> documents.</Para>
<BeginPage/>
<Sect1 Id="mpst39"><Title>Multidoc Pro SGML Tools</Title>
<Para>Multidoc Pro<Footnote><Para>Multidoc Pro is a registered trademark
of Citec Engineering Oy.</Para></Footnote> is the brand name for
a variety of SGML tools based on the popular Synex ViewPort SGML
engine. The first tool was Multidoc Pro Browser, which was released
in December 1996. Other versions soon followed.</Para>
<Para>The current tools in the Multidoc Pro product family are:
Browser, Publisher, Database Browser, Database Publisher, Translating
Editor, CD Browser and Plugin. A special internationalization package
for Multidoc Pro has been developed to enable translation of the
program to different languages. These translations can be performed
by anyone without the need to recompile the program. Several customized
versions for different companies exist as well; for example, the Norsk
Hydro versions of the Browser and Publisher support special HyTime
contextual link (<Literal MoreInfo = "None">clink</Literal>) handling.</Para>
<Para>Multidoc Pro did not appear out of nowhere. Its predecessor
was Multidoc LT, created for Wärtsilä Diesel. Multidoc LT in turn had
a predecessor, Eldoc. Both Eldoc and Multidoc LT are based on technology different
from that of Multidoc Pro and require that the SGML material be compiled
to a proprietary format before it can be used. Multidoc Pro, on
the other hand, is a native SGML product which does not require
precompilation. Multidoc Pro is being developed with Microsoft Visual
C++, while Multidoc LT was built with Asymmetrix Multimedia Toolbook.
C++ enables much finer control over the program, not to mention
the runtime speed benefit.</Para>
<Para>Some properties are common to all Multidoc Pro products. When
an SGML document is opened in Multidoc Pro, it displays a document
window and may be configured to display other windows (see <XRef
    Linkend = "CEGHDECC" xlink:type="simple" href = "#CEGHDECC"
    xlink:show = "replace" >Figure 23</XRef>, right hand
side). It is possible to define <FirstTerm>navigators</FirstTerm> for
individual documents or for all documents with a specific public
identifier. A navigator is like an electronic table of contents.
The navigators are displayed to the left of the actual document
window (see <XRef Linkend = "CEGHDECC" xlink:type="simple"
    href = "#CEGHDECC" xlink:show = "replace" >Figure
23</XRef>). It is possible to define multiple navigators for a document,
for example, a list of figures and a list of tables. The navigator
can display graphics as well as text, so a list of figures can be
very descriptive. The elements that should appear in a Multidoc Pro
navigator are listed in a navigator specification file, which is itself
written in <Acronym>SGML</Acronym>.</Para>
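<Para>The idea behind a navigator can be sketched as follows, using XML and Python's xml.etree.ElementTree module as a stand-in for SGML and the ViewPort engine. The element names and the set-based specification are assumptions; the real navigator specification format is SGML-based and richer (it can, for example, include graphics).</Para>
<ProgramListing><![CDATA[
```python
import xml.etree.ElementTree as ET

# Hypothetical navigator specification: the element types whose titles
# appear as entries in the electronic table of contents.
NAVIGATOR_ELEMENTS = {"chapter", "sect1"}


def build_navigator(element, wanted=NAVIGATOR_ELEMENTS, depth=0):
    """Return (depth, title) entries for wanted elements in document order."""
    entries = []
    for child in element:
        if child.tag in wanted:
            entries.append((depth, child.findtext("title", default="(untitled)")))
            entries.extend(build_navigator(child, wanted, depth + 1))
        else:
            # Elements not listed in the specification are skipped, but their
            # descendants may still contain wanted elements.
            entries.extend(build_navigator(child, wanted, depth))
    return entries


doc = ET.fromstring(
    """<book>
  <chapter><title>Document Assembly</title>
    <sect1><title>Multidoc Pro SGML Tools</title><para>Text.</para></sect1>
    <sect1><title>Multidoc Pro Database Browser and Publisher</title></sect1>
  </chapter>
</book>"""
)
for depth, title in build_navigator(doc):
    print("  " * depth + title)
```
]]></ProgramListing>
<Para>Passing a different set of element names, such as one containing only figure elements, would produce a list of figures instead, mirroring the way several navigators can be defined for one document.</Para>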
<Para>Multidoc Pro formats the <Acronym>SGML</Acronym> instances
before displaying them on the screen (or sending them to the printer).
This formatting information is saved in special <FirstTerm>stylesheets</FirstTerm>,
which are <Acronym>SGML</Acronym> files themselves. A document can
have multiple stylesheets attached, although only one is active
at a time.</Para>
<Para>A document can also have multiple <FirstTerm>webs</FirstTerm> attached
to it. Like navigators and stylesheets, webs are SGML documents.
Webs enable the annotation of <Acronym>SGML</Acronym> documents, but
they also provide a more exciting feature: user-defined links.
This means that a reader can insert new links between parts of the
documents in addition to those created by the original author.
Support for navigators, stylesheets and webs is provided by
the ViewPort SGML engine, so these constructs, when created with Multidoc
Pro, also work with any other ViewPort-based product, and vice versa.</Para>
<Figure Float = "0"><ScreenShot><html:img src="main.gif"/></ScreenShot>
<Title Id = "CEGHDECC">Multidoc Pro Screenshot</Title></Figure>
<Para>One common use for the web files is document update. For example,
if a company issues four CDs a year but would like to notify customers
of changes between issues, web files can be created and emailed
to customers. Customers then just attach the web files to the documents
and immediately see where and what the changes are.</Para>
<Para>A construct that is not available in other ViewPort-based
products is the document set. A document set is a HyTime document,
with HyTime links pointing to the "actual" content (see <XRef
    Linkend = "CEGCGDGF" xlink:type="simple" href = "#CEGCGDGF"
    xlink:show = "replace" >Figure 24</XRef>). Multidoc
Pro offers a special document set navigator for document sets.</Para>
<Para>Multidoc Pro supports a wide range of graphics formats, from
the commonplace <Acronym>WMF</Acronym>, <Acronym>GIF</Acronym>, <Acronym>JPEG</Acronym> and
bitmaps to the somewhat more exotic <Acronym>CGM</Acronym>, <Acronym>PCX</Acronym> and <Acronym>PNG</Acronym>,
to name a few. Multidoc Pro also has multimedia support: it can
display multimedia files in any format for which the needed drivers
are available. Common video formats include <Acronym>MPEG</Acronym> and <Acronym>AVI</Acronym>.
Plain audio is naturally supported as well. All graphics and multimedia
objects can be shown inline (multimedia objects are displayed with their
accompanying controls, for example, the "play" button).
It is possible to specify external helper applications for non-<Acronym>SGML</Acronym> data
that Multidoc Pro cannot show itself.</Para>
<Figure Float = "0"><html:img src="Gradu-24.gif"/>
<Title Id = "CEGCGDGF">Document Set</Title></Figure>
<Para>Although Multidoc Pro does not offer full <Acronym>HTML</Acronym> support,
it can still function as a web browser. Naturally, <Acronym>SGML</Acronym> files
can also be viewed over the Internet. The combination of a web browser such
as Netscape Communicator and the Multidoc Pro Plugin provides a
complete HTML and SGML Internet solution.</Para>
<Para>The Multidoc Pro programs are available for a free evaluation
period of 21 days from the Citec Web service. After the evaluation
period a license key must be purchased from Citec Software Ltd. Unauthorized
usage is prevented with CrypKey<Footnote><Para>See <Literal
    MoreInfo = "None">http://www.crypkey.com</Literal> for more
information.</Para></Footnote>. Multidoc Pro requires that a special
CrypKey service be running on the computer (or on a computer connected
to the network). The service is included in the installation program.</Para></Sect1>
<Sect1 Id="mpdbap39"><Title>Multidoc Pro Database Browser and Publisher</Title>
<Para>The database extensions to Multidoc Pro Browser and Publisher
make it possible to browse relational databases as if they were <Acronym>SGML</Acronym> documents.
More importantly, they make it possible to assemble large document collections
or publications from several smaller <Acronym>SGML</Acronym> files
that are managed by relational databases. Before a database can
be used with Multidoc Pro Database Browser or Publisher, its
structure must first be mapped in a database mapping file that
Multidoc Pro understands.</Para>
<Sect2 Id="dm40"><Title>Database Mapping</Title>
<Para>A database mapping starts by selecting <Acronym>ODBC</Acronym> data
source names to connect to. A mapping can include many data sources
simultaneously, although each dynamically generated
<Acronym>SGML</Acronym> document uses only one data source. For each data source
name in the configuration, a mapping is prepared. The mapping
is simply a way to tie the hardcoded database <Acronym>DTD</Acronym> elements to
a given database's structures. This is similar to the model-driven
mapping described in <XRef Linkend = "CHDDGDCF" xlink:type="simple"
    xlink:href="#CHDDGDCF"  >Section 5.1.2</XRef>.</Para>
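<Para>Conceptually, a mapping entry ties one level element to a table or query in a data source, and each column-level element to a column. The following sketch records such an entry as a Python data class. The class, its fields, the data source name and the sample query are hypothetical, and the set of allowed element names is taken from the description of the hardcoded DTD in this section. The sketch does not reproduce Multidoc Pro's actual mapping file format.</Para>
<ProgramListing><![CDATA[
```python
from dataclasses import dataclass, field

# Column-level elements of the hardcoded database DTD, as listed in the text.
COLUMN_ELEMENTS = {"title", "dataname", "datadescription", "reference", "ref.name"}


@dataclass
class LevelMapping:
    """Hypothetical mapping of one level element to a table or query."""

    dsn: str       # ODBC data source name
    source: str    # table name or SQL query text
    columns: dict = field(default_factory=dict)  # DTD element -> column name

    def __post_init__(self):
        # The DTD is fixed, so only its own element names may be mapped.
        unknown = set(self.columns) - COLUMN_ELEMENTS
        if unknown:
            raise ValueError(f"not elements of the database DTD: {sorted(unknown)}")

    def column_for(self, element):
        """Return the database column that fills the given DTD element."""
        if element not in self.columns:
            raise KeyError(f"element {element!r} is not mapped for {self.source!r}")
        return self.columns[element]


mapping = LevelMapping(
    dsn="LSAR",
    source="SELECT name, description, file_ref FROM Documents",
    columns={"title": "name", "datadescription": "description", "reference": "file_ref"},
)
print(mapping.column_for("reference"))
```
]]></ProgramListing>
<Para>Validating against a fixed element set reflects the model-driven nature of the approach: the DTD itself never changes, only its binding to the structures of a particular database.</Para>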
<Para>A Data Source Name (<Acronym>DSN</Acronym>) is a concept in <Acronym>ODBC</Acronym>.
It is closely related to the <Acronym>SGML</Acronym> concept of the
public identifier, since it is simply a symbolic name associated with
a real address. All <Acronym>DSN</Acronym>s are registered with
the <Acronym>ODBC</Acronym> driver manager. The driver manager knows
what each symbolic name actually refers to and can therefore act as
an invisible bridge between applications and databases.</Para>
<Para>The hardcoded database <Acronym>DTD</Acronym> (see <XRef
    Linkend = "App.xref" xlink:type="simple"
    xlink:href="#App.xref"  >Appendix
A</XRef>) in Multidoc Pro Database Browser and Publisher is very
generic yet simple, which is why it does not follow any methodology
for <Acronym>DTD</Acronym> structure, such as the one presented in <XRef
    Linkend = "Bib-mal.xref" xlink:type="simple"
    xlink:href="#Bib-mal.xref" 
    >[Mal95]</XRef>. It has a container for a table
or a query, and that container has child elements that are used
to output the different columns of the query result (or simply of the table rows
if the container element is mapped to a table). The table or query
container element is called <Literal MoreInfo = "None">level</Literal>,
while the different column elements are called <Literal
    MoreInfo = "None">title</Literal>, <Literal MoreInfo = "None">dataname</Literal>, <Literal
    MoreInfo = "None">datadescription</Literal>, <Literal
    MoreInfo = "None">reference</Literal> and <Literal MoreInfo = "None">ref.name</Literal>.
The <Literal MoreInfo = "None">reference</Literal> element is a
special element because it is used to create a HyTime link to an
external <Acronym>SGML</Acronym> document.</Para>
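<Para>How a table could be rendered as an instance of such a DTD can be sketched in Python with the sqlite3 and xml.etree.ElementTree modules. The nesting used here (a level element containing one data element per column value, each with dataname and datavalue children) is a simplified assumption based on the element names in the text. For brevity the title element is not used, placing ref.name inside reference is a guess, and the file attribute on reference is hypothetical. The authoritative content models are in Appendix A, and the real product produces HyTime links rather than a plain attribute.</Para>
<ProgramListing><![CDATA[
```python
import sqlite3
import xml.etree.ElementTree as ET


def level_from_table(conn, table, reference_columns=()):
    """Render all rows of `table` as data elements inside one level element.

    `table` is interpolated into SQL and must come from a trusted mapping,
    not from user input.
    """
    cursor = conn.execute(f"SELECT * FROM {table}")
    names = [d[0] for d in cursor.description]
    level = ET.Element("level")
    for row in cursor:
        for name, value in zip(names, row):
            data = ET.SubElement(level, "data")
            ET.SubElement(data, "dataname").text = name
            datavalue = ET.SubElement(data, "datavalue")
            if name in reference_columns:
                # Column holds a file name: emit a link to the external
                # document (the "file" attribute name is hypothetical).
                ref = ET.SubElement(datavalue, "reference", {"file": str(value)})
                ET.SubElement(ref, "ref.name").text = str(value)
            else:
                ET.SubElement(datavalue, "datadescription").text = str(value)
    return level


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Documents (title TEXT, file_ref TEXT)")
conn.execute("INSERT INTO Documents VALUES ('Fuel pump operation', 'fp_op.sgm')")
print(ET.tostring(level_from_table(conn, "Documents", {"file_ref"}), encoding="unicode"))
```
]]></ProgramListing>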
<Para><XRef Linkend = "CEGECBAB" xlink:type="simple"
    href = "#CEGECBAB" xlink:show = "replace" >Figure
25</XRef> shows a graph view of the <Acronym>DTD</Acronym>. The <Acronym>ASCII</Acronym> representation
is in <XRef Linkend = "App.xref" xlink:type="simple"
    xlink:href="#App.xref"  >Appendix
A</XRef>. The figure is read from left to right and from top to
bottom. The symbols are the same as in a normal <Acronym>DTD</Acronym> (see <XRef
    Linkend = "BABIDIHJ" xlink:type="simple"
    xlink:href="#BABIDIHJ"  >Section
3.3.3</XRef>). The question mark (<Literal MoreInfo = "None">?</Literal>)
means an optional element. The tilde (<Literal MoreInfo = "None">~</Literal>)
means the element has attributes (attributes are not visible in
the diagram). The connector between element name boxes specifies
how the elements appear in the content model. All elements except <Literal
    MoreInfo = "None">datavalue</Literal> use the same (sequence) connector,
which means that their child elements must appear in the order read from
top to bottom. For example, the element <Literal MoreInfo = "None">data</Literal> has
an optional <Literal MoreInfo = "None">dataname</Literal> followed
by zero or more <Literal MoreInfo = "None">datavalue</Literal>s.
The <Literal MoreInfo = "None">datavalue</Literal> element has the <Literal
    MoreInfo = "None">OR</Literal> connector. Its content can be
either zero or more <Literal MoreInfo = "None">datadescription</Literal> elements
or zero or more <Literal MoreInfo = "None">reference</Literal> elements.</Para>
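<Para>Because an <Acronym>SGML</Acronym> content model is essentially a regular expression over child element names, the two content models just described can be checked with the standard C++ <Literal MoreInfo = "None">std::regex</Literal> class. The following is a minimal sketch, assuming the children of an element are available as a list of names; the helper names and the pattern encoding (each element name followed by one space) are hypothetical and not taken from Multidoc Pro.</Para>
<ProgramListing><![CDATA[
```cpp
#include <regex>
#include <string>
#include <vector>

// Joins child element names into one string, each name followed by a space:
// {"dataname", "datavalue"} becomes "dataname datavalue ".
std::string joinChildren(const std::vector<std::string>& children) {
    std::string joined;
    for (const auto& name : children) {
        joined += name;
        joined += ' ';
    }
    return joined;
}

// Hypothetical check: does the whole child sequence match the pattern?
bool matchesContentModel(const std::vector<std::string>& children,
                         const std::string& pattern) {
    return std::regex_match(joinChildren(children), std::regex(pattern));
}

// data: an optional dataname followed by zero or more datavalues.
const std::string kDataModel = "(dataname )?(datavalue )*";

// datavalue: zero or more datadescriptions OR zero or more references.
const std::string kDataValueModel = "((datadescription )*|(reference )*)";
```
]]></ProgramListing>
<Para>A <Literal MoreInfo = "None">datavalue</Literal> containing a <Literal MoreInfo = "None">datadescription</Literal> followed by a <Literal MoreInfo = "None">reference</Literal> is rejected, because the OR connector allows only one kind of child in a single element.</Para>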
<Figure Float = "0"><html:img src="dtd.gif"/>
<Title Id = "CEGECBAB">The Tree View of the Database DTD <XRef
    Linkend = "BABBEJJD" xlink:type="simple"
    xlink:href="#BABBEJJD"  >[CIT97a]</XRef></Title></Figure>
<Para>A mapping can be saved. The save file is in <Acronym>ASCII</Acronym> format
and, unfortunately, not very easy to edit. The initial idea was
to make the file binary, but <Acronym>ASCII</Acronym> was chosen
at first for debugging purposes. <Acronym>SGML</Acronym> would be
an ideal save format, but it has not been implemented yet. An advanced
database <Acronym>DTD</Acronym> that will allow saving the whole
mapping information in <Acronym>SGML</Acronym> format is being developed.
This <Acronym>DTD</Acronym> can be seen in <XRef
    Linkend = "Database-dtd2.xref" xlink:type="simple"
    xlink:href="#Database-dtd2.xref" 
    >Appendix B</XRef>.</Para>
<Para>The Multidoc Pro dialog where the mapping is specified is
shown in <XRef Linkend = "CEGHFFFJ" xlink:type="simple"
    href = "#CEGHFFFJ" xlink:show = "replace" >Figure
26</XRef>. Right-clicking on element names produces a context menu
that has all the functionality needed to create the mapping (see <XRef
    Linkend = "CEGGJBEB" xlink:type="simple" href = "#CEGGJBEB"
    xlink:show = "replace" >Figure 27</XRef>). The <Literal
    MoreInfo = "None">Map Database...</Literal> menu item produces
the dialogs <Literal MoreInfo = "None">Map Table</Literal>, <Literal
    MoreInfo = "None">Map Query</Literal> or <Literal MoreInfo = "None">Map
Column</Literal> depending on context. The dialogs are shown in <XRef
    Linkend = "CEGHCBBD" xlink:type="simple" href = "#CEGHCBBD"
    xlink:show = "replace" >Figure 28</XRef>, <XRef
    Linkend = "CEGEIBHD" xlink:type="simple" href = "#CEGEIBHD"
    xlink:show = "replace" >Figure 29</XRef> and <XRef
    Linkend = "CEGBIBJH" xlink:type="simple" href = "#CEGBIBJH"
    xlink:show = "replace" >Figure 30</XRef>, respectively.
The <Literal MoreInfo = "None">Map Query</Literal> dialog was added
late in the project and is very crude in design compared to the
other dialogs. The <Literal MoreInfo = "None">Relationships...</Literal> menu
item opens the <Literal MoreInfo = "None">Map Relationships</Literal> dialog
(see <XRef Linkend = "CEGJEHFD" xlink:type="simple"
    href = "#CEGJEHFD" xlink:show = "replace" >Figure
31</XRef>). All the items in all the dialogs have the standard Windows
tooltips (mini-help windows that pop up when the mouse cursor hovers
over a control) associated with them. All the screenshots are from <XRef
    Linkend = "BABBEJJD" xlink:type="simple"
    xlink:href="#BABBEJJD"  >[CIT97a]</XRef>.</Para>
<Figure Float = "0"><html:img src="map.gif"/>
<Title Id = "CEGHFFFJ">Database Mapping Dialog</Title></Figure>
<Figure Float = "0"><html:img src="cmenu.gif"/>
<Title Id = "CEGGJBEB">Database Mapping Context Menu</Title></Figure>
<Figure Float = "0"><html:img src="tables.gif"/>
<Title Id = "CEGHCBBD">Map Tables Dialog</Title></Figure>
<Para>The <Literal MoreInfo = "None">Map Column</Literal> dialog
(see <XRef Linkend = "CEGBIBJH" xlink:type="simple"
    href = "#CEGBIBJH" xlink:show = "replace" >Figure
30</XRef>) allows mapping of any of the column elements. When mapping
normal elements, like <Literal MoreInfo = "None">title</Literal>,
the lower part of the dialog is disabled; the lower part is used only
for the <Literal MoreInfo = "None">reference</Literal> element. This
design allows the database to hold only the file name, without path
or suffix information. This was crucial to Wärtsilä, because the
hundreds of subcontractors cannot be expected to have exactly the same
directory structure. With just a small addition to this dialog (and,
of course, slightly more logic in the program) it would be possible
to specify the format of the file, since the file need not always be
in <Acronym>SGML</Acronym>. What is needed is a new edit box where
the notation type of the file is specified. This would generalize the
design to work with images, for example.</Para>
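<Para>The idea of storing only a bare file name can be sketched as follows: each installation supplies its own base directory and suffix, and the full path is assembled at run time. The function <Literal MoreInfo = "None">resolveReference</Literal> and its parameters are hypothetical and assume C++17 <Literal MoreInfo = "None">std::filesystem</Literal>; a notation field, as suggested above, could select the suffix in the same way.</Para>
<ProgramListing><![CDATA[
```cpp
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical resolver: the database stores only "pump_manual", while each
// subcontractor configures its own base directory and file suffix.
fs::path resolveReference(const std::string& storedName,
                          const fs::path& baseDirectory,
                          const std::string& suffix) {
    // Append the suffix as text so that dots already in the stored name
    // (for example "pump.v2") are kept rather than treated as an extension.
    return baseDirectory / (storedName + suffix);
}
```
]]></ProgramListing>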
<Figure Float = "0"><html:img src="query.gif"/>
<Title Id = "CEGEIBHD">Map Queries Dialog</Title></Figure>
<Figure Float = "0"><html:img src="columns.gif"/>
<Title Id = "CEGBIBJH">Map Columns Dialog</Title></Figure>
<Figure Float = "0"><html:img src="rels.gif"/>
<Title Id = "CEGJEHFD">Map Relationships Dialog</Title></Figure></Sect2>
<Sect2 Id="dg41"><Title>Document Generation</Title>
<Para>An actual <Acronym>SGML</Acronym> document is generated according
to the mappings. Initially, only the top-level <Literal
    MoreInfo = "None">level</Literal> elements are generated. As
the user navigates the document by clicking on the navigator items
(see <XRef Linkend = "CEGHDECC" xlink:type="simple"
    href = "#CEGHDECC" xlink:show = "replace" >Figure
23</XRef>), new queries are sent to the database and the results
are inserted in <Acronym>SGML</Acronym> format into the document.
The database results are placed into small <Acronym>SGML</Acronym> template fragments,
which are then inserted into the document. This approach makes generating
and browsing the instance fast; generating the full document
from even a small, one-megabyte database would simply take too much
time. <XRef Linkend = "CEGCFDEI" xlink:type="simple"
    href = "#CEGCFDEI" xlink:show = "replace" >Figure
32</XRef> shows a sample database schema. <XRef Linkend = "CEGGDGBF"
    xlink:type="simple" href = "#CEGGDGBF" xlink:show = "replace"
    >Figure 33</XRef> shows a sample instance generated
from it. <XRef Linkend = "CHDBBJJE" xlink:type="simple"
    xlink:href="#CHDBBJJE"  >Appendix
C</XRef> lists the mapping used in its <Acronym>ASCII</Acronym> form.</Para>
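<Para>The on-demand generation described above can be pictured as a tree whose children are fetched only when a node is first expanded, with each result row substituted into a small <Acronym>SGML</Acronym> template. This is a minimal sketch in standard C++, not the Multidoc Pro code: the <Literal MoreInfo = "None">LazyNode</Literal> type, the <Literal MoreInfo = "None">{column}</Literal> placeholder syntax and the fetch callback standing in for a database query are all assumptions for illustration.</Para>
<ProgramListing><![CDATA[
```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

using Row = std::map<std::string, std::string>;  // column name -> value

// Replaces each "{column}" placeholder in an SGML template with the row's
// value. Values are assumed to be already escaped for SGML.
std::string fillTemplate(std::string fragment, const Row& row) {
    for (const auto& [column, value] : row) {
        const std::string placeholder = "{" + column + "}";
        std::size_t pos = fragment.find(placeholder);
        while (pos != std::string::npos) {
            fragment.replace(pos, placeholder.size(), value);
            pos = fragment.find(placeholder, pos + value.size());
        }
    }
    return fragment;
}

// A navigator node whose children are fetched only on first expansion.
struct LazyNode {
    std::string sgml;
    std::function<std::vector<Row>()> fetchChildren;  // stands in for a query
    std::vector<std::unique_ptr<LazyNode>> children;
    bool expanded = false;

    void expand(const std::string& childTemplate) {
        if (expanded || !fetchChildren) return;  // query at most once
        for (const Row& row : fetchChildren()) {
            auto child = std::make_unique<LazyNode>();
            child->sgml = fillTemplate(childTemplate, row);
            children.push_back(std::move(child));
        }
        expanded = true;
    }
};
```
]]></ProgramListing>
<Para>Expanding a node twice issues only one query, so the cost of generation grows with what the user actually visits rather than with the size of the database.</Para>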
<Figure Float = "0"><html:img src="schema.gif"/>
<Title Id = "CEGCFDEI">Sample Database Schema</Title></Figure>
<Figure Float = "0"><ScreenShot><html:img src="db_fig03.gif"/></ScreenShot>
<Title Id = "CEGGDGBF">Sample LSAR Generated Document <XRef
    Linkend = "BABHJHCI" xlink:type="simple"
    xlink:href="#BABHJHCI"  >[CIT98]</XRef></Title></Figure></Sect2>
<Sect2 Id="p43"><Title>Publishing</Title>
<Para>Publishing is done with the Document Set Editor. It is possible
to create the whole publication in the editor, but usually an initial
set of documents will be created in one of two ways:<ItemizedList
    Mark = "Dash"><ListItem><Para>selecting entries from the navigator
and opening them in the document set editor, or</Para></ListItem>
<ListItem><Para>querying the database for documents based on different
search criteria.</Para></ListItem></ItemizedList></Para>
<Para>Multidoc Pro Database Browser and Publisher have a query dialog.
It has a simple SQL generator, but it is also possible to write
the SQL string by hand. The query is sent to the query executor,
which detects whether the query produces columns that the database
mapping designates as pointers to file names. If all of the output
columns are such "document columns", the query results are
displayed in the document set editor. Otherwise, a tabular view
is shown.</Para>
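<Para>The routing rule just described reduces to a single check over the output columns. A minimal sketch, assuming the mapping is available as a set of qualified names of the columns mapped as file-name pointers; <Literal MoreInfo = "None">chooseResultView</Literal> and <Literal MoreInfo = "None">ResultView</Literal> are hypothetical names, and treating an empty column list as tabular is an assumption of this sketch.</Para>
<ProgramListing><![CDATA[
```cpp
#include <set>
#include <string>
#include <vector>

enum class ResultView { DocumentSetEditor, TabularView };

// documentColumns holds "table.column" names that the mapping marks as
// pointers to document files.
ResultView chooseResultView(const std::vector<std::string>& outputColumns,
                            const std::set<std::string>& documentColumns) {
    if (outputColumns.empty()) return ResultView::TabularView;  // assumption
    for (const auto& column : outputColumns) {
        if (documentColumns.count(column) == 0) {
            return ResultView::TabularView;  // at least one plain data column
        }
    }
    return ResultView::DocumentSetEditor;
}
```
]]></ProgramListing>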
<Para>After changes are made in the editor, the document can be
saved. The editor offers two choices: document set or publication.
A document set is mainly intended for electronic browsing. The document
set file itself contains links to the actual microdocuments that
form the information product. The publication file, on the other
hand, is a single large file that includes all the microdocuments.
The publication is normally printed on paper.</Para></Sect2></Sect1>
<Sect1 Id="mpid44"><Title>Multidoc Pro Implementation Details</Title>
<Para>The specifications for the publishing tool were written when
the standard Multidoc Pro program was beginning to take shape but
had not yet been released. After the specifications were written,
the author was hired by Citec to develop the database and document
assembly functionality for Multidoc Pro. At that time a single programmer
had been working on the standard Multidoc Pro product for over six
months, so the basic architecture was already in place.</Para>
<Para>The programming language used to develop Multidoc Pro was
C++, or, to be exact, Microsoft Visual C++. Multidoc Pro Database
Browser and Publisher can only be compiled with version 4.1. The
mainline Multidoc Pro code is nowadays compiled with Visual C++
6.0.</Para>
<Para>The programming methodology mostly followed <XRef
    Linkend = "Bib-prosise.xref" xlink:type="simple"
    xlink:href="#Bib-prosise.xref" 
    >[Pro96]</XRef>, although more care was taken
to avoid some bad programming practices. For example, many Windows
programming guides, and indeed the Visual C++ development environment
itself, seem to use public data members in classes, although this is
recognized as poor style (see, for example, <XRef Linkend = "BABGBGIB"
    xlink:type="simple" xlink:href="#BABGBGIB"
     >[Mey97]</XRef>).</Para>
<Para>A typical <Acronym>MFC</Acronym> (Microsoft Foundation Class library,
which ships with Visual C++) application is based on a document-view
architecture, and so is Multidoc Pro. A document-view architecture means
that there is data (the "document") that can be projected or
shown in many "views". A change in the document can automatically
cause an update in all the views of that document.</Para>
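<Para>The document-view idea is an instance of the observer pattern. In <Acronym>MFC</Acronym> the document notifies its views through <Literal MoreInfo = "None">CDocument::UpdateAllViews</Literal>, which calls each view's <Literal MoreInfo = "None">OnUpdate</Literal>; the sketch below reproduces only the idea in portable C++ and does not use <Acronym>MFC</Acronym>. The <Literal MoreInfo = "None">Document</Literal>, <Literal MoreInfo = "None">View</Literal> and <Literal MoreInfo = "None">CountingView</Literal> classes are hypothetical.</Para>
<ProgramListing><![CDATA[
```cpp
#include <algorithm>
#include <string>
#include <vector>

class Document;

// A view presents the document's data and is told when the data changes.
class View {
public:
    virtual ~View() = default;
    virtual void onUpdate(const Document& doc) = 0;
};

class Document {
public:
    void addView(View* view) { views_.push_back(view); }
    void removeView(View* view) {
        views_.erase(std::remove(views_.begin(), views_.end(), view),
                     views_.end());
    }
    const std::string& text() const { return text_; }

    // Changing the data notifies every attached view.
    void setText(const std::string& text) {
        text_ = text;
        for (View* view : views_) view->onUpdate(*this);
    }

private:
    std::string text_;
    std::vector<View*> views_;  // non-owning
};

// Example view that records what it last displayed.
class CountingView : public View {
public:
    void onUpdate(const Document& doc) override {
        ++updates_;
        lastSeen_ = doc.text();
    }
    int updates() const { return updates_; }
    const std::string& lastSeen() const { return lastSeen_; }

private:
    int updates_ = 0;
    std::string lastSeen_;
};
```
]]></ProgramListing>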
<Para>Initially there was only one developer for MDP (an acronym for
Multidoc Pro); at the peak of the project, six developers were
working on the same code at the same time. About ten developers
have worked on the code since the inception of the project.
Programmers were assigned to the project for a total of 12 person-years<Footnote>
<Para>The real number of person-years spent on development is significantly
less than 12 because most programmers were also assigned to
other projects.</Para></Footnote>. Of this, about 1.5 years were
spent developing the database extensions during 1996 and 1997,
but only six months of that was full-time work.</Para>
<Para>A version control system was acquired relatively late in the
process. Before automated version control, snapshots of the code
were saved manually about once a month to a Novell server accessible
to all developers. Merges were done by hand. This caused errors
and delays, so the PVCS Version Manager was finally purchased. Before
that, Microsoft Visual SourceSafe had been tried, but it was deemed too
slow and lacking in features.</Para>
<Para>The application was not written entirely from scratch. The Synex
ViewPort engine was the core around which the whole application
was built, and several other packages were acquired to speed up
development.</Para>
<Sect2 Id="sve46"><Title>Synex ViewPort Engine</Title>
<Para>Synex ViewPort is an SGML engine. It has a fast, non-validating <Acronym>SGML</Acronym> parser.
The engine is available on Windows, Unix and Macintosh platforms.
On Windows the engine is a Dynamic Link Library (<Acronym>DLL</Acronym>)
and has a C Application Programming Interface (<Acronym>API</Acronym>),
although the internals are written in C++. There are over 300 <Acronym>API</Acronym> functions.
All the ViewPort functions have the prefix <Literal MoreInfo = "None">Sv</Literal>,
for example <Literal MoreInfo = "None">SvMoveTagToNext</Literal>.
All tags, pages and other objects are represented as handles, such as <Literal
    MoreInfo = "None">HTAG</Literal>, <Literal MoreInfo = "None">HDOC</Literal> and <Literal
    MoreInfo = "None">HPAGE</Literal>.</Para>
<Para>This section is based mostly on <XRef Linkend = "BABJDCGC"
    xlink:type="simple" xlink:href="#BABJDCGC"
     >[Syn98]</XRef>, including the
images.</Para>
<Para><XRef Linkend = "CEGGAIDE" xlink:type="simple"
    href = "#CEGGAIDE" xlink:show = "replace" >Figure
34</XRef> shows a simplified view of how data is processed by ViewPort.
The formatter and monitor are just names for ViewPort components
and have nothing to do with the computer screen. The user application
in the project described in this thesis is Multidoc Pro.</Para>
<Para><XRef Linkend = "CEGEFAFA" xlink:type="simple"
    href = "#CEGEFAFA" xlink:show = "replace" >Figure
35</XRef> shows an overview of the system components of the ViewPort <Acronym>DLL</Acronym>.
ViewPort allows customization of most of the components via callback
functions. The entity manager was of special interest at one point
during the development of the Multidoc Pro database extensions,
because it was believed that by customizing the entity manager it
would be easy to make ViewPort process database information. Alas,
this proved to be a false assumption.</Para>
<Para>The figure shows how data flows through the various ViewPort
components before ending up on the screen. It is possible to customize
the behaviour of the components by registering callback functions.
For example, it is possible to register a callback for the entity
manager. The callback is called for every entity that ViewPort detects (see <XRef
    Linkend = "CIHCBCCA" xlink:type="simple"
    xlink:href="#CIHCBCCA"  >Example
5</XRef> for a sample SGML instance with entities).
The callback function may then, for example, run a database query,
provide the contents of the entity to ViewPort and signal that
the entity has been handled and that no default behaviour should
take place. This approach was tried for the Multidoc Pro database extensions.
Unfortunately, ViewPort needs to resolve all entities when opening a document,
so the approach does not work for large databases.</Para>
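<Para>Why the entity-manager approach failed can be illustrated with a resolver callback that supplies entity text from a query. The callback type and <Literal MoreInfo = "None">openDocument</Literal> below are hypothetical and are not the ViewPort <Acronym>API</Acronym>; the sketch only models an engine that resolves every entity when a document is opened, so one query runs per entity before anything is shown.</Para>
<ProgramListing><![CDATA[
```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical callback: returns true and fills 'text' if it handled the
// entity itself; false requests the engine's default handling.
using EntityResolver =
    std::function<bool(const std::string& name, std::string& text)>;

// Models an engine that resolves every entity when a document is opened,
// so the callback runs once per entity before anything is displayed.
std::string openDocument(const std::vector<std::string>& entityRefs,
                         const EntityResolver& resolver) {
    std::string content;
    for (const auto& name : entityRefs) {
        std::string text;
        if (resolver(name, text)) {
            content += text;
        } else {
            content += "&" + name + ";";  // default in this sketch: keep ref
        }
    }
    return content;
}
```
]]></ProgramListing>
<Para>With a thousand entity references, such a callback performs a thousand database round trips before the document appears, however little of it the user intends to read.</Para>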
<Para>The figure clearly illustrates ViewPort processing and the
meaning of the various terms used here. Studying it carefully before
reading further should help the reader.</Para>
<Para>ViewPort is at its best in a browser application. Its limited
support for <Acronym>DTD</Acronym>s, editing and validation in general
causes trouble when something beyond the functionality of a standard
browser is needed. For example, an undocumented function offers limited
support for inserting new content into an already open document, but
there is no information on whether content can also be
deleted.</Para>
<Figure Float = "0"><html:img src="Gradu-34.gif"/>
<Title Id = "CEGGAIDE">Data Processing in a ViewPort System</Title></Figure>
<Figure Float = "0"><html:img src="Gradu-35.gif"/>
<Title Id = "CEGEFAFA">ViewPort System Components</Title></Figure></Sect2>
<Sect2 Id="otm48"><Title>Other Third-Party Modules</Title>
<Para>Several smaller software packages were used in the Multidoc
Pro products in addition to the Synex ViewPort engine. </Para>
<Para>The first problem area that was fixed with an additional module
was a performance problem with the default tree controls. Multidoc
Pro has a special navigator that shows a document's structure as
a tree view. With a moderately sized document, opening this tree
view took about 20 minutes. The SftTree/DLL was purchased to remedy
the situation. It offers extremely fast and customizable tree controls.
The tree view that previously took some 20 minutes to open now took
only a couple of seconds with SftTree/DLL! The SftTree/DLL documentation
even claims that creating a tree view with 100,000 entries on a
Pentium II 300 MHz takes only about 3 seconds <XRef Linkend = "BABGFJFB"
    xlink:type="simple" xlink:href="#BABGFJFB"
     >[Sof99]</XRef>. Interestingly
enough, SftTree/DLL is written in C, but it offers both C and C++ <Acronym>API</Acronym>s.
The two C++ frameworks directly supported are <Acronym>MFC</Acronym> and <Acronym>OWL</Acronym>.</Para>
<Para>Another performance boost came from SmartHeap, which replaced
the default heap memory management provided by the Visual C++ compiler.
SmartHeap claims to offer memory allocation from 3 to 100 times faster
than default compiler-provided allocation <XRef Linkend = "BABEIAAI"
    xlink:type="simple" xlink:href="#BABEIAAI"
     >[MQ99]</XRef>.</Para>
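<Para>One general reason a replacement allocator can outperform a general-purpose heap is that small blocks of a single size can be served from a free list in constant time, without searching or splitting free memory. The sketch below illustrates that technique only; it is not SmartHeap's design, which the cited source does not describe, and the class name <Literal MoreInfo = "None">FixedPool</Literal> is hypothetical.</Para>
<ProgramListing><![CDATA[
```cpp
#include <cstddef>
#include <memory>
#include <new>
#include <vector>

// Fixed-size block pool: blocks are carved from large chunks, and freed
// blocks are pushed onto a singly linked free list for immediate reuse.
class FixedPool {
public:
    explicit FixedPool(std::size_t blockSize, std::size_t blocksPerChunk = 256)
        : blockSize_(roundUp(blockSize)),
          blocksPerChunk_(blocksPerChunk == 0 ? 1 : blocksPerChunk) {}

    void* allocate() {
        if (freeList_ == nullptr) addChunk();
        Node* node = freeList_;
        freeList_ = node->next;
        return node;
    }

    void deallocate(void* p) {
        freeList_ = ::new (p) Node{freeList_};  // reuse block as a list node
    }

private:
    struct Node { Node* next; };

    // Every block must hold a Node and keep fundamental alignment.
    static std::size_t roundUp(std::size_t n) {
        const std::size_t align = alignof(std::max_align_t);
        if (n < sizeof(Node)) n = sizeof(Node);
        return (n + align - 1) / align * align;
    }

    void addChunk() {
        chunks_.push_back(
            std::make_unique<unsigned char[]>(blockSize_ * blocksPerChunk_));
        unsigned char* base = chunks_.back().get();
        for (std::size_t i = 0; i < blocksPerChunk_; ++i) {
            deallocate(base + i * blockSize_);
        }
    }

    std::size_t blockSize_;
    std::size_t blocksPerChunk_;
    Node* freeList_ = nullptr;
    std::vector<std::unique_ptr<unsigned char[]>> chunks_;
};
```
]]></ProgramListing>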
<Para>Dialogs in the <Acronym>MFC</Acronym> class library must be
specified with fixed-size dimensions. This is very limiting. For
example, the old file picker dialog cannot show long filenames.
This would not be a problem if the dialog could be stretched so
that the filename list box would also grow. This can be accomplished
with a neat little class library called NSViews <XRef
    Linkend = "BABDABCF" xlink:type="simple"
    xlink:href="#BABDABCF"  >[Nan97]</XRef> that
makes the <Acronym>MFC</Acronym> dialogs stretchable. It is naturally
possible to specify that certain objects in a dialog cannot be moved
or stretched. NSViews is freeware.</Para>
<Para>The commercial package Objective Toolkit <XRef
    Linkend = "BABIJECB" xlink:type="simple"
    xlink:href="#BABIJECB"  >[Rog99]</XRef>,
despite its large collection of small utility classes, eventually
contributed only a small directory picker. The Objective Plug-in,
however, proved invaluable while transforming Multidoc Pro into
a Web browser plug-in. The Objective Plug-in product appears to be
no longer supported.</Para>
<Para>The database extensions in Multidoc Pro were programmed with
the help of Visual SQL from Blue Sky Software<Footnote><Para>The
company homepage is <Literal MoreInfo = "None">http://www.blueskysoftware.com/</Literal>.</Para></Footnote>.
It proved invaluable, as it was the only tool we could find
that enabled SQL queries to work through ODBC. The enhanced database
handling classes were a bit disappointing, but some were used nevertheless.
Visual SQL does not seem to be supported anymore, most likely
because the new database classes in <Acronym>MFC</Acronym> offer
everything the Visual SQL classes offered and more.</Para>
<Para>Multidoc Pro can be downloaded from the web for free evaluation.
The evaluation period is 21 days. After that, the CrypKey licensing
mechanism prevents the program from starting.</Para>
<Para>Other small ideas, fixes and improvements too numerous to
mention were found in various MFC, C++ and Windows programming
resources, such as mailing lists and web sites.</Para></Sect2>
<Sect2 Id="dsvodc49"><Title>Database Support via Open DataBase Connectivity</Title>
<Para>The Multidoc Pro Database Browser and Database Publisher were
developed for Citec Software Ltd. as part of this thesis. These
products were part of a larger document management system developed
by Citec Engineering Oy for Wärtsilä Diesel (later Wärtsilä
NSD). The overall project is described in <XRef Linkend = "CACDCHGA"
    xlink:type="simple" xlink:href="#CACDCHGA" 
    >Chapter 6</XRef>.</Para>
<Para>The database connections in the authoring tool and the Multidoc
Pro Database Browser and Publisher are handled through Open Database
Connectivity (<Acronym>ODBC</Acronym>) <XRef Linkend = "BGBEAJGD"
    xlink:type="simple" xlink:href="#BGBEAJGD"
     >[Mic92]</XRef>, a Microsoft standard for
connecting to relational databases. <Acronym>ODBC</Acronym> offers
application developers a uniform interface, so they do not need to
worry about which database is actually used. Most database vendors offer an <Acronym>ODBC</Acronym> interface
to their products. There are even <Acronym>ODBC</Acronym> drivers
that enable connections over the internet.</Para>
<Para>The class diagram for the <Acronym>ODBC</Acronym> classes
is shown in <XRef Linkend = "CEGECBIJ" xlink:type="simple"
    href = "#CEGECBIJ" xlink:show = "replace" >Figure
36</XRef>. The <Literal MoreInfo = "None">CRecordset</Literal> is
the standard recordset class in the <Acronym>MFC</Acronym> library;
the others were written as part of Multidoc Pro. The <Literal
    MoreInfo = "None">CRecordset</Literal> wraps the <Acronym>SQL</Acronym> queries
in C++ objects. The <Literal MoreInfo = "None">CAbstractRecordset</Literal> offers
functionality common to the classes derived from it, such as properly quoting
table and column names. <Literal MoreInfo = "None">CColumns</Literal> and <Literal
    MoreInfo = "None">CTables</Literal> were taken from the <Acronym>MFC</Acronym> samples
and needed very little modification. Their function is to extract
the available columns and tables from the database, respectively.</Para>
<Figure Float = "0"><html:img src="Gradu-36.gif"/>
<Title Id = "CEGECBIJ">ODBC Recordset Classes</Title></Figure>
<Para>Because the default <Literal MoreInfo = "None">CRecordset</Literal> class
works only when the database schema is known in
advance, and the schema used with Multidoc Pro could be
almost anything, a special <Literal MoreInfo = "None">CDynaset</Literal> recordset
was needed. The small utility <Literal MoreInfo = "None">CRecordCounter</Literal> was
based on an example in a newsgroup posting; its function is to
count the number of records in a recordset so that progress controls
can be used. Database connections were abstracted in the Visual SQL <Literal
    MoreInfo = "None">CVsoDatabase</Literal> class (not shown in
the figure), which inherits from the <Acronym>MFC</Acronym> <Literal
    MoreInfo = "None">CDatabase</Literal> class.</Para>
<Para>Multidoc Pro generates SQL queries based on user instructions.
This generated SQL is a very small subset of full SQL.
The grammar for the generated SQL is listed in <XRef
    Linkend = "CEGBGJFJ" xlink:type="simple" href = "#CEGBGJFJ"
    xlink:show = "replace" >Table 1</XRef>. Other types
of queries generated transparently by the ODBC API functions may
also occur. Queries written by the user may also differ from this
grammar.</Para>
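<Para>The grammar in Table 1 is small enough that a generator for it fits in a few dozen lines. The following is a minimal sketch, not Multidoc Pro's code: the type and function names are hypothetical, each condition is assumed to have a ready-made right-hand operand (a qualified column name or an already quoted literal), and names are bracketed only when they are not plain identifiers, corresponding to the <Literal MoreInfo = "None">[table name]</Literal> alternative of the grammar. An empty column list produces the <Literal MoreInfo = "None">COUNT(*)</Literal> form.</Para>
<ProgramListing><![CDATA[
```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Brackets a name unless it is a plain identifier (letters, digits, '_',
// not starting with a digit), e.g. "File Name" -> "[File Name]".
std::string quoteName(const std::string& name) {
    bool plain = !name.empty() &&
                 !std::isdigit(static_cast<unsigned char>(name[0]));
    for (char c : name) {
        if (!std::isalnum(static_cast<unsigned char>(c)) && c != '_') {
            plain = false;
        }
    }
    return plain ? name : "[" + name + "]";
}

struct ColumnRef {
    std::string table;
    std::string column;
    std::string sql() const { return quoteName(table) + "." + quoteName(column); }
};

struct Condition {
    std::string connective;  // "AND" or "OR"; ignored for the first condition
    ColumnRef left;
    std::string compare;     // =, <, >, <=, >= or <>
    std::string right;       // column reference or quoted literal, ready to use
};

struct Query {
    std::vector<ColumnRef> columns;  // empty means COUNT(*)
    std::vector<std::string> tables;
    std::vector<Condition> conditions;
};

// Builds: SELECT DISTINCT <column list> FROM <table list> [WHERE ...]
std::string toSql(const Query& q) {
    std::string sql = "SELECT DISTINCT ";
    if (q.columns.empty()) {
        sql += "COUNT(*)";
    }
    for (std::size_t i = 0; i < q.columns.size(); ++i) {
        sql += (i > 0 ? ", " : "") + q.columns[i].sql();
    }
    sql += " FROM ";
    for (std::size_t i = 0; i < q.tables.size(); ++i) {
        sql += (i > 0 ? ", " : "") + quoteName(q.tables[i]);
    }
    for (std::size_t i = 0; i < q.conditions.size(); ++i) {
        const Condition& c = q.conditions[i];
        sql += (i == 0) ? std::string(" WHERE ") : " " + c.connective + " ";
        sql += c.left.sql() + " " + c.compare + " " + c.right;
    }
    return sql;
}
```
]]></ProgramListing>
<Para>A query for the names and files of all pumps in a hypothetical <Literal MoreInfo = "None">Parts</Literal> table would come out as <Literal MoreInfo = "None">SELECT DISTINCT Parts.Name, Parts.[File Name] FROM Parts WHERE Parts.Type = 'Pump'</Literal>.</Para>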
<Table Colsep = "1" Frame = "All" Rowsep = "1" Tocentry = "1"><TGroup
    Align = "Left" Char = "" Charoff = "50" Cols = "2" Colsep = "1"
    Rowsep = "1" TGroupStyle = "Format A">
<TBody Valign = "Top">
<Row Rowsep = "1">
<Entry Colname = "1">&lt;query&gt;</Entry>
<Entry Colname = "2">SELECT DISTINCT &lt;column list&gt; FROM &lt;table
list&gt; [WHERE &lt;condition&gt; [&lt;connective&gt; &lt;condition&gt;]*]</Entry>
</Row>
<Row Rowsep = "1">
<Entry Colname = "1">&lt;table list&gt; </Entry>
<Entry Colname = "2">&lt;table name&gt;[, &lt;table name&gt;]*</Entry>
</Row>
<Row Rowsep = "1">
<Entry Colname = "1">&lt;column list&gt; </Entry>
<Entry Colname = "2">COUNT(*) | &lt;table name&gt;.&lt;column name&gt;
[,&lt;table name&gt;.&lt;column name&gt;]*</Entry>
</Row>
<Row Rowsep = "1">
<Entry Colname = "1">&lt;condition&gt; </Entry>
<Entry Colname = "2">&lt;table name&gt;.&lt;column name&gt; &lt;compare&gt;
&lt;table name&gt;.&lt;column name&gt; | &lt;field value&gt; | '&lt;field
value&gt;'</Entry>
</Row>
<Row Rowsep = "1">
<Entry Colname = "1">&lt;table name&gt; </Entry>
<Entry Colname = "2">table name | [table name]</Entry>
</Row>
<Row Rowsep = "1">
<Entry Colname = "1">&lt;column name&gt; </Entry>
<Entry Colname = "2">column name | [column name]</Entry>
</Row>
<Row Rowsep = "1">
<Entry Colname = "1">&lt;compare&gt; </Entry>
<Entry Colname = "2">= | &lt; | &gt; | &lt;= | &gt;= | &lt;&gt;</Entry>
</Row>
<Row Rowsep = "1">
<Entry Colname = "1">&lt;connective&gt; </Entry>
<Entry Colname = "2">AND | OR</Entry>
</Row>
<Row Rowsep = "0">
<Entry Colname = "1">&lt;field value&gt; </Entry>
<Entry Colname = "2">contents of a database field (a row from some
column)</Entry>
</Row>
</TBody>
</TGroup>
<Title Id = "CEGBGJFJ">Multidoc Pro Generated SQL Grammar</Title></Table></Sect2>
<Sect2 Id="mfc50"><Title>Main Functionality Classes</Title>
<Para>The majority of the database-specific code is in the class <Literal
    MoreInfo = "None">HTDBSGMLDoc</Literal>. What started
out as an attempt to <Quote>make objects responsible for their own
user interfaces</Quote> (Allen Holub) quickly turned into the <FirstTerm>blob</FirstTerm> antipattern <XRef
    Linkend = "BABGEHGC" xlink:type="simple"
    xlink:href="#BABGEHGC"  >[Bro98]</XRef>.
The blob antipattern refers to a design flaw in which a single huge class
contains most of the functionality of the system. The remedy is
to divide the work more evenly between different
classes.</Para>
<Para>The code uses beneficial design patterns as well. For example,
the <Literal MoreInfo = "None">HTDTD</Literal> object that contains some
knowledge about the hardcoded <Acronym>DTD</Acronym> is a <FirstTerm>singleton</FirstTerm> (<XRef
    Linkend = "Bib-dp.xref" xlink:type="simple"
    xlink:href="#Bib-dp.xref"  >[Gam94]</XRef>, <XRef
    Linkend = "BABGBGIB" xlink:type="simple"
    xlink:href="#BABGBGIB"  >[Mey97]</XRef> and
more in <XRef Linkend = "BABCFJHJ" xlink:type="simple"
    xlink:href="#BABCFJHJ"  >[Vli98]</XRef>).</Para>
<Para>The <Literal MoreInfo = "None">HTDBSGMLDoc</Literal> talks
to the database, updates the controls in all the database-specific
dialogs, builds the database connected <Acronym>SGML</Acronym> document,
creates the publications and so on. It relies on helper classes for some
of this work and acts as a <FirstTerm>bridge</FirstTerm> (in the design
pattern sense) for some of its components.</Para>
<Para><XRef Linkend = "CEGDJFJA" xlink:type="simple"
    href = "#CEGDJFJA" xlink:show = "replace" >Figure
37</XRef> shows the classes that incorporate the main functionality
in the database extensions. <Literal MoreInfo = "None">HTCollection</Literal> is
a template for collections. <Literal MoreInfo = "None">HTDBSGMLDocCollection</Literal> and <Literal
    MoreInfo = "None">HTDTDTreeItemDataCollection</Literal> instantiate
concrete versions of the template. <Literal MoreInfo = "None">HTDTDTreeItemData</Literal> holds
the information about the database mapping nodes. This information
is used when generating the <Acronym>SGML</Acronym> documents from
databases. The black diamonds with lines indicate the "has-a"
relationship, i.e., the class with the diamond has the other class
as a data member. The diagram uses a somewhat simplified Unified Modeling
Language (<Acronym>UML</Acronym>) <XRef Linkend = "BABCCFFD"
    xlink:type="simple" xlink:href="#BABCCFFD"
     >[Boo98]</XRef> notation.</Para>
<Figure Float = "0"><html:img src="Gradu-37.gif"/>
<Title Id = "CEGDJFJA">Main Database Extensions Classes</Title></Figure></Sect2>
<Sect2 Id="cm51"><Title>Code Metrics</Title>
<Para><XRef Linkend = "CEGEBIGJ" xlink:type="simple"
    href = "#CEGEBIGJ" xlink:show = "replace" >Table
2</XRef> lists some code metrics for Multidoc Pro. The full Multidoc
Pro includes code for the standard Multidoc Pro and
all its variations, including Translating Editor and customized
browsers. Binary files (icons, bitmaps, etc.) and third-party code
are excluded. The metrics were extracted with <XRef Linkend = "BABBJCHH"
    xlink:type="simple" xlink:href="#BABBJCHH"
     >[Van98]</XRef>, so they are only
approximate. Nevertheless, the figures show that
the database extensions make up roughly 20% of the full Multidoc Pro
source code.</Para>
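<Para>The percentage rows of Table 2 appear to be the semicolon and comment counts divided by the text-line count and truncated to whole percent; the same arithmetic gives the database extensions' share of the code. The small compile-time check below uses only figures from the table; the <Literal MoreInfo = "None">percent</Literal> helper is introduced here for illustration.</Para>
<ProgramListing><![CDATA[
```cpp
// Whole-percent share, truncated as the table's percentages appear to be.
constexpr long percent(long part, long whole) { return 100 * part / whole; }

// Figures from Table 2 (full Multidoc Pro / database extensions).
constexpr long kLinesFull = 92736, kLinesDb = 17862;
constexpr long kSemicolonsFull = 34076, kSemicolonsDb = 7313;
constexpr long kCommentsFull = 12249, kCommentsDb = 1835;

static_assert(percent(kSemicolonsFull, kLinesFull) == 36, "%Semicolons, full");
static_assert(percent(kSemicolonsDb, kLinesDb) == 40, "%Semicolons, database");
static_assert(percent(kCommentsFull, kLinesFull) == 13, "%Comments, full");
static_assert(percent(kCommentsDb, kLinesDb) == 10, "%Comments, database");

// Database extensions' share: 19% of text lines, 21% of semicolons,
// which is the "roughly 20%" stated above.
static_assert(percent(kLinesDb, kLinesFull) == 19, "share of text lines");
static_assert(percent(kSemicolonsDb, kSemicolonsFull) == 21, "share of semicolons");
```
]]></ProgramListing>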
<Table Colsep = "1" Frame = "All" Rowsep = "1" Tocentry = "1"><TGroup
    Align = "Left" Char = "" Charoff = "50" Cols = "3" Colsep = "1"
    Rowsep = "1" TGroupStyle = "Format A">
<THead Valign = "Bottom">
<Row Rowsep = "1">
<Entry Colname = "1"></Entry>
<Entry Colname = "2">Full Multidoc Pro</Entry>
<Entry Colname = "3">Of Which Database Extensions<Footnote
    Id = "mdp-metrics-ft"><Para>Approximate figures; database code mixed
into files of standard Multidoc Pro is not counted.</Para></Footnote></Entry>
</Row>
</THead>
<TBody Valign = "Top">
<Row Rowsep = "1">
<Entry Colname = "1"><Emphasis>File Count</Emphasis></Entry>
<Entry Colname = "2">252</Entry>
<Entry Colname = "3">77</Entry>
</Row>
<Row Rowsep = "1">
<Entry Colname = "1"><Emphasis>Text Lines</Emphasis></Entry>
<Entry Colname = "2">92736</Entry>
<Entry Colname = "3">17862</Entry>
</Row>
<Row Rowsep = "1">
<Entry Colname = "1"><Emphasis>Semicolons</Emphasis></Entry>
<Entry Colname = "2">34076</Entry>
<Entry Colname = "3">7313</Entry>
</Row>
<Row Rowsep = "1">
<Entry Colname = "1"><Emphasis>Comments</Emphasis></Entry>
<Entry Colname = "2">12249</Entry>
<Entry Colname = "3">1835</Entry>
</Row>
<Row Rowsep = "1">
<Entry Colname = "1"><Emphasis>Semicolons (% of Text Lines)</Emphasis></Entry>
<Entry Colname = "2">36</Entry>
<Entry Colname = "3">40</Entry>
</Row>
<Row Rowsep = "1">
<Entry Colname = "1"><Emphasis>Comments (% of Text Lines)</Emphasis></Entry>
<Entry Colname = "2">13</Entry>
<Entry Colname = "3">10</Entry>
</Row>
<Row Rowsep = "1">
<Entry Colname = "1"><Emphasis>Classes</Emphasis></Entry>
<Entry Colname = "2">190</Entry>
<Entry Colname = "3">40</Entry>
</Row>
<Row Rowsep = "1">
<Entry Colname = "1"><Emphasis>Data Members</Emphasis></Entry>
<Entry Colname = "2">1553</Entry>
<Entry Colname = "3">337</Entry>
</Row>
<Row Rowsep = "0">
<Entry Colname = "1"><Emphasis>Member Functions</Emphasis></Entry>
<Entry Colname = "2">2625</Entry>
<Entry Colname = "3">552</Entry>
</Row>
</TBody>
</TGroup>
<Title Id = "CEGEBIGJ">Multidoc Pro Code Metrics</Title></Table></Sect2>
<Sect2 Id="t51"><Title>Testing</Title>
<Para>Testing of Multidoc Pro was difficult at times due to the
integration of the various third-party components, some of which
were only available in binary form. If an error was tracked down to a third-party
module, it was usually necessary to submit a bug report to the manufacturer
and to pray that they would fix it in a timely fashion. If a fix
was not promised, or was taking too long, workarounds had to be
figured out. This was also a difficult task because of the black
box nature of some components.</Para>
<Para>NuMega BoundsChecker<Footnote><Para>BoundsChecker is a product
of Compuware Corporation. Information about BoundsChecker is available at <Literal
    MoreInfo = "None">http://www.numega.com/products/aed/vc.shtml</Literal>.</Para></Footnote> software
was purchased mainly to track down memory-related bugs. BoundsChecker
required recompiling the source to instrument the code, and the
instrumented compilation itself also found some bugs. At run time, BoundsChecker
can pop up a message box informing the user of
memory leaks or other kinds of errors, such as uninitialized variables. Errors
can be filtered and logged as well. BoundsChecker offers different
levels of instrumentation, each level catching more errors. Unfortunately,
the most rigorous level of instrumentation was unusable with Multidoc
Pro because it always got into an endless loop.</Para>
<Para>The database was another great source of grief, at least in
the beginning. When SQL queries were received from the database
designer and tried with the ODBC classes, they invariably failed
with Microsoft Access. The ODBC log and the error messages were
of some help in tracking down the causes of failures. The biggest
help was the Visual SQL package: rewriting the query with the Visual
SQL query editor/generator always produced a query that worked with the
ODBC classes.</Para>
<Para>In addition to the testing performed at development time,
Multidoc Pro Database Publisher and Browser were tested in real
use situations both at Citec and Wärtsilä. Several such test
versions of Multidoc Pro were prepared during development
to gather feedback and find bugs.</Para></Sect2></Sect1>
<Sect1 Id="ir53"><Title>In Retrospect</Title>
<Para>Multidoc Pro Database Browser and Publisher have shown a
way to view relational databases as <Acronym>SGML</Acronym> data.
Although the idea of a relational database keeping track of <Acronym>SGML</Acronym> information
is not new, the combination of the Author Tool and the Multidoc
Pro Database tools has made the PIM system relatively efficient
and easy to use.</Para>
<Para>The implemented system accomplished what it was supposed to
do: assemble large documents from document fragments managed by
databases. Many things can be automated, so customized information
production is easy. Needless to say, there is still room for improvement.</Para>
<Para>The implementation of Multidoc Pro Database Browser and Publisher
taught us some lessons. One was that ViewPort was not
well suited for the job. More control over the DTD and some of the functionality
found in editors were sorely needed. Because ViewPort was
basically intended for browsers, it does not handle insertion and
deletion of content very well.</Para>
<Para>The save file format should have been <Acronym>SGML</Acronym> in
all cases. This could be taken a step further by keeping
some of the information in memory-resident <Acronym>SGML</Acronym> files
instead of ordinary C/C++ structures.</Para>
<Para>There were two major problems in the project. Probably the
bigger one was that it was not always clear what was needed and
how the full system would work and be integrated. Even where
the intent was clear at the beginning, the objectives were redefined
during the project. All this caused unnecessary work and diverted attention
to things that were not important. As usual, more planning would have saved some
work in the later stages. The other problem was ViewPort, in the
implementation of the database extensions. There were many
cases where ViewPort simply was not flexible enough and an alternative
had to be found. In one case, an undocumented function
would have solved a problem, but it took about six months from
the initial question to the ViewPort manufacturer before they revealed
that they indeed already had a solution!</Para>
<Para>It is not too difficult to see what changes would benefit
the Multidoc Pro Database Browser and Publisher if they are
ever developed further. The most urgently needed improvement is
the ability to map arbitrary structures from the database, which the
current mapping does not allow. A related change would be to allow
arbitrary database <Acronym>DTD</Acronym>s. A better user interface
would make the database mapping easier to accomplish. Publication,
the most important feature of the Database Publisher, also warrants
special attention, and speed should be looked into as well. Other
areas that need more work, though they do not concern the program as
such, are the help and the sample databases and mappings. Finally, there
are some annoying bugs in the document assembly procedures that
sometimes cause micro-documents to fail to nest properly in the
document set or publication.</Para>
<Para>The other Multidoc Pro products have received good reviews.
The product family has grown to include translation tools and the
like, and this progress is likely to continue. However, the Author
Tool is as good as dead and buried, and the Multidoc Pro Database
Browser and Publisher fare little better. The database extensions
were designed and implemented as generic tools, which is
why some copies have also been sold to other customers,
though unfortunately not enough to justify further development at
this time.</Para>
<Para>Although the <Acronym>SGML</Acronym> engine ViewPort has been
a good choice, it is based on old ideas. With the new HyTime standard,
the whole parser should be based on the grove concept, and the ViewPort-specific formatting
language must be replaced with a standard solution such as <Acronym>DSSSL</Acronym>.
This means that either a new version of ViewPort must be based
on these standardized ideas or a completely new <Acronym>SGML</Acronym>/HyTime
engine is needed. In fact, that is where development seems to
be going.</Para>
<Para>In March 1998, Netscape released the source code of its web browser.
Citec saw this as a great opportunity and has been working with
the code ever since. Citec has already gained a reputation for dealing with
the huge and difficult body of code that will become the future Netscape
Communicator 5.0. Citec's own <Acronym>SGML</Acronym>/<Acronym>XML</Acronym>-enhanced
browser DocZilla is based on the same source, with added HyTime linking
support. And who knows, maybe the database support will be incorporated
into DocZilla at some point.</Para></Sect1></Chapter>
<Chapter Id = "BGBHDJHD"><Title>Summary</Title>
<BlockQuote><Para>Only fools prefer the past.</Para>
<Para><Author><Firstname>Frank</Firstname> <Surname>Herbert</Surname></Author></Para></BlockQuote>
<Para>Using relational databases to manage product documentation
is not a new concept. There are technically better alternatives,
but technology alone rarely drives business. Relational databases
dominate the market, and many organizations already use them.
Turning such a database into a document management system could be
as simple as adding a single table to the product information database.
This new table would hold references to product documents, which
are stored on an ordinary file server. Of course, new relationships
must be added, but that is about all that is required of the database.</Para>
<Para>The weakness of this model is that it is very easy to break
the reference from the database to the file system, for example
simply by renaming a file on the file system.</Para>
<Para>More work is required on the tools that interact with the
product information database. There must be a special authoring
tool that makes it easy to write product documentation and tie it
to the records in the database. Additionally, a publishing
tool is needed to create manuals from the document fragments
managed by the database.</Para>
<Para>The most difficult challenge, however, is managing people.
After all is said and done, the system is of no use if the people in question
do not accept the change or are not given enough training
and time to move to the new system.</Para>
<Para>Even though the project with Wärtsilä did not go as
planned, the implementation showed that it is not too difficult,
technically, to implement a document management system the way it
is described in this thesis. Although many publications mention the
technique of using relational databases to manage <Acronym>SGML</Acronym> documents
or document fragments, they rarely disclose the details and difficulties
involved. The participants in this project learned almost
everything the hard way. A second attempt, if there ever is one,
should be almost guaranteed to succeed.</Para></Chapter>
<Bibliography Id = "Bib.xref"><Title>References</Title>
<BiblioEntry Id = "BABIFBAG" XRefLabel = "Ang97"><BookBiblio>
<AuthorGroup><Author><Surname>Angerstein</Surname><Firstname>Paula</Firstname></Author></AuthorGroup>
<Title>Why Your Document Management System Should Care About Hyperlinks</Title>
<Url xlink:type="simple" xlink:href="http://www.texcel.no/se97talk.htm" Hytime = "CLINK"
    Hynames = "linkend href" Loctype = "href queryloc URL"></Url>
<Publisher><PublisherName>Texcel Research, Inc.</PublisherName></Publisher>
<PubDate>1997</PubDate></BookBiblio></BiblioEntry>
<BiblioEntry Id = "Bib-arborwp.xref" XRefLabel = "Arb95"><BookBiblio>
<AuthorGroup><CorpAuthor>ArborText, Inc.</CorpAuthor></AuthorGroup>
<Title>Getting Started with SGML</Title><Subtitle>A Guide to the
Standard Generalized Markup Language and Its Role in Information
Management</Subtitle><Url xlink:type="simple" xlink:href="http://www.arbortext.com/wp.html"
    Hytime = "CLINK" Hynames = "linkend href"
    Loctype = "href queryloc URL"></Url><PubDate>1995</PubDate></BookBiblio></BiblioEntry>
<BiblioEntry Id = "BABJBIAH" XRefLabel = "Bal97"><BookBiblio>
<AuthorGroup><Author><Surname>Balasubramanian</Surname><Firstname>V.</Firstname></Author>
<Author><Surname>Bashian</Surname><Firstname>Alf</Firstname></Author>
<Author><Surname>Porcher</Surname><Firstname>Daniel</Firstname></Author></AuthorGroup>
<Title>A Large-Scale Hypermedia Application Using Document Management
And Web Technologies</Title><PubsNumber>in "HYPERTEXT '97",
Proceedings of the Eighth ACM Conference on Hypertext</PubsNumber>
<Publisher><PublisherName>pages 134-145</PublisherName></Publisher>
<PubDate>1997</PubDate></BookBiblio></BiblioEntry>
<BiblioEntry Id = "BABCCFFD" XRefLabel = "Boo98"><BookBiblio>
<AuthorGroup><Author><Surname>Booch</Surname><Firstname>Grady</Firstname></Author>
<Author><Surname>Jacobson</Surname><Firstname>Ivar</Firstname></Author>
<Author><Surname>Rumbaugh</Surname><Firstname>James</Firstname></Author></AuthorGroup>
<Title>The Unified Modeling Language User Guide</Title><Publisher>
<PublisherName>Addison-Wesley Publishing Company</PublisherName></Publisher>
<PubDate>1998</PubDate></BookBiblio></BiblioEntry>
<BiblioEntry Id = "BABEADBE" XRefLabel = "Bou99"><BookBiblio>
<AuthorGroup><Author><Surname>Bourret</Surname><Firstname>Ronald</Firstname></Author></AuthorGroup>
<Title>XML and Databases</Title><Url
    xlink:type="simple" xlink:href="http://www.informatik.tu-darmstadt.de/DVS1/staff/bourret/xml/XMLAndDatabases.htm"
    Hytime = "CLINK" Hynames = "linkend href"
    Loctype = "href queryloc URL"></Url><Publisher><PublisherName>Technical
University of Darmstadt</PublisherName></Publisher><PubDate>1999</PubDate></BookBiblio></BiblioEntry>
<BiblioEntry Id = "BABGEHGC" XRefLabel = "Bro98"><BookBiblio>
<AuthorGroup><Author><Surname>Brown</Surname><Firstname>William</Firstname>
<OtherName>J.</OtherName></Author><Author><Surname>Malveau</Surname>
<Firstname>Raphael</Firstname><OtherName>C.</OtherName></Author>
<Author><Surname>Brown</Surname><Firstname>William</Firstname>
<OtherName>H.</OtherName></Author><Author><Surname>McCormick</Surname>
<Honorific>III</Honorific><Firstname>Hays</Firstname><OtherName>W.</OtherName></Author></AuthorGroup>
<Title>Antipatterns</Title><Subtitle>Refactoring Software, Architecture
and Projects in Crisis</Subtitle><Publisher><PublisherName>John Wiley
&amp; Sons</PublisherName></Publisher><PubDate>1998</PubDate></BookBiblio></BiblioEntry>
<BiblioEntry Id = "BABDCADA" XRefLabel = "Bus45"><BookBiblio>
<AuthorGroup><Author><Surname>Bush</Surname><Firstname>Vannevar</Firstname></Author></AuthorGroup>
<Title>As We May Think</Title><VolumeNum>The Atlantic Monthly</VolumeNum>
<IssueNum>July (1945)</IssueNum><PageNums>pages 641-649</PageNums></BookBiblio></BiblioEntry>
<BiblioEntry Id = "BABIBHHA" XRefLabel = "Böh94"><BookBiblio>
<AuthorGroup><Author><Surname>Böhm</Surname><Firstname>Klemens</Firstname></Author>
<Author><Surname>Aberer</Surname><Firstname>Karl</Firstname></Author></AuthorGroup>
<Title>Storing HyTime Documents In an Object-Oriented Database</Title>
<PubsNumber>in: "CIKM '94", Proceedings of the Third International
Conference on Information and Knowledge Management</PubsNumber>
<Publisher><PublisherName>pages 26-33</PublisherName></Publisher>
<PubDate>1994</PubDate></BookBiblio></BiblioEntry>
<BiblioEntry Id = "BABIGIDJ" XRefLabel = "CIM98"><BookBiblio>
<AuthorGroup><CorpAuthor>CIMdata</CorpAuthor></AuthorGroup><Title>Product
Data Management: The Definition</Title><Subtitle>An Introduction
to Concepts, Benefits, and Terminology</Subtitle><Url
    xlink:type="simple" xlink:href="http://www.cimdata.com" Hytime = "CLINK"
    Hynames = "linkend href" Loctype = "href queryloc URL"></Url>
<PubDate>1998</PubDate></BookBiblio></BiblioEntry>
<BiblioEntry Id = "BABBEJJD" XRefLabel = "CIT97a"><BookBiblio>
<AuthorGroup><CorpAuthor>CITEC Engineering Oy</CorpAuthor></AuthorGroup>
<Title>Multidoc Pro Database Browser and Database Publisher -
User's Manual</Title><PubDate>1997</PubDate></BookBiblio></BiblioEntry>
<BiblioEntry Id = "BABHEEAA" XRefLabel = "CIT97b"><BookBiblio>
<AuthorGroup><CorpAuthor>CITEC Engineering Oy</CorpAuthor></AuthorGroup>
<Title>WNS Author Tool for the Base-DTD User Manual</Title><PubDate>1997</PubDate></BookBiblio></BiblioEntry>
<BiblioEntry Id = "BABHJHCI" XRefLabel = "CIT98"><BookBiblio>
<AuthorGroup><CorpAuthor>CITEC Engineering Oy</CorpAuthor></AuthorGroup>
<Title>Multidoc Pro Browser/Publisher Product Brief</Title><PubDate>1998</PubDate></BookBiblio></BiblioEntry>
<BiblioEntry Id = "BABJBFBG" XRefLabel = "DeR94"><BookBiblio>
<AuthorGroup><Author><Surname>DeRose</Surname><Firstname>Steven</Firstname>
<OtherName>J.</OtherName></Author><Author><Surname>Durand</Surname>
<Firstname>David</Firstname><OtherName>G.</OtherName></Author></AuthorGroup>
<Title>Making Hypermedia Work</Title><Subtitle>A User's Guide to
HyTime</Subtitle><Publisher><PublisherName>Kluwer Academic Publishers</PublisherName></Publisher>
<PubDate>1994</PubDate></BookBiblio></BiblioEntry>
<BiblioEntry Id = "BABJIHID" XRefLabel = "Eck95"><BookBiblio>
<AuthorGroup><Author><Surname>Eckel</Surname><Firstname>Bruce</Firstname></Author></AuthorGroup>
<Title>Thinking in C++</Title><Publisher><PublisherName>Prentice
Hall, Inc.</PublisherName></Publisher><PubDate>1995</PubDate></BookBiblio></BiblioEntry>
<BiblioEntry Id = "Bib-elo95.xref" XRefLabel = "Elo95"><BookBiblio>
<AuthorGroup><Author><Surname>Elovainio</Surname><Firstname>Kimmo</Firstname></Author></AuthorGroup>
<Title>SGML-Based Documentation Process</Title><Publisher>
<PublisherName>VTT OFFSETPAINO</PublisherName></Publisher><PubDate>1995</PubDate></BookBiblio></BiblioEntry>
<BiblioEntry Id = "BABCAHEE" XRefLabel = "FMV95"><BookBiblio>
<AuthorGroup><CorpAuthor>FMV</CorpAuthor></AuthorGroup><Title>Description
of FMV Grund-DTD</Title><Url
    xlink:type="simple" xlink:href="http://info.admin.kth.se/SGML/Bibliotek/DTDer/FMVGrund-DTD/"
    Hytime = "CLINK" Hynames = "linkend href"
    Loctype = "href queryloc URL"></Url><PubDate>1995</PubDate></BookBiblio></BiblioEntry>
<BiblioEntry Id = "Bib-dp.xref" XRefLabel = "Gam94"><BookBiblio>
<AuthorGroup><Author><Surname>Gamma</Surname><Firstname>Erich</Firstname></Author>
<Author><Surname>Helm</Surname><Firstname>Richard</Firstname></Author>
<Author><Surname>Johnson</Surname><Firstname>Ralph</Firstname></Author>
<Author><Surname>Vlissides</Surname><Firstname>John</Firstname></Author></AuthorGroup>
<Title>Design Patterns</Title><Subtitle>Elements of Reusable Object-Oriented
Software</Subtitle><Publisher><PublisherName>Addison-Wesley</PublisherName></Publisher>
<PubDate>1994</PubDate></BookBiblio></BiblioEntry>
<BiblioEntry Id = "BABJBCHE" XRefLabel = "Gol90"><BookBiblio>
<AuthorGroup><Author><Surname>Goldfarb</Surname><Firstname>Charles</Firstname>
<OtherName>F.</OtherName></Author></AuthorGroup><Title>The SGML
Handbook</Title><Publisher><PublisherName>Oxford University Press
Inc.</PublisherName></Publisher><PubDate>1990</PubDate></BookBiblio></BiblioEntry>
<BiblioEntry Id = "BABCCDJH" XRefLabel = "Hof99"><BookBiblio>
<AuthorGroup><Author><Surname>Hoffman</Surname><Firstname>James</Firstname></Author></AuthorGroup>
<Title>Introduction to Structured Query Language</Title><Url
    xlink:type="simple" xlink:href="http://w3.one.net/~jhoffman/sqltut.htm" Hytime = "CLINK"
    Hynames = "linkend href" Loctype = "href queryloc URL"></Url>
<PubDate>1999</PubDate></BookBiblio></BiblioEntry>
<BiblioEntry Id = "BGBEIDAJ" XRefLabel = "ISO86"><BookBiblio>
<AuthorGroup><CorpAuthor>ISO 8879:1986</CorpAuthor></AuthorGroup>
<Title>Information Processing - Text and Office Systems - Standard Generalized
Markup Language (SGML)</Title><PubDate>1986</PubDate></BookBiblio></BiblioEntry>
<BiblioEntry Id = "BGBIDCCE" XRefLabel = "ISO89"><BookBiblio
    Id = "BGBEIIEJ"><AuthorGroup><CorpAuthor>ISO 8613</CorpAuthor></AuthorGroup>
<Title>Information Technology - Text and Office Systems - Office
Document Architecture (ODA)</Title><PubDate>1989</PubDate></BookBiblio></BiblioEntry>
<BiblioEntry Id = "BGBJDDAG" XRefLabel = "ISO92a"><BookBiblio>
<AuthorGroup><CorpAuthor>ISO/IEC 9075:1992</CorpAuthor></AuthorGroup>
<Title>Information Technology - Database Languages - SQL</Title>
<PubDate>1992</PubDate></BookBiblio></BiblioEntry>
<BiblioEntry Id = "BABIDEBC" XRefLabel = "ISO92b"><BookBiblio>
<AuthorGroup><CorpAuthor>ISO/IEC 8632:1992</CorpAuthor></AuthorGroup>
<Title>Information Processing Systems - Computer Graphics Metafile
for the Storage and Transfer of Picture Description Information (CGM)</Title>
<PubDate>1992</PubDate></BookBiblio></BiblioEntry>
<BiblioEntry Id = "BABCIFCA" XRefLabel = "ISO93"><BookBiblio>
<AuthorGroup><CorpAuthor>ISO/IEC 10646-1:1993</CorpAuthor></AuthorGroup>
<Title>Information technology - Universal Multiple-Octet Coded
Character Set (UCS) - Part 1: Architecture and Basic Multilingual Plane</Title>
<PubDate>1993</PubDate></BookBiblio></BiblioEntry>
<BiblioEntry Id = "BABDDCCJ" XRefLabel = "ISO94"><BookBiblio>
<AuthorGroup><CorpAuthor>ISO 10303</CorpAuthor></AuthorGroup><Title>Industrial
Automation Systems and Integration - Product Data Representation
and Exchange (STEP)</Title><PubDate>1994-1998</PubDate></BookBiblio></BiblioEntry>
<BiblioEntry Id = "BGBIABBH" XRefLabel = "ISO96"><BookBiblio>
<AuthorGroup><CorpAuthor>ISO/IEC 10179:1996</CorpAuthor></AuthorGroup>
<Title>Information Technology - Text and Office Systems - Document
Style Semantics and Specification Language (DSSSL)</Title><PubDate>1996</PubDate></BookBiblio></BiblioEntry>
<BiblioEntry Id = "BGBGFAHG" XRefLabel = "ISO97"><BookBiblio>
<AuthorGroup><CorpAuthor>ISO/IEC 10744:1997</CorpAuthor></AuthorGroup>
<Title>Information Technology - Hypermedia/Time-based Structuring
Language (HyTime)</Title><PubDate>1997</PubDate></BookBiblio></BiblioEntry>
<BiblioEntry Id = "BGBGJHCD" XRefLabel = "ISO98a"><BookBiblio>
<AuthorGroup><CorpAuthor>ISO/IEC 14772-1:1998</CorpAuthor></AuthorGroup>
<Title>Information technology - Computer graphics and image processing
- The Virtual Reality Modeling Language - Part 1: Functional specification
and UTF-8 encoding (VRML)</Title><PubDate>1998</PubDate></BookBiblio></BiblioEntry>
<BeginPage/>
<BiblioEntry Id = "BABFCCEF" XRefLabel = "ISO98b"><BookBiblio>
<AuthorGroup><CorpAuthor>ISO/IEC 16262:1998</CorpAuthor></AuthorGroup>
<Title>Information Technology - ECMAScript Language Specification</Title>
<PubDate>1998</PubDate></BookBiblio></BiblioEntry>
<BiblioEntry Id = "BABFEEBG" XRefLabel = "Kim97"><BookBiblio>
<AuthorGroup><Author><Surname>Kimber</Surname><Firstname>W.</Firstname>
<OtherName>Eliot</OtherName></Author></AuthorGroup><Title>A Tutorial
Introduction to SGML Architectures</Title><Url
    xlink:type="simple" xlink:href="http://www.isogen.com/papers/archintro.html"
    Hytime = "CLINK" Hynames = "linkend href"
    Loctype = "href queryloc URL"></Url><Publisher><PublisherName>ISOGEN
International Corp.</PublisherName></Publisher><PubDate>1997</PubDate></BookBiblio></BiblioEntry>
<BiblioEntry Id = "BABIEGCG" XRefLabel = "Kim98"><BookBiblio>
<AuthorGroup><Author><Surname>Kimber</Surname><Firstname>W.</Firstname>
<OtherName>Eliot</OtherName></Author></AuthorGroup><Title>Practical
Hypermedia</Title><Subtitle>An Introduction to HyTime</Subtitle>
<Publisher><PublisherName>Prentice Hall</PublisherName></Publisher>
<PubDate>1998</PubDate></BookBiblio></BiblioEntry>
<BiblioEntry Id = "BABJHHIG" XRefLabel = "Kla98"><BookBiblio>
<AuthorGroup><Author><Surname>Klavans</Surname><Firstname>Judith</Firstname></Author></AuthorGroup>
<Title>Data Bases in Digital Libraries</Title><Subtitle>Where Computer
Science and Information Management Meet</Subtitle><PubsNumber>in
"PODS '98", Proceedings of the Seventeenth ACM SIGACT-SIGMOD-SIGART
Symposium on Principles of Database Systems</PubsNumber><Publisher>
<PublisherName>pages 224-226</PublisherName></Publisher><PubDate>1998</PubDate></BookBiblio></BiblioEntry>
<BiblioEntry Id = "BGBEHIJF" XRefLabel = "Lam94"><BookBiblio>
<AuthorGroup><Author><Surname>Lamport</Surname><Firstname>Leslie</Firstname></Author></AuthorGroup>
<Title>LaTeX: A Document Preparation System</Title><Publisher>
<PublisherName>Addison-Wesley Publishing Company, Inc</PublisherName></Publisher>
<PubDate>1994</PubDate></BookBiblio></BiblioEntry>
<BiblioEntry Id = "Bib-lin.xref" XRefLabel = "Lin97"><BookBiblio>
<AuthorGroup><Author><Surname>Lindén</Surname><Firstname>Greger</Firstname></Author></AuthorGroup>
<Title>Structured Document Transformations</Title><PubsNumber>PhD
Thesis, Series of Publications A</PubsNumber><Publisher><PublisherName>University
of Helsinki</PublisherName></Publisher><PubDate>Report A-1997-2
(1997)</PubDate></BookBiblio></BiblioEntry>
<BiblioEntry Id = "BABIEGHJ" XRefLabel = "Loi99"><BookBiblio>
<AuthorGroup><Author><Surname>Loizou</Surname><Firstname>George</Firstname></Author>
<Author><Surname>Levene</Surname><Firstname>Mark</Firstname></Author></AuthorGroup>
<Title>A Guided Tour of Relational Databases and Beyond</Title>
<Publisher><PublisherName>Springer Verlag</PublisherName></Publisher>
<PubDate>1999</PubDate></BookBiblio></BiblioEntry>
<BiblioEntry Id = "BABFCHCD" XRefLabel = "Loo98"><BookBiblio>
<AuthorGroup><Author><Surname>Loomis</Surname><Firstname>Mary</Firstname></Author>
<Author><Surname>Chaudhri</Surname><Firstname>Akmal</Firstname>
<OtherName>B.</OtherName></Author></AuthorGroup><Title>Object Databases
in Practice</Title><Publisher><PublisherName>Prentice-Hall, Inc</PublisherName></Publisher>
<PubDate>1998</PubDate></BookBiblio></BiblioEntry>
<BiblioEntry Id = "Bib-mal.xref" XRefLabel = "Mal95"><BookBiblio>
<AuthorGroup><Author><Surname>Maler</Surname><Firstname>Eve</Firstname></Author>
<Author><Surname>Andaloussi</Surname><Firstname>Jeanne</Firstname>
<OtherName>El</OtherName></Author></AuthorGroup><Title>Developing
SGML DTDs</Title><Subtitle>From Text to Model to Markup</Subtitle>
<Publisher><PublisherName>Prentice Hall</PublisherName></Publisher>
<PubDate>1995</PubDate></BookBiblio></BiblioEntry>
<BiblioEntry Id = "BABIAGFB" XRefLabel = "Met99"><BookBiblio>
<AuthorGroup><Author><Surname>Metsäranta</Surname><Firstname>Pekka</Firstname></Author></AuthorGroup>
<Title>Rakenteisen tiedon säilyttäminen</Title><Subtitle>XML-dokumentti
OAIS-viitemallissa (in Finnish)</Subtitle><PubsNumber>Master
of Science Thesis, University of Jyväskylä</PubsNumber><Url
    xlink:type="simple" xlink:href="http://www.syspro.fi/pekka.metsaranta/gradu/"
    Hytime = "CLINK" Hynames = "linkend href"
    Loctype = "href queryloc URL"></Url><PubDate>1999</PubDate></BookBiblio></BiblioEntry>
<BiblioEntry Id = "BABGBGIB" XRefLabel = "Mey97"><BookBiblio>
<AuthorGroup><Author><Surname>Meyers</Surname><Firstname>Scott</Firstname></Author></AuthorGroup>
<Title>Effective C++</Title><Subtitle>50 Ways to Improve Your Programs
and Designs</Subtitle><Publisher><PublisherName>Addison-Wesley
Publishing Company</PublisherName></Publisher><PubDate>1997</PubDate></BookBiblio></BiblioEntry>
<BiblioEntry Id = "BGBEAJGD" XRefLabel = "Mic92"><BookBiblio>
<AuthorGroup><CorpAuthor>Microsoft Corporation</CorpAuthor></AuthorGroup>
<Title>ODBC Application Programmer's Guide</Title><Publisher>
<PublisherName>Microsoft</PublisherName></Publisher><PubDate>1992</PubDate></BookBiblio></BiblioEntry>
<BiblioEntry Id = "BABEIAAI" XRefLabel = "MQ99"><BookBiblio>
<AuthorGroup><CorpAuthor>MicroQuill</CorpAuthor></AuthorGroup><Title>SmartHeap</Title>
<Url xlink:type="simple" xlink:href="http://www.microquill.com/prod_sh/index_sh.htm"
    Hytime = "CLINK" Hynames = "linkend href"
    Loctype = "href queryloc URL"></Url><PubDate>1999</PubDate></BookBiblio></BiblioEntry>
<BiblioEntry Id = "BABHAAGE" XRefLabel = "Mya98"><BookBiblio>
<AuthorGroup><Author><Surname>Myaeng</Surname><Firstname>Sung</Firstname>
<OtherName>Hyon</OtherName></Author><Author><Surname>Jang</Surname>
<Firstname>Don-Hyun</Firstname></Author><Author><Surname>Kim</Surname>
<Firstname>Mun-Seok</Firstname></Author><Author><Surname>Zhoo</Surname>
<Firstname>Zong-Cheol</Firstname></Author></AuthorGroup><Title>A
Flexible Model for Retrieval of SGML Documents</Title><PubsNumber>in
"SIGIR '98", Proceedings of the 21st Annual International
ACM SIGIR Conference on Research and Development in Information
Retrieval</PubsNumber><Publisher><PublisherName>pages 138-145</PublisherName></Publisher>
<PubDate>1998</PubDate></BookBiblio></BiblioEntry>
<BiblioEntry Id = "BABDABCF" XRefLabel = "Nan97"><BookBiblio>
<AuthorGroup><CorpAuthor>NanoSoft Corporation</CorpAuthor></AuthorGroup>
<Title>NSViews Version 1.04</Title><Url
    xlink:type="simple" xlink:href="http://www.nanocorp.com/nsviews/default.htm"
    Hytime = "CLINK" Hynames = "linkend href"
    Loctype = "href queryloc URL"></Url><PubDate>1997</PubDate></BookBiblio></BiblioEntry>
<BiblioEntry Id = "BABECJEB" XRefLabel = "Nel82"><BookBiblio>
<AuthorGroup><Author><Surname>Nelson</Surname><Firstname>Theodore</Firstname>
<OtherName>Holm</OtherName></Author></AuthorGroup><Title>Literary
Machines</Title><Publisher><PublisherName>Mindful Press</PublisherName></Publisher>
<PubDate>1982</PubDate></BookBiblio></BiblioEntry>
<BiblioEntry Id = "BABBGCJI" XRefLabel = "Nel97"><BookBiblio>
<AuthorGroup><Author><Surname>Nelson</Surname><Firstname>Theodore</Firstname>
<OtherName>Holm</OtherName></Author></AuthorGroup><Title>Embedded
Markup Considered Harmful</Title><Url xlink:type="simple" xlink:href="http://www.xml.com"
    Hytime = "CLINK" Hynames = "linkend href"
    Loctype = "href queryloc URL"></Url><PubDate>1997</PubDate></BookBiblio></BiblioEntry>
<BiblioEntry Id = "BABGAEIG" XRefLabel = "New91"><BookBiblio>
<AuthorGroup><Author><Surname>Newcomb</Surname><Firstname>Steven</Firstname>
<OtherName>R.</OtherName></Author><Author><Surname>Kipp</Surname>
<Firstname>Neill</Firstname><OtherName>A.</OtherName></Author><Author>
<Surname>Newcomb</Surname><Firstname>Victoria</Firstname><OtherName>T.</OtherName></Author></AuthorGroup>
<Title>"HyTime"</Title><Subtitle>The Hypermedia/Time-based Document
Structuring Language</Subtitle><PubsNumber>Communications of the ACM</PubsNumber>
<PubDate>Vol. 34, No. 11 (1991)</PubDate></BookBiblio></BiblioEntry>
<BiblioEntry Id = "BABJHEIE" XRefLabel = "OAS99"><BookBiblio>
<AuthorGroup><CorpAuthor>OASIS</CorpAuthor></AuthorGroup><Title>The
DocBook DTD</Title><Url xlink:type="simple" xlink:href="http://www.oasis-open.org/docbook/"
    Hytime = "CLINK" Hynames = "linkend href"
    Loctype = "href queryloc URL"></Url><PubDate>1999</PubDate></BookBiblio></BiblioEntry>
<BiblioEntry Id = "BABFGIFE" XRefLabel = "Onn99"><BookBiblio
    Id = "BABGDCCE"><AuthorGroup><Author><Surname>Onnela</Surname>
<Firstname>Tapio</Firstname></Author></AuthorGroup><Title>Bittiarkisto
voi jäädä lukematta (in Finnish)</Title><VolumeNum>Tiede 2000</VolumeNum>
<IssueNum>5 (1999)</IssueNum><PageNums>p. 37</PageNums></BookBiblio></BiblioEntry>
<BiblioEntry Id = "BABGEADG" XRefLabel = "Paq92"><BookBiblio>
<AuthorGroup><Author><Surname>Paquet</Surname><Firstname>Gaël</Firstname></Author></AuthorGroup>
<Title>Hyper9002: An Online Operating Manual for a Chemical Manufacturer
Using Hypertext Integrated with an Object Oriented Database</Title>
<PubsNumber>in: "SAC '92", Proceedings of the 1992 ACM/SIGAPP
Symposium on Applied Computing (vol. II): Technological Challenges
of the 1990's</PubsNumber><Publisher><PublisherName>pages 976-984</PublisherName></Publisher>
<PubDate>1992</PubDate></BookBiblio></BiblioEntry>
<BiblioEntry Id = "BABDBDBA" XRefLabel = "PDM97a"><BookBiblio>
<AuthorGroup><CorpAuthor>The PDM Information Center, courtesy of
Hewlett-Packard</CorpAuthor></AuthorGroup><Title>Understanding Product Data
Management</Title><Url xlink:type="simple" xlink:href="http://www.pdmic.com/undrstnd.html"
    Hytime = "CLINK" Hynames = "linkend href"
    Loctype = "href queryloc URL"></Url><PubDate>1997</PubDate></BookBiblio></BiblioEntry>
<BiblioEntry Id = "BABBEGEE" XRefLabel = "PDM97b"><BookBiblio>
<AuthorGroup><CorpAuthor>The PDM Information Center</CorpAuthor></AuthorGroup>
<Title>How the Technology Has Evolved</Title><Subtitle>A Short Review</Subtitle>
<Url xlink:type="simple" xlink:href="http://www.pdmic.com/evoltech.html" Hytime = "CLINK"
    Hynames = "linkend href" Loctype = "href queryloc URL"></Url>
<PubDate>1997</PubDate></BookBiblio></BiblioEntry>
<BiblioEntry Id = "BGBGCJCF" XRefLabel = "Pel97a"><BookBiblio>
<AuthorGroup><Author><Surname>Peltonen</Surname><Firstname>Björn</Firstname></Author>
<Author><Surname>Mäki</Surname><Firstname>Erik</Firstname></Author></AuthorGroup>
<Title>Case Study: Wärtsilä Diesel Oy, Power Plants</Title><Url
    xlink:type="simple" xlink:href="http://www.citec.fi/company/services/case/wd_pp.html"
    Hytime = "CLINK" Hynames = "linkend href"
    Loctype = "href queryloc URL"></Url><PubDate>1997</PubDate></BookBiblio></BiblioEntry>
<BiblioEntry Id = "Bib-ryt97.xref" XRefLabel = "Pel97b"><BookBiblio>
<AuthorGroup><Author><Surname>Peltonen</Surname><Firstname>Björn</Firstname></Author></AuthorGroup>
<Title>"Case Study", The SGML (<Emphasis>Standard Generalized
Markup Language</Emphasis>) Implementation at Norsk Hydro</Title>
<Subtitle>Do More with Less and Do It Better</Subtitle><PubsNumber>in: "SGML
Finland 1997 - seminaarijulkaisu", Proceedings of Finnish SGML
Conference</PubsNumber><Publisher><PublisherName>SGML User's Group
Finland</PublisherName></Publisher><Publisher><PublisherName>pages
4-9</PublisherName></Publisher><PubDate>1997</PubDate></BookBiblio></BiblioEntry>
<BiblioEntry Id = "BABEHGFG" XRefLabel = "Pre98"><BookBiblio>
<AuthorGroup><Author><Surname>Prescod</Surname><Firstname>Paul</Firstname></Author></AuthorGroup>
<Title>Formalizing SGML and XML Instances and Schemata with Forest Automata
Theory</Title><Url xlink:type="simple" xlink:href="http://www.prescod.net/forest/shorttut/"
    Hytime = "CLINK" Hynames = "linkend href"
    Loctype = "href queryloc URL"></Url><PubDate>1998</PubDate></BookBiblio></BiblioEntry>
<BiblioEntry Id = "Bib-prosise.xref" XRefLabel = "Pro96"><BookBiblio>
<AuthorGroup><Author><Surname>Prosise</Surname><Firstname>Jeff</Firstname></Author></AuthorGroup>
<Title>Programming Windows 95 with MFC</Title><Subtitle>Create Programs
for Windows Quickly with the Microsoft Foundation Class Library</Subtitle>
<Publisher><PublisherName>Microsoft Press</PublisherName></Publisher>
<PubDate>1996</PubDate></BookBiblio></BiblioEntry>
<BeginPage/>
<BiblioEntry Id = "BABGCJCI" XRefLabel = "Rei98"><BookBiblio>
<AuthorGroup><Author><Surname>Reinwald</Surname><Firstname>Berthold</Firstname></Author>
<Author><Surname>Pirahesh</Surname><Firstname>Hamid</Firstname></Author></AuthorGroup>
<Title>SQL Open Heterogeneous Data Access</Title><PubsNumber>in "SIGMOD
'98", Proceedings of ACM SIGMOD International Conference on
Management of Data</PubsNumber><Publisher><PublisherName>pages 506-507</PublisherName></Publisher>
<Publisher><PublisherName>ACM</PublisherName></Publisher><PubDate>1998</PubDate></BookBiblio></BiblioEntry>
<BiblioEntry Id = "BABIJECB" XRefLabel = "Rog99"><BookBiblio>
<AuthorGroup><CorpAuthor>RogueWave Software</CorpAuthor></AuthorGroup>
<Title>Objective Toolkit</Title><Url
    xlink:type="simple" xlink:href="http://www.roguewave.com/products/ot/" Hytime = "CLINK"
    Hynames = "linkend href" Loctype = "href queryloc URL"></Url>
<PubDate>1999</PubDate></BookBiblio></BiblioEntry>
<BiblioEntry Id = "BGBHGHBH" XRefLabel = "Ryt97"><BookBiblio>
<AuthorGroup><Author><Surname>Rytkönen</Surname><Firstname>Kimmo</Firstname></Author>
<Author><Surname>Kunz</Surname><Firstname>Jürgen</Firstname></Author></AuthorGroup>
<Title>DOCSTEP - Technical Documentation Creation and Management
using STEP</Title><PubsNumber>in: "SGML Finland 1997 - seminaarijulkaisu", Proceedings
of Finnish SGML Conference</PubsNumber><Publisher><PublisherName>SGML
Finland User's Group</PublisherName></Publisher><Publisher>
<PublisherName>pages 39-68</PublisherName></Publisher><PubDate>1997</PubDate></BookBiblio></BiblioEntry>
<BiblioEntry Id = "BABIAEHI" XRefLabel = "Sip96"><BookBiblio>
<AuthorGroup><Author><Surname>Sipser</Surname><Firstname>Michael</Firstname></Author></AuthorGroup>
<Title>Introduction to the Theory of Computation</Title><Publisher>
<PublisherName>International Thomson Publishing</PublisherName></Publisher>
<PubDate>1996</PubDate></BookBiblio></BiblioEntry>
<BiblioEntry Id = "BABGFJFB" XRefLabel = "Sof99"><BookBiblio>
<AuthorGroup><CorpAuthor>Softel vdm Inc.</CorpAuthor></AuthorGroup>
<Title>SftTree/DLL 4.0 Product Information</Title><Url
    xlink:type="simple" xlink:href="http://www.softelvdm.com/sfttree.html" Hytime = "CLINK"
    Hynames = "linkend href" Loctype = "href queryloc URL"></Url>
<PubDate>1999</PubDate></BookBiblio></BiblioEntry>
<BiblioEntry Id = "BGBIHBEG" XRefLabel = "Som96"><BookBiblio>
<AuthorGroup><Author><Surname>Sommerville</Surname><Firstname>Ian</Firstname></Author></AuthorGroup>
<Title>Software Engineering</Title><Publisher><PublisherName>Addison-Wesley
Publishers Ltd.</PublisherName></Publisher><PubDate>1996</PubDate></BookBiblio></BiblioEntry>
<BiblioEntry Id = "BABIDJEF" XRefLabel = "Sun81"><BookBiblio>
<AuthorGroup><Author><Surname>Sundgren</Surname><Firstname>Bo</Firstname></Author></AuthorGroup>
<Title>Databaser och datamodeller (in Swedish)</Title><Publisher>
<PublisherName>Studentlitteratur</PublisherName></Publisher><PubDate>1981</PubDate></BookBiblio></BiblioEntry>
<BiblioEntry Id = "BABJDCGC" XRefLabel = "Syn98"><BookBiblio>
<AuthorGroup><CorpAuthor>Synex Information AB</CorpAuthor></AuthorGroup>
<Title>Synex ViewPort Version 2.1 Programmer's Manual</Title>
<PubDate>1998</PubDate></BookBiblio></BiblioEntry>
<BiblioEntry Id = "Bib-tra.xref" XRefLabel = "Tra95"><BookBiblio>
<AuthorGroup><Author><Surname>Travis</Surname><Firstname>Brian</Firstname></Author>
<Author><Surname>Waldt</Surname><Firstname>Dale</Firstname></Author></AuthorGroup>
<Title>The SGML Implementation Guide</Title><Subtitle>A Blueprint
for SGML Migration</Subtitle><Publisher><PublisherName>Springer-Verlag</PublisherName></Publisher>
<PubDate>1995</PubDate></BookBiblio></BiblioEntry>
<BiblioEntry Id = "Bib-tur96.xref" XRefLabel = "Tur96"><BookBiblio>
<AuthorGroup><Author><Surname>Turner</Surname><Firstname>Ronald</Firstname>
<OtherName>C.</OtherName></Author><Author><Surname>Douglass</Surname>
<Firstname>Timothy</Firstname><OtherName>A.</OtherName></Author>
<Author><Surname>Turner</Surname><Firstname>Audrey</Firstname>
<OtherName>J.</OtherName></Author></AuthorGroup><Title>README.1ST</Title>
<Subtitle>SGML For Writers and Editors</Subtitle><Publisher>
<PublisherName>Prentice-Hall, Inc.</PublisherName></Publisher><PubDate>1996</PubDate></BookBiblio></BiblioEntry>
<BiblioEntry Id = "BABBJCHH" XRefLabel = "Van98"><BookBiblio>
<AuthorGroup><Author><Surname>Vanvliet</Surname><Firstname>Peter</Firstname>
<OtherName>A.</OtherName></Author></AuthorGroup><Title>CodeCount</Title>
<Url xlink:type="simple" xlink:href="http://www.nanocorp.com/vanvliet/peter/codecount/codecount.htm"
    Hytime = "CLINK" Hynames = "linkend href"
    Loctype = "href queryloc URL"></Url><PubDate>1998</PubDate></BookBiblio></BiblioEntry>
<BiblioEntry Id = "BABCFJHJ" XRefLabel = "Vli98"><BookBiblio>
<AuthorGroup><Author><Surname>Vlissides</Surname><Firstname>John</Firstname></Author></AuthorGroup>
<Title>Pattern Hatching</Title><Subtitle>Design Patterns Applied</Subtitle>
<Publisher><PublisherName>Addison-Wesley Publishing Company</PublisherName></Publisher>
<PubDate>1998</PubDate></BookBiblio></BiblioEntry>
<BiblioEntry Id = "BGBFHJDA" XRefLabel = "W3C96"><BookBiblio>
<AuthorGroup><CorpAuthor>World Wide Web Consortium</CorpAuthor></AuthorGroup>
<Title>Cascading Style Sheets (CSS)</Title><Url
    xlink:type="simple" xlink:href="http://www.w3.org/TR/REC-css" Hytime = "CLINK"
    Hynames = "linkend href" Loctype = "href queryloc URL"></Url>
<PubDate>1996</PubDate></BookBiblio></BiblioEntry>
<BiblioEntry Id = "BGBBFHGF" XRefLabel = "W3C98"><BookBiblio>
<AuthorGroup><CorpAuthor>World Wide Web Consortium</CorpAuthor></AuthorGroup>
<Title>Extensible Markup Language (XML) 1.0</Title><Url
    xlink:type="simple" xlink:href="http://www.w3.org/TR/REC-xml" Hytime = "CLINK"
    Hynames = "linkend href" Loctype = "href queryloc URL"></Url>
<PubDate>1998</PubDate></BookBiblio></BiblioEntry>
<BiblioEntry Id = "BABGEFHE" XRefLabel = "W3C99a"><BookBiblio>
<AuthorGroup><CorpAuthor>World Wide Web Consortium</CorpAuthor></AuthorGroup>
<Title>Namespaces in XML</Title><Url
    xlink:type="simple" xlink:href="http://www.w3.org/TR/REC-xml-names/" Hytime = "CLINK"
    Hynames = "linkend href" Loctype = "href queryloc URL"></Url>
<PubDate>1999</PubDate></BookBiblio></BiblioEntry>
<BeginPage/>
<BiblioEntry Id = "BABJHHFH" XRefLabel = "W3C99b"><BookBiblio>
<AuthorGroup><CorpAuthor>World Wide Web Consortium</CorpAuthor></AuthorGroup>
<Title>Extensible Stylesheet Language (XSL) Specification</Title>
<Subtitle>W3C Working Draft 21 April 1999</Subtitle><Url
    xlink:type="simple" xlink:href="http://www.w3.org/TR/WD-xsl/" Hytime = "CLINK"
    Hynames = "linkend href" Loctype = "href queryloc URL"></Url>
<PubDate>1999</PubDate></BookBiblio></BiblioEntry>
<BiblioEntry Id = "BABHJDBB" XRefLabel = "Wak99"><BookBiblio>
<AuthorGroup><Author><Surname>Wakizono</Surname><Firstname>Ryuji</Firstname></Author>
<Author><Surname>Kawamura</Surname><Firstname>Toshikazu</Firstname></Author>
<Author><Surname>Tsuchiya</Surname><Firstname>Takehiko</Firstname></Author>
<Author><Surname>Hatanaka</Surname><Firstname>Takahiro</Firstname></Author>
<Author><Surname>Tanaka</Surname><Firstname>Tatsuji</Firstname></Author></AuthorGroup>
<Title>Object-Oriented Database Management System for Process Control
Systems -Development and Evaluation-</Title><PubsNumber>in "SAC
'99", Proceedings of the 1999 ACM Symposium on Applied Computing</PubsNumber>
<Publisher><PublisherName>pages 204-209</PublisherName></Publisher>
<Publisher><PublisherName>ACM</PublisherName></Publisher><PubDate>1999</PubDate></BookBiblio></BiblioEntry>
<BiblioEntry Id = "BABDEEDE" XRefLabel = "Whi99"><BookBiblio>
<AuthorGroup><Author><Surname>Whitehorn</Surname><Firstname>Mark</Firstname></Author>
<Author><Surname>Marklyn</Surname><Firstname>Bill</Firstname></Author></AuthorGroup>
<Title>Inside Relational Databases</Title><Subtitle>With Examples
in Access</Subtitle><Publisher><PublisherName>Springer-Verlag</PublisherName></Publisher>
<PubDate>1999</PubDate></BookBiblio></BiblioEntry>
<BiblioEntry Id = "BABBJDIE" XRefLabel = "Yar99"><BookBiblio>
<AuthorGroup><Author><Surname>Yarger</Surname><Firstname>Randy</Firstname>
<OtherName>Jay</OtherName></Author><Author><Surname>Reese</Surname>
<Firstname>George</Firstname></Author><Author><Surname>King</Surname>
<Firstname>Tim</Firstname></Author></AuthorGroup><Title>MySQL &amp;
mSQL</Title><Publisher><PublisherName>O'Reilly &amp; Associates,
Inc.</PublisherName></Publisher><PubDate>1999</PubDate></BookBiblio></BiblioEntry></Bibliography>
<Appendix Id = "App.xref"><Title>Database DTD</Title>
<Para>This appendix presents the predefined, hardcoded database
DTD used by Multidoc Pro Database Browser and Publisher. Note that
the DTD shown here is slightly more detailed than what the user sees
(and what is coded in the program), because it anticipates some
future enhancements and changes.</Para>
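<Para>In a generated instance, level elements nest according to the relationships between tables, and each level carries its own title. The following minimal Python sketch (not the Multidoc Pro implementation) builds a two-level instance that follows this DTD. It assumes the rows have already been fetched from the database. The Projects and Systems tables and the Project ID relationship are borrowed from the sample mapping in <XRef Linkend = "CHDBBJJE" xlink:type="simple" xlink:href="#CHDBBJJE">Appendix C</XRef>; the row values and the Name column are hypothetical.</Para>
<Para><ProgramListing Format = "linespecific">
import xml.etree.ElementTree as ET

# Hypothetical rows: (Project ID, Project Name) and (System ID, Project ID, Name).
PROJECTS = [(1, "Pump Station"), (2, "Crane")]
SYSTEMS = [(10, 1, "Hydraulics"), (11, 1, "Electrics"), (20, 2, "Hoist")]


def build_instance(dsn, projects, systems):
    """Build a database element with one level per project and nested system levels."""
    root = ET.Element("database", database=dsn)
    for project_id, project_name in projects:
        level = ET.SubElement(root, "level", table="Projects")
        # The content model is (title?, (data*, level*)), so the title comes first.
        ET.SubElement(level, "title", column="Project Name").text = project_name
        for _system_id, parent_id, system_name in systems:
            # Relationship: Projects.Project ID = Systems.Project ID
            if parent_id == project_id:
                child = ET.SubElement(level, "level", table="Systems")
                ET.SubElement(child, "title", column="Name").text = system_name
    return root


doc = build_instance("LSAR SAMPLE", PROJECTS, SYSTEMS)
print(ET.tostring(doc, encoding="unicode"))
</ProgramListing></Para>
<Para>ElementTree writes every end tag explicitly. The DTD marks these end tags as omissible (- O), but including them is still valid.</Para>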
<Para><ProgramListing Format = "linespecific">
&lt;!--******************************************************************--
--* *-- 
--* Database DTD for MultiDoc PRO *-- 
--* Version 1.0 *-- 
--* May 5, 1997 *-- 
--* *-- 
--* *-- 
--* Joakim Östman, Oy CITEC AB Information Technology *-- 
--* Minor modifications by Heikki Toivonen, CITEC *-- 
--* *-- 
--* *-- 
--* Typical doctype declaration of document sets *-- 
--* &lt;!DOCTYPE Database PUBLIC *--
--* &quot;-//CITEC INFORMATION TECHNOLOGY//DTD Database//EN&quot; &quot;DATABASE.ENT&quot; *--
--********************************************************************--&gt;
&lt;!DOCTYPE database [
&lt;!ELEMENT database - O (level+) &gt; 
&lt;!ATTLIST database 
database CDATA #IMPLIED &gt; 

&lt;!ELEMENT level - O (title? , (data* , level*)) &gt; 
&lt;!ATTLIST level 
table CDATA #IMPLIED 
subdoc CDATA #IMPLIED &gt; 

&lt;!ELEMENT title - O ((#PCDATA) | subscript | superscript)+ &gt; 
&lt;!ATTLIST title 
column CDATA #IMPLIED &gt; 

&lt;!ELEMENT superscript - O (#PCDATA) &gt; 

&lt;!ELEMENT subscript - O (#PCDATA) &gt; 

&lt;!ELEMENT data - O (dataname? , datavalue*) &gt; 

&lt;!ELEMENT dataname - O ((#PCDATA) | subscript | superscript)+ &gt;
&lt;!ATTLIST dataname 
column CDATA #IMPLIED &gt; 

&lt;!ELEMENT datavalue - O (datadescription* | reference* | (value , unit?)*) &gt;

&lt;!ELEMENT datadescription - O ((#PCDATA) | subscript | superscript)+ &gt;
&lt;!ATTLIST datadescription 
column CDATA #IMPLIED &gt; 

&lt;!ELEMENT reference - O (ref.name? , nameloc) &gt; 
&lt;!ATTLIST reference 
id ID #REQUIRED 
mediatype (SGML , NON-SGML) &quot;SGML&quot; 
linkend IDREF #REQUIRED 
HyTime NAME &quot;clink&quot; 
column CDATA #IMPLIED &gt; 

&lt;!ELEMENT ref.name - O ((#PCDATA) | subscript | superscript)+ &gt;
&lt;!ATTLIST ref.name 
column CDATA #IMPLIED &gt; 

&lt;!ELEMENT nameloc - O (nmlist) &gt; 
&lt;!ATTLIST nameloc 
id ID #REQUIRED 
HyTime NAME &quot;nameloc&quot; &gt; 

&lt;!ELEMENT nmlist - O (#PCDATA) &gt; 
&lt;!ATTLIST nmlist 
docorsub ENTITY #IMPLIED 
nametype (element , entity) &quot;element&quot; 
HyTime NAME &quot;nmlist&quot; &gt; 

&lt;!ELEMENT value - O ((#PCDATA) | subscript | superscript)+ &gt; 
&lt;!ATTLIST value 
column CDATA #IMPLIED &gt; 

&lt;!ELEMENT unit - O ((#PCDATA) | subscript | superscript)+ &gt; 
&lt;!ATTLIST unit 
column CDATA #IMPLIED &gt; 
]&gt;</ProgramListing></Para></Appendix>
<Appendix Id = "Database-dtd2.xref"><Title>Database DTD 2</Title>
<Para>This appendix shows an attempt to fix some of the shortcomings
of the first version of the database <Acronym>DTD</Acronym> that
was implemented in Multidoc Pro Database Browser and Publisher.
This <Acronym>DTD</Acronym> can store all the information about
the connection to the database and the queries needed to build
the <Acronym>SGML</Acronym> document dynamically while the document
instance is being browsed, so no hidden data is needed. Queries are
stored once and can be reused, which improves efficiency.</Para>
<Para>The document language specified in this <Acronym>DTD</Acronym> is
both the database mapping save format and the format of the dynamically
built <Acronym>SGML</Acronym> document.</Para>
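<Para>Because the same document language serves as a save format, attribute values must be written so that they can be read back unambiguously. The general notice at the beginning of the DTD listing below states the quoting rule. The following minimal Python sketch applies that rule literally, escaping embedded double quotes as the entity reference &amp;quot; when both kinds of quote occur. The function name is hypothetical, and the sketch does not escape any other characters, such as ampersands.</Para>
<Para><ProgramListing Format = "linespecific">
import html


def quote_attribute(value):
    """Return an attribute value with delimiters, following the DTD's notice.

    - no double quote in the value: delimit with double quotes
    - double quotes only: delimit with single quotes
    - both kinds of quote: delimit with double quotes and replace each
      embedded double quote with an entity reference
    """
    if '"' not in value:
        return '"' + value + '"'
    if "'" not in value:
        return "'" + value + "'"
    # html.escape('"') yields the entity reference for a double quote.
    return '"' + value.replace('"', html.escape('"')) + '"'


for sample in ["East", 'Pipe 6" long', "O'Neill's 6\" pipe"]:
    print(quote_attribute(sample))
</ProgramListing></Para>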
<Para>This <Acronym>DTD</Acronym> is a work in progress and has
not been tested to confirm that it offers everything that would be needed.</Para>
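<Para>The listing below also drafts an SGML structure for SQL queries, illustrated by Example 1 in the comment before the sql element. One way to check that such a structure carries the whole query is to render it back into SQL text. The following minimal Python sketch renders where-group structures into a WHERE condition, using the SQL text of Example 1 as input. It assumes the where-group elements have already been parsed into nested dictionaries whose keys are named after the DTD's elements and attributes (a hypothetical representation), and it covers only comparisons between a column and a literal value.</Para>
<Para><ProgramListing Format = "linespecific">
# SQL spelling of the operator tokens declared for the operator element.
# The less-than sign is written as an escape so that this listing needs no
# SGML entity references.
LT = "\N{LESS-THAN SIGN}"
OPERATORS = {"eq": "=", "ne": LT + ">", "lt": LT, "le": LT + "=",
             "gt": ">", "ge": ">=", "like": "LIKE"}


def render_group(group):
    """Render one where-group: either nested groups or a single comparison."""
    if "groups" in group:
        body = render_groups(group["groups"])
    else:
        value = group["value"]
        if group.get("quotes"):
            value = "'" + value.replace("'", "''") + "'"
        column = f'{group["table"]}.{group["column"]}'
        body = f'{column} {OPERATORS[group["operator"]]} {value}'
    body = f"({body})"
    return "NOT " + body if group.get("reverse") else body


def render_groups(groups):
    """Join sibling where-groups; every group after the first has a logical-operator."""
    parts = []
    for position, group in enumerate(groups):
        text = render_group(group)
        parts.append(f'{group["logical"]} {text}' if position else text)
    return " ".join(parts)


# The WHERE part of Example 1, as nested dictionaries.
EXAMPLE_1 = [
    {"table": "Doctors", "column": "Age", "operator": "gt", "value": "50"},
    {"logical": "AND", "groups": [
        {"table": "Rooms", "column": "Wing", "operator": "eq",
         "value": "East", "quotes": True},
        {"logical": "AND", "reverse": True, "table": "Doctors",
         "column": "Patients", "operator": "ge", "value": "100"},
    ]},
]

print("WHERE " + render_groups(EXAMPLE_1))
</ProgramListing></Para>
<Para>A column-value that contains a table-column-entry (a comparison between two columns) would need one more branch in render_group.</Para>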
<Para><ProgramListing Format = "linespecific">
&lt;!--*****************************************************************-- 
--* *-- 
--* Database DTD for MultiDoc PRO *-- 
--* Version 1.5 *-- 
--* Jan 31, 1998 *-- 
--* *-- 
--* (c) CITEC Engineering Oy 1996-1998 *-- 
--* Heikki Toivonen *-- 
--* *-- 
--* *-- 
--* Typical doctype declaration *-- 
--* &lt;!DOCTYPE Database PUBLIC *-- 
--* &quot;-//CITEC INFORMATION TECHNOLOGY//DTD Database//EN&quot; &quot;DATABASE.ENT&quot; *--
--*******************************************************************--&gt; 
&lt;!-- TODO: Parameter passing/relationships/opening levels--&gt; 
&lt;!-- General notice about attribute values:
- values are delimited with double quotes (&quot;), except when the
value contains double quotes
- if the value contains double quotes, it is delimited with single
quotes (')
- if the value contains both single and double quotes, it is delimited
with double quotes, and each double quote in the value is replaced
with the entity reference &amp;quot;
--&gt;
&lt;!-- If this DTD is used to save a database mapping configuration,
the root element is configuration. For documents generated from the
configuration, the root element is either databases or database,
depending on the windows attribute. --&gt;

&lt;!ELEMENT configuration - - (configuration-title? , databases) &gt;
&lt;!ATTLIST configuration
id ID #REQUIRED
windows (multi | single) &quot;multi&quot; -- multi opens each database
element in its own window --
&gt;

&lt;!ELEMENT configuration-title - - (#PCDATA) &gt; 

&lt;!ELEMENT databases - - (databases-title? , database+) &gt; 
&lt;!ATTLIST databases 
id ID #REQUIRED &gt; 

&lt;!ELEMENT databases-title - - (#PCDATA) &gt; 

&lt;!ELEMENT database - - (dsn, database-title? , meta , level+) &gt;
&lt;!ATTLIST database
id ID #REQUIRED
generated-levels NUMBER &quot;1&quot; -- Initial number of levels to generate;
a negative value means infinite levels --
database CDATA #REQUIRED -- ODBC Data Source Name --
&gt;

&lt;!ELEMENT database-title - - (#PCDATA) &gt; 

&lt;!ELEMENT meta - - (stylesheets?, navigators?, webs?, queries?
, contents? , URL-directories? , suffixes? , ndatas?) &gt; 

&lt;!ELEMENT stylesheets - - (stylesheet+) &gt; 

&lt;!ELEMENT stylesheet - - (name,file) &gt; 

&lt;!ELEMENT name - - (#PCDATA) &gt; 

&lt;!ELEMENT file - - (#PCDATA) &gt; 

&lt;!ELEMENT navigators - - (navigator+) &gt; 

&lt;!ELEMENT navigator - - (name,file) &gt; 

&lt;!ELEMENT webs - - (web+) &gt; 

&lt;!ELEMENT web - - (name,file) &gt; 
&lt;!ATTLIST web 
web (web | docweb) &quot;web&quot; -- CHECK THIS! -- 
&gt; 

&lt;!-- Queries are stored here so that they can be reused and are
easier to change. Elements that use a query refer to it. --&gt;
&lt;!ELEMENT queries - - (query+) &gt; 

&lt;!ELEMENT query - - (sql) &gt; 
&lt;!ATTLIST query 
id ID #REQUIRED &gt; 

&lt;!-- The SQL text of a query could be stored without SGML structure
if a suitable SQL parser were available. An alternative, or an addition,
is a DTD fragment that describes the structure of SQL queries. A simple
version follows.
Example 1:
SELECT DISTINCT Doctors.Name, Rooms.* FROM Doctors, Rooms
WHERE (Doctors.Age &gt; 50) AND ((Rooms.Wing = 'East')
AND NOT (Doctors.Patients &gt;= 100)) ;

&lt;sql id=&quot;sql-1&quot;&gt; 
&lt;clause&gt; 
&lt;select distinct=&quot;distinct&quot;&gt; 
&lt;columns&gt; 
&lt;table-column-entry&gt; 
&lt;table&gt;Doctors&lt;/table&gt; 
&lt;column&gt;Name&lt;/column&gt; 
&lt;/table-column-entry&gt; 
&lt;table-column-entry&gt; 
&lt;table&gt;Rooms&lt;/table&gt; 
&lt;column all=&quot;all&quot;&gt;&lt;/column&gt; 
&lt;/table-column-entry&gt; 
&lt;/columns&gt; 
&lt;/select&gt;
&lt;from&gt; 
&lt;tables&gt; 
&lt;table&gt;Doctors&lt;/table&gt; 
&lt;table&gt;Rooms&lt;/table&gt; 
&lt;/tables&gt; 
&lt;/from&gt; 
&lt;where&gt; 
&lt;where-group&gt; 
&lt;table-column-entry&gt; 
&lt;table&gt;Doctors&lt;/table&gt; 
&lt;column&gt;Age&lt;/column&gt;
&lt;/table-column-entry&gt;
&lt;operator operator=&quot;gt&quot;&gt;
&lt;column-value&gt;50&lt;/column-value&gt;
&lt;/where-group&gt;
&lt;where-group logical-operator=&quot;AND&quot;&gt;
&lt;where-group&gt; 
&lt;table-column-entry&gt; 
&lt;table&gt;Rooms&lt;/table&gt; 
&lt;column&gt;Wing&lt;/column&gt; 
&lt;/table-column-entry&gt; 
&lt;operator operator=&quot;eq&quot;&gt; 
&lt;column-value quotes=&quot;quotes&quot;&gt;East&lt;/column-value&gt; 
&lt;/where-group&gt; 
&lt;where-group logical-operator=&quot;AND&quot; reverse=&quot;reverse&quot;&gt; 
&lt;table-column-entry&gt; 
&lt;table&gt;Doctors&lt;/table&gt; 
&lt;column&gt;Patients&lt;/column&gt; 
&lt;/table-column-entry&gt; 
&lt;operator operator=&quot;ge&quot;&gt; 
&lt;column-value&gt;100&lt;/column-value&gt; 
&lt;/where-group&gt; 
&lt;/where-group&gt; 
&lt;/where&gt; 
&lt;/clause&gt; 
&lt;/sql&gt; --&gt; 

&lt;!ELEMENT sql - - (clause+ , order-by?) &gt; 
&lt;!ATTLIST sql 
id ID #REQUIRED &gt; 

&lt;!ELEMENT clause - - (clause* | (select , from , where? , order-by?)) &gt;
&lt;!ATTLIST clause 
union CDATA #FIXED &quot;UNION&quot; -- ignored for 1st clause --
&gt; 

&lt;!ELEMENT select - - (columns*) &gt; 
&lt;!ATTLIST select
distinct (distinct | nodistinct) &quot;distinct&quot;
all (all | notall) &quot;notall&quot; -- if all, it is a semantic error to have columns --
count (count | nocount) &quot;nocount&quot; -- if count, it is a semantic error to have columns --
&gt;

&lt;!ELEMENT columns - - (table-column-entry+) &gt; 

&lt;!ELEMENT table-column-entry - - (table , column) &gt; 

&lt;!ELEMENT table - - (#PCDATA) &gt; 
&lt;!ATTLIST table
all (all | notall) &quot;notall&quot; -- if all, it is a semantic error to have
columns all; it is also a semantic error to have content -- &gt;

&lt;!ELEMENT column - - (#PCDATA) &gt; 
&lt;!ATTLIST column
all (all | notall) &quot;notall&quot; -- if all, it is a semantic error to have
tables all; it is also a semantic error to have content -- &gt;

&lt;!ELEMENT from - - (tables+) &gt; 

&lt;!ELEMENT tables - - (table+) &gt; 

&lt;!ELEMENT where - - (where-group+) &gt; 

&lt;!ELEMENT where-group - - (where-group* | (table-column-entry , operator , column-value)) &gt;
&lt;!ATTLIST where-group
logical-operator (AND | OR) #IMPLIED -- semantic error on the first group,
otherwise required --
reverse (reverse | normal) &quot;normal&quot; -- NOT -- &gt;

&lt;!ELEMENT operator - O EMPTY &gt;
&lt;!ATTLIST operator 
operator (eq | ne | lt | le | gt | ge | like) #REQUIRED 
-- = &lt;&gt; &lt; &lt;= &gt; &gt;= LIKE -- &gt; 

&lt;!ELEMENT column-value - - (#PCDATA | table-column-entry) &gt; 
&lt;!ATTLIST column-value 
quotes (quotes | noquotes) &quot;noquotes&quot; &gt; 

&lt;!ELEMENT order-by - - (columns+) &gt; 
&lt;!ATTLIST order-by direction (asc | desc) &quot;asc&quot; &gt; 

&lt;!-- Contents, URL-directories, suffixes and NDATAs are also reusable. --&gt;
&lt;!ELEMENT contents - - (content+) &gt; 

&lt;!ELEMENT content - - (#PCDATA) &gt; 
&lt;!ATTLIST content 
id ID #REQUIRED &gt; 

&lt;!ELEMENT URL-directories - - (dir+) &gt; 

&lt;!ELEMENT dir - - (#PCDATA) &gt; 
&lt;!ATTLIST dir 
id ID #REQUIRED &gt; 

&lt;!ELEMENT suffixes - - (suffix+) &gt; 

&lt;!ELEMENT suffix - - (#PCDATA) &gt; 
&lt;!ATTLIST suffix 
id ID #REQUIRED &gt; 

&lt;!ELEMENT ndatas - - (ndata+) &gt; 

&lt;!ELEMENT ndata - - (#PCDATA) &gt; 
&lt;!ATTLIST ndata 
id ID #REQUIRED &gt; 

&lt;!-- A level element points to its query. Some of the query's output
columns are mapped to its child elements, such as title and
datadescription. NOTE: Explicit output columns could be required;
a column-mappable element could then point to a table-column-entry.
A level element can have multiple relationships with its children,
and at least one relationship must exist between a parent element
and each of its children. During the mapping, the left-hand and
right-hand side table-column-entry elements are filled in. When a
root level element (one with no ancestor level elements) is
generated, its relationship mappings are checked and the level
element's query is modified accordingly. After the query has been
resolved and the level SGML is being written, the column-value parts
of the relationships are written. When a child level element is
generated, the parent level's relationships are checked, as well as
the current relationships, and the actual query is modified
accordingly. The child finds the correct relationships through
pointers to its parent's relationships. --&gt;
&lt;!ELEMENT level - - (title? , level-meta? , (data* , level*)) &gt;
&lt;!ATTLIST level
query IDREF #REQUIRED
relationships CDATA #IMPLIED -- parent level's relationship ids -- &gt;

&lt;!ELEMENT level-meta - - (relationship+) &gt; 

&lt;!ELEMENT relationship - - (table-column-entry , table-column-entry , column-value?) &gt;
&lt;!ATTLIST relationship
id CDATA #REQUIRED -- CDATA because several relationships can share an id -- &gt;

&lt;!ELEMENT title - - ((#PCDATA) | subscript | superscript)+ &gt; 
&lt;!ATTLIST title 
content IDREF #IMPLIED 
table CDATA #IMPLIED 
column CDATA #IMPLIED &gt; 

&lt;!ELEMENT superscript - - (#PCDATA) &gt; 

&lt;!ELEMENT subscript - - (#PCDATA) &gt; 

&lt;!ELEMENT data - - (dataname? , datavalue*) &gt; 

&lt;!ELEMENT dataname - - ((#PCDATA) | subscript | superscript)+ &gt;
&lt;!ATTLIST dataname 
content IDREF #IMPLIED 
table CDATA #IMPLIED 
column CDATA #IMPLIED &gt; 

&lt;!ELEMENT datavalue - - (datadescription* | reference* | (value , unit?)*) &gt;

&lt;!ELEMENT datadescription - - ((#PCDATA) | subscript | superscript)+ &gt;
&lt;!ATTLIST datadescription 
content IDREF #IMPLIED 
table CDATA #IMPLIED 
column CDATA #IMPLIED &gt; 

&lt;!-- In a static SGML/HyTime document, the directory, filename,
suffix and NDATA are found in entities. A dynamically built document
must store this information elsewhere; here it is done with
references to the database meta material. The filename attribute is
filled in during document generation. Handling the link to an
external document also poses a problem in a dynamic document, because
the link is normally managed through entities. There are two choices:
either generate new entities dynamically, or hook into the process
that actually reads the entity's contents. Multidoc Pro does the
latter: when the entity stored in the nmlist element is required, the
information in the parent reference element is looked up instead of
using normal entity handling. --&gt;
&lt;!ELEMENT reference - - (ref.name? , nameloc) &gt; 
&lt;!ATTLIST reference 
id ID #REQUIRED 
mediatype (SGML , NON-SGML) &quot;SGML&quot; 
linkend IDREF #REQUIRED 
HyTime NAME &quot;clink&quot; 
directory IDREF #IMPLIED 
filename CDATA #IMPLIED 
suffix IDREF #IMPLIED 
ndata IDREF #IMPLIED -- Defaults to SGML -- 
table CDATA #IMPLIED 
column CDATA #IMPLIED &gt; 

&lt;!ELEMENT ref.name - - ((#PCDATA) | subscript | superscript)+ &gt;
&lt;!ATTLIST ref.name 
content IDREF #IMPLIED 
table CDATA #IMPLIED 
column CDATA #IMPLIED &gt; 

&lt;!ELEMENT nameloc - - (nmlist) &gt; 
&lt;!ATTLIST nameloc 
id ID #REQUIRED 
HyTime NAME &quot;nameloc&quot; &gt; 

&lt;!ELEMENT nmlist - - (#PCDATA) &gt; 
&lt;!ATTLIST nmlist 
entity-info IDREF #REQUIRED 
docorsub ENTITY #IMPLIED 
nametype (element , entity) &quot;element&quot; 
HyTime NAME &quot;nmlist&quot; &gt; 

&lt;!ELEMENT value - - ((#PCDATA) | subscript | superscript)+ &gt; 
&lt;!ATTLIST value 
content IDREF #IMPLIED 
table CDATA #IMPLIED 
column CDATA #IMPLIED &gt; 

&lt;!ELEMENT unit - - ((#PCDATA) | subscript | superscript)+ &gt; 
&lt;!ATTLIST unit 
content IDREF #IMPLIED 
table CDATA #IMPLIED 
column CDATA #IMPLIED &gt;</ProgramListing></Para></Appendix>
<Appendix Id = "CHDBBJJE"><Title>Sample Database Mapping</Title>
<Para>This appendix shows the beginning of a sample database mapping
file. The plan was to replace this format with a binary format. <XRef Linkend = "Database-dtd2.xref"
    xlink:type="simple" xlink:href="#Database-dtd2.xref"
     >Appendix B</XRef> shows an SGML
version that serves as both the display format and the save format.</Para>
<Para>The DSN row at the beginning identifies the ODBC data source
name to which this mapping connects. The KEY rows identify nodes, and
the MAP and REL rows contain information about the actual mapping of
the node. Nodes become SGML elements in the generated document instance.</Para>
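<Para>The row format is simple enough to read with a few lines of code. The following minimal Python sketch parses the excerpt shown below (without the trailing ellipsis, and with its last REL row on a single line) into a data source name and a list of nodes. The structure it assumes is inferred from the sample only: MAP and REL rows are taken to describe the KEY row before them, a KEY value is read as a comma-separated path of ELEMENT:index pairs, and REL relationships are split at semicolons. The MAP fields are split but not interpreted, because their meaning is not documented here.</Para>
<Para><ProgramListing Format = "linespecific">
SAMPLE = """DSN=LSAR SAMPLE
KEY=DATABASE:1
MAP=LSAR SAMPLE;0;0;LSAR SAMPLE;0;0()()
REL=
KEY=DATABASE:1,LEVEL:1
MAP=Projects;1;0;Projects;0;0()()
REL=Projects.Project ID = Systems.Project ID
KEY=DATABASE:1,LEVEL:1,TITLE:1
MAP=Project Name;2;0;Projects;0;0()()
REL=
KEY=DATABASE:1,LEVEL:1,LEVEL:1
MAP=Systems;2;0;Systems;0;0()()
REL=Systems.Documents ID = Documents.Documents ID;Systems.Documents ID = Per Maint Documents.Documents ID;Systems.System ID = Units.System ID
"""


def parse_mapping(text):
    """Return the data source name and a list of nodes from a mapping file.

    Assumptions: MAP and REL rows belong to the preceding KEY row, a KEY
    value is a comma-separated path of ELEMENT:index pairs, and REL holds
    semicolon-separated "left = right" relationships.
    """
    dsn, nodes = None, []
    for line in text.splitlines():
        field, _, value = line.partition("=")
        if field == "DSN":
            dsn = value
        elif field == "KEY":
            path = [(name, int(index))
                    for name, index in (step.split(":") for step in value.split(","))]
            nodes.append({"key": path, "map": [], "rel": []})
        elif field == "MAP":
            nodes[-1]["map"] = value.split(";")  # fields left uninterpreted
        elif field == "REL":
            pairs = (rel.split(" = ", 1) for rel in value.split(";") if rel)
            nodes[-1]["rel"] = [(left, right) for left, right in pairs]
        # Any other line (blank lines, an ellipsis) is ignored.
    return dsn, nodes


dsn, nodes = parse_mapping(SAMPLE)
print(dsn)
for node in nodes:
    print(node["key"], node["rel"])
</ProgramListing></Para>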
<ProgramListing Format = "linespecific">DSN=LSAR SAMPLE
KEY=DATABASE:1
MAP=LSAR SAMPLE;0;0;LSAR SAMPLE;0;0()()
REL=
KEY=DATABASE:1,LEVEL:1
MAP=Projects;1;0;Projects;0;0()()
REL=Projects.Project ID = Systems.Project ID
KEY=DATABASE:1,LEVEL:1,TITLE:1
MAP=Project Name;2;0;Projects;0;0()()
REL=
KEY=DATABASE:1,LEVEL:1,LEVEL:1
MAP=Systems;2;0;Systems;0;0()()
REL=Systems.Documents ID = Documents.Documents ID;Systems.Documents ID = Per Maint Documents.Documents ID;Systems.System ID = Units.System ID

...</ProgramListing></Appendix>
</Book>
