
Vivagraphs: Bringing Context to Personal Hypermedia

Richard B. Dietz - August 16, 2000

Abstract

Our everyday lives are full of both ordinary moments and those rich in personal meaning. Many of us use media-capture technology such as still photography, audio-video recordings, hand-scribed personal diaries and the like to document our lives. The convergence of media capture, geographic positioning and Internet technologies has made it possible to "capture the moment" with a new form of media we call Vivagraphs. Vivagraphs are multi-modal media recordings which communicate a sense of their own context via embedded, linked or otherwise associated context data, then made available over the Internet as a form of hypermedia. Vivagraphs fuse elements of personal media, multimedia, contextual media and Internet hypermedia to communicate personal experience and convey personal documentary. Vivagraphs are made possible by a set of interoperable technologies: an XML-derived markup language, a customized "browser" application, specialized authoring software and a customized media capture device.

Table of Contents

Table of Figures

Table of Tables

1 Introduction

"In photography, the smallest thing can become a big subject, an insignificant human detail can become a leitmotiv. We see and we make seen as a witness to the world around us; the event, in its natural activity, generates an organic rhythm of forms." Henri Cartier-Bresson

1.1 The Decisive Moment

For many of us capturing the "moments of our lives" is an integral part of our interaction with the world and those with whom we share it. Recording the events of the world around us and the notable elements of our lives in personal documentary can be very much a passion—one which gives rise to bursting photo-albums, reams of home video and finely-crafted diaries. This march of the scribe in our private and public lives is a reflection of the workings of our minds and memories and makes possible the transmission of much of which we call human culture.

External visual, auditory and other sensory stimuli can be reproduced by human expression, offering a window into the mind and experience of others. As our technology has advanced, so too has the fidelity of these messages as we've moved from direct human-to-human re-presentation to analog reproduction and lately (in anthropological time) into the world of identical bit-for-bit, infinitely reproducible digital media. We've followed this march of the scribe as it has wound through the analog world of Gutenberg's printing press, Niépce and Daguerre's photographic processes and Edison's phonograph into the purely digital arena of the computer and the Internet.

The Internet has made possible the construction of a robust and intricate interconnected digital hypertext and hypermedia network where our artifacts are now (by their very nature) perfectly, limitlessly and effortlessly reproducible, and which may be transmitted with relative ease anywhere in the world. The nature of media has changed; no longer requiring tangible physical storage like the wax phonograph recording or even the plastic audio CD, a pure digital stream, the MP3, steals the scene. Our media players may remain tangible, but the media is itself intangible, transitory and abstract, where a song is ultimately nothing more than a single very long base 2 number, a series of 0's and 1's.

We are entering a period marked by the infusion of the digital into our media, our tools and our everyday surroundings, a synthesis of digital bits and tangible atoms[1]. We can now read and scrawl on digital paper[2, 3] and encase our downloaded loved-ones in digital picture frames[4, 5]. Everyday physical objects can have digital information added to them, "augmenting" reality with goggles and position data[6, 7], so that even the real world is increasingly digital, further blurring the boundary between digital information and the physical world.

This synthesis has made possible an amazing flowering of novel and exciting technologies. Our internetworking, digital media capture and context-aware systems have developed to the point where we can literally "capture the moment," store it and share it with others, and do so in an easily understandable and transparent fashion. Such moments are personal to our lives and multi-sensory in nature, often requiring an understanding of their context to fully convey their significance. Media which capture these moments are very often shared with others and are indeed meant to be. But these moments are also important because they are meaningful to us as individuals. Conveying significance and contextual depth are tasks for which these new forms of hybrid media are uniquely suited, and they are the focus of this work. These facets are important because they can help us better understand the way we interact with media and plot the future directions our mediated lives will take as our technology continues to transform us.

1.2 Camera Obscura

Snap! 1/125th of a second is frozen in silver and gelatin. At the release of the shutter, we have hopefully captured something, perhaps communicative, representative, and perhaps even something meaningful. But digitally speaking (if we are indeed digitally speaking) we've created nothing but a binary string of 0's and 1's. Transforming this binary number into an image, a sound, or some other form of media is still no indication that the resulting media will have any staying power in the minds of anyone other than oneself (and often not even then). There are layers of meaning; while a digital image file in and of itself is no less the image captured, until it has been transformed into a visual representation, the essence of the media—its communicative properties—is locked away. Even given the assembled image, we often require narration to provide a sense of context before we can begin to appreciate the media's significance and meaning, in addition to its prima facie aesthetic attributes.

What is it then that makes media compelling and communicative, that reaches out causing one to pause, transfixed by the work? One such quality, Roland Barthes, a French literary and social critic, and scholar of photography, called "punctum." Barthes writes that a photograph's "punctum is [a] speck, cut, little hole—and also a cast of the dice, […] that accident which pricks me (but also bruises me, is poignant to me), [which] rises from the scene, shoots out of it like an arrow and pierces me." Punctum is the character which projects a photograph beyond the common fare, beyond the otherwise pleasant images which convey societal values and universal aesthetic appeal, what Barthes refers to as the "studium." Studium exists "as a consequence of …knowledge, …culture ...a body of information." It is the everyday, the unconscious and collectively pleasing, a "very wide field of unconcerned desire, of various interest, of unconcerned taste." [8]

Henri Cartier-Bresson, a French painter and photographer, is well-known for his approach to photography and his singularly compelling images. In his book, The Decisive Moment, he writes, "A velvet hand, a hawk's eye—these we should all have...If the shutter was released at the decisive moment, you have instinctively fixed a geometric pattern without which the photograph would have been both formless and lifeless."[9] Cartier-Bresson articulates here, decades before Barthes, something of the element of punctum.

Figure 1.1 Behind the Gare St. Lazare, 1932. Henri Cartier-Bresson


In Cartier-Bresson's work, we find a unique facility for pictographically seizing the moment, for capturing a fragment of place and time in such a fashion that it speaks intimately to the viewer. His work seems to say, "I am a moment like no other, frozen and magical." Frozen, however, it conveys the artist's perspective at a singular intersection of place and time—a circumstance, a context that invites the viewer into meaningful introspection. But that moment is in reality no different from any other time and place; from any number of other perspectives it would have passed from the mind and in time without notice—media without meaning, mute to speak of its significance, anonymous and without character, a pattern of silver-oxide in the limbo between pure noise and meaningful communication, ultimately not much more than a random pattern of 0's and 1's.

Somewhere between the noise of random 0's and 1's and the hum of pure information lies most media. Media which engage us in dialog do so not simply because of the information they explicitly convey, but also because of what they elicit from within the viewer, shifting the balance further from noise to meaning, rising above the everyday and at play in the mind.

Elements of circumstance and context invite the observer to view media in the framework of their own lives and personal experience. When we connect with media in this way it is often either because we already have a rich sense of its context—due to societal or personal significance—or that the media is sufficiently evocative in and of itself as to invite us to provide it from within ourselves. Media we create on our own accord, such as personal photographs, audio and home video are already by their nature infused with such personal contextual cues. For those immediate to its creation, the context, significance and meaning of a media work is simply subtext. But, for those outside the creator's circle, these supporting facets are absent; the media sinks into the studium with no frame of personal reference or link to any discernable meaning.

In the spirit of Barthes' examination of photography I offer an addition which seeks to encompass an appeal of photography and media-in-general I've found absent in my examination of punctum and studium—something which stands in opposition to the common and public face of the studium and the atomic character of punctum: something which describes the properties of media that affect the individual viewer through communicative aspects unique to the life circumstance of that viewer, resulting in appreciation of the media in the context of one's life-as-history. This character of media I call contextum.

Contextum is directly derived from the Latin contextum, which means "to weave; to connect." It also encompasses the flavor of the Latin term, conlusum, which means "to play with; to have a secret understanding with." Thus contextum is defined as the character of media which links the viewer and the work in a playful interchange and a personal dialog unique to each viewer and context of the work. Contextum is a "Eureka!" moment where "I have found it!" and "it has found me!" When "I have found it," I've seen through the eyes of the author and found meaning through this link to the original context. When "it has found me," the media speaks to the context of my life; it has significance the author could never have known, where I realize a personal meaning and exchange with the work that is unique to my own life experience: "This image makes me think of my great-grandmother's rocking chair, the cool strength of her wrinkled hands and the way she would hum when I was very young."

Contextum is the element of inferred meaning and context in media, be it externally provided by the media itself, derived from meta-content or emerging from within oneself. I experience contextum when I thumb through my own set of photographs of friends, family and miscellaneous adventures. The people and the places are intimately intertwined with my own life experience; their significance is transparent to me and infuses the images, transcending aesthetic considerations. A like collection, absent the contextual grounding provided by authorship, would inspire little of this form of appreciation. Unless I were told the backstory, filling in the contextum, these media could only hope to appeal on the level of the studium.

Now, our technology is more than capable of telling this story and inviting us into the confidence of the author, making all of our media more meaningful. Herein lies the opportunity for a less atomic approach to media creation and capture to emerge. Much of our everyday media exists largely absent the contextual cues that could elicit a greater understanding, recognition and connection with that media to better tell its story and make it more accessible. Contextual media, as described here, aims to make media more meaningful by integrating contextual cues into the media capture, creation, editing and display process. The result of this integration, the a priori infusion of contextum into the digital media capture process, I call Vivagraphs.

1.3 Camera Vivagraphica

"Capturing the moment" requires an attention to the personal relevance of a work, its multi-modal nature, the elements of context in which it exists and draws meaning, and the exchange and sharing of the message. A vivagraph is—as much as possible—a chimera of all of these aspects. More precisely, vivagraphs are interactive multimedia slices of life that are associated with contextual information included with the media and later used to place the media in context, providing a groundwork for its display. The word vivagraph is derived from the Latin root for life, vivus (vivos) but more proximally from the Spanish celebratory exclamation/affirmation, "¡Viva!" The late Latin suffix -graphus, an "instrument for making or transmitting records or images"[10] completes the term. A vivagraph is then literally, a captured record of life, a recorded celebratory moment.

Vivagraphs are a synthesis of four conceptual media types, which are described and discussed in the following sections:

1.3.1 Personal Media

Our everyday lives are full of both ordinary moments and those rich in personal meaning. Many of us use media-capture technology such as still photography, audio-video recordings, hand-scribed personal diaries and the like to document our lives—all forms of personal media that serve as a record of our lives and those around us. The appeal of personal documentary is phenomenal. Personal audiography, photography and videography have grown into multi-billion dollar industries and continue to expand.[11] The digitization of these technologies is also moving apace and is making possible the evolution of new media forms and the exploration of new types of human interaction with these now largely digital media. Although the appeal hasn't much changed, the way we capture and share our personal media is changing in significant ways due to the rise of digital media capture and the ubiquitous influence of the Internet.

1.3.2 Multimedia

Often the most compelling moments of our lives live on in our memories as rich sensory experiences. Memories can be evoked by sensory stimuli—an image, a sound, an aroma in the air, a touch—or simple wanderings of the mind. Multi-media speaks of the sights and sounds, smells and tastes of our multi-modal sensory world. But multimedia is a term often used to refer to interactive computer programs that deliver audiovisual content on CD-ROMs or via the World Wide Web. In the context of this thesis, however, multimedia is given a more abstract definition, that of a fusion of multiple media or rather multi-modal media—media that by definition plays to more than one sensory modality but is not restricted to any specific file format or delivery mechanism.

In our desire to capture the moment, to convey a connection to the original sensory experience, we cannot ignore the full spectrum of the senses that play a role in sensation, storage and retrieval of mental percepts. Research has found that captured sound enhances recollection and enjoyment of photographic images and vice versa[12]. Frohlich and Tallyn called this association of recorded audio segments and still visual images audiophotographs. Results from their work suggest ambient audio to be a compelling addition to image capture that enhances user experience, memory and ultimately one's personal connection to the media. It is for these and like reasons that multimedia is an integral component of vivagraphs as well.

1.3.3 Contextual Media and Contextual Metadata

Often the significance of a particular sample of media lies not in the media itself, but the circumstance in which it was captured—its larger context. The contextual accompaniments of a personal photograph are often what make it compelling, give it staying power in our minds and provide cues to its meaning. The place, the time, the subject matter, media authorship and other forms of contextual information help create a more full understanding and appreciation of a given media element. But most media is divorced from this greater context for all but those people immediate to its creation.

We need some way to associate these contextual data with their corresponding media. We do this with metadata. Metadata are data that describe in a structured fashion the attributes of other data resources. Contextual metadata are an extension of the general concept of metadata: "data about data," which specifically emphasizes the contextual elements of the media object's creation. Contextual media are digital computer resources that are directly associated with, have embedded within them, or are remotely linked with contextual metadata.

Although the notion of contextual metadata is fundamentally little different than that of metadata in general, the distinction serves to emphasize extrinsic information, as opposed to file metadata, like file size, image dimensions, file type, bit depth, sample rate, type of compression, etc., which convey little meaning to human viewers, but are critical to computer manipulation of files. Contextual metadata describes the context or circumstances under which media was collected, captured or created, as opposed to a description of the content explicitly or implicitly portrayed within the media, or the system attributes associated with the media-as-computer-file. Contextual media is then simply media associated with contextual metadata. The following are all valid types of contextual metadata: position information (such as latitude, longitude and altitude), location information (such as the continent, country, state, county, city, etc.), temporal information (like incidence and duration—if applicable) and periodical/occasional information (such as the day of the week, a holiday or event, or an era), etc.

The metadata desirable to flesh out a media element's context is largely dependent on the application at hand and may vary considerably from application to application. Additional properties may help further establish the media element's context, some of which may be of particular use in specialized contextual media visualization applications such as the temperature, soil acidity, type of strata, yearly rainfall, population, genealogy nomenclature, type of government, name of a street, etc. Because of the varied nature of factors contributing to the context of media, contextual metadata values may be continuous, categorical, or discrete as well as descriptive in nature.

The emphasis placed here on access to a rich set of metadata, particularly contextual metadata, is due to the utility and power such an approach offers. My student and volunteer work with Gregory Rawlins and many others on the Knownspace[15] project has had the largest influence in this regard. Knownspace is a personal information manager and visualization application, where metadata is ubiquitous and is autonomously retrieved, analyzed and winnowed by a variety of software agents within the system. The primacy of metadata makes the possibilities for display, manipulation and exploration of information within this information environment quite substantial. Such work aims to make interaction with information more transparent by utilizing more useful, meaningful, and context-relevant information about information and computer files, giving data a richer context and allowing new opportunities for interaction.

With the availability of such technologies as GPS (Global Positioning System)[13], context as an integral part of digitally-captured media itself need no longer be absent. The Global Positioning System is a U.S. Department of Defense managed constellation of 24 satellites (give or take) which allows a GPS receiver to compute its position, velocity and time. The accuracy of the public GPS network had historically been diminished by the scrambling system called Selective Availability (SA) which introduced intentional degradation into the signal making accurate localization difficult without additional hardware and base stations. SA use was discontinued May 1, 2000 by President Clinton[14]. Now geophysical context data available to GPS receivers is many times more accurate than it was previously, and contextual metadata can now be acquired with greater accuracy and facility.

Contextual media has the character of being heavily grounded in circumstance, bound like Gulliver in an expanding network of contextual metadata. This stands somewhat in contrast to the atomic quality of most tangible and digital media we deal with where the links to context and meaning are largely ephemeral and constructed in our minds, not inherent properties of the media itself.

1.3.4 Internet Hypermedia

One of the most rewarding things we do with our personal media is share it with those we care about. The Internet, and in particular the World Wide Web application, has become for many a very effective means of doing this. The fluid exchange and inter-connection of information and media is something that today can amazingly be taken for granted—a place where we can all link information together and instantaneously be publishers to practically the entire world.

Vannevar Bush conceived of such a system in 1945 and called it memex[16]; Theodore Nelson with neologistic glee gave it a name, hypertext, in 1965[17]. DARPA laid the foundation, the early Internet, in the late 60's[18]; and Tim Berners-Lee created a true global hypertext network in the early 90's in the form of the World Wide Web[19]. Now that hypertext we call the Web spans the globe. But Internet-spawned innovations continue with messaging applications, countless Web services and emerging file-sharing platforms like Napster[20], Gnutella[21] and Freenet[22], which are revolutionizing the exchange of all forms of digital expression.

The Internet allows us to publish our own media, sharing it with anyone, anywhere in the world. The World Wide Web and other Internet applications make connecting these media possible using hyperlinks. The ability of nearly anyone to publish linked media resources is the foundation of the World Wide Web and emerging forms of hypermedia like those specified by SMIL[23] and the Vivagraph application described here.

1.3.5 Vivagraphic Media

Vivagraphs are multi-modal media recordings which communicate a sense of their own context via embedded, linked or otherwise associated context data made available over the Internet as a form of hypermedia. Vivagraphs fuse the elements of personal media, multimedia, contextual media and Internet hypermedia to communicate personal experience and convey personal documentary. Vivagraphs are made possible by a set of interoperable technologies: an XML-derived markup language, a prototype browser application, specialized authoring software and a media capture device. These components are discussed in detail in subsequent chapters.

1.3.6 Metaspace

Vivagraphs do not exist in a vacuum, just as web pages or web media do not exist apart from the World Wide Web. Vivagraphs are part of a larger Internet application which I call metaspace, an environment populated with Internet hypermedia and organized spatially by contextual metadata—a metadata space. Metaspace is not a new term, but is appropriately descriptive for this concept. Metaspace will be discussed further in later chapters. However, it bears stating that the term Vivagraphs is often used throughout this work as a synecdoche for the overall architecture, including the language, browser, capture device and metaspace Internet application. When used in this fashion, Vivagraphs is capitalized as in the following sentence…

As presented here, Vivagraphs are an exploratory framework for a new approach to media capture, display, interaction and media sharing as we enter a time when radically new methods of media interaction are possible and new synthetic media forms, excitingly inevitable.

1.4 Formatting and Overview of this Thesis Document

Throughout this text readers will find segments of code and XML markup. These elements will be indicated as such by their font, like this <xml> tag and following example:

<example> This could be an xml document. </example>

Elements and attribute names used in the discussion of the language will be italicized as in the following example:

The cmil element is the root element of all CMIL documents.

The first chapter of this thesis discussed the conceptual and aesthetic underpinnings of the Vivagraph system. It also outlined the general media framework and introduced vivagraphs as a new type of digitally captured media. The remainder of this thesis deals with the more concrete area of the Vivagraph system. Chapter 2 outlines work undertaken by others in related fields, specifically projects that are exploring similar types of media fusion or are using context data in related ways. In Chapter 3, the components of the Vivagraph system are discussed: the markup language used by the system, the user agent or "browser," and manual and automated authoring of Vivagraphic media. Chapter 4 consists of conclusions emerging from the experience developing this work, presented along with future directions and guideposts for further development.

2 Related Work

"A photograph is always invisible, it is not it that we see." Roland Barthes

2.1 Worldboard

2.1.1 Description

Worldboard was originally conceived in 1996 by Jim Spohrer[24]. Spohrer described a world invisibly demarcated by single meter cubes through the aid of global positioning technologies. At single-meter accuracy, one could achieve a useful and sufficiently accurate binding of data/information to a distinct geophysical position on the face of the earth to create, in essence, a form of "planetary chalkboard." Information could be placed and viewed at a particular location, given positioning and visualization hardware such as VR (virtual reality) glasses. This type of system, where information overlays or alters one's perception of the physical world, is often referred to as "augmented reality" or "mediated reality."[7]

An exploration of the Worldboard concept has been undertaken at Indiana University since 1997. I have been involved with this effort both as a student and volunteer. My work on the project began with a collaboration with fellow student Bart Everson that resulted in the design and development of an interactive augmented-reality interface prototype called "dataSphere." I later concentrated on the development of an XML markup language suitable for use with an operational Worldboard system when the client and server components were sufficiently developed. This work began in collaboration with Chris Borland, the result of which we called MRML (Mediated Reality Markup Language).

2.1.2 Discussion

The Worldboard project continues to move forward at Indiana University, Information in Place, Inc. and IBM. Although sharing a common origin in many respects, Worldboard and the Vivagraph project differ in several fundamental areas. The Vivagraph project is not an augmented or mediated reality application and the overall notion of contextual media is not explicitly concerned with presenting media within a real-time/real-place context.

My work developing the original XML language for WorldBoard led to the realization that there was a significantly larger application space for context applications on the Internet than that addressed by Worldboard's augmented reality. Vivagraphs are one such contextual media application. In support of this, I created CMIL (Contextual Media Integration Language). CMIL is the language that serves as the foundation of the Vivagraph application and is designed to be flexible enough to be applicable across other types of contextual media applications, including those such as WorldBoard.

2.2 Informedia Experience-On-Demand

2.2.1 Description

Informedia is a multimedia digital library research initiative at Carnegie Mellon University. The main body of work concentrates on search and retrieval, indexing and navigation of large-scale collections of digital video and multimedia. Informedia uses such methods as speech recognition, image understanding and natural language processing to augment the media corpus and aid in the search and retrieval of requested media. Experience-On-Demand (EOD)[25], sponsored by DARPA, is an offshoot of the Informedia Digital Video Library project which focuses on the collaborative capture of digital media and the centralized synthesis of multiple viewpoints "across people, time, and space." EOD combines the captured media and context data of multiple EOD units to create a collective sensory overview of the environment, which can be used to better coordinate the operation of military field operatives among other applications.

2.2.2 Discussion

EOD is a very sophisticated initiative which shares many of the same goals as the simpler Vivagraph architecture. However, the two differ in a number of areas as well:

2.3 Kodak (FIS) Field Imaging System

2.3.1 Description

The Kodak Field Imaging System (FIS) is a commercial product of the Kodak corporation which bundles a Kodak DC265 digital camera with a Garmin GPSIII+ GPS receiver and ArcView Geographic Information System (GIS) software for viewing digital photographs within the ArcView suite. It is primarily used for professional work such as civil engineering and geologic surveys.

2.3.2 Discussion

The Kodak FIS system is the first hardware package I discovered capable of capturing images in tandem with GPS data. The components of the FIS system are essentially the same as those used for the Vivagraph capture device; however, I use a DC290 camera, Garmin GPSIII+, modified aluminum brackets, cables and custom scripts. The FIS renders media only by location and requires the ArcView GIS software, which is proprietary and significantly more complex than warranted for the Vivagraph system audience.

2.4 Wearable Computers and Personal Imaging

2.4.1 Description

Wearable computers are computing devices which exist within one's personal space and interact with the user as an extension of their perception, memory or other cognitive or physical capabilities. Wearables provide a means to enter data, capture media, and perform calculations that enhance human performance, information processing and even enjoyment.

2.4.2 Discussion

Although not designed with wearable computers in mind, vivagraphs could certainly be captured and composed with a properly outfitted wearable computer such as Steve Mann's wearcomp personal imaging device[26] and many similar systems. Wearable systems could provide an always-on, always-available authoring solution for creating vivagraph media. But despite the increasing ease of use and unobtrusiveness of many wearable systems, they have yet to enter popular use and in some respects violate the aesthetic goals of this project by capturing every moment and removing much of the aesthetic selection process from the human host.

3 Vivagraph System

"Memory is very important, the memory of each photo taken, flowing at the same speed as the event. During the work, you have to be sure that you haven't left any holes, that you've captured everything, because afterwards it will be too late." Henri Cartier-Bresson

3.1 Overview

The Vivagraph system is analogous in structure to the World Wide Web. The World Wide Web is a collection of clients (browsers) and servers that exchange data over HTTP (Hypertext Transfer Protocol)[27], the Web's application-layer protocol. Clients and servers communicate with each other by passing HTTP messages. When a client makes a request of a server for information, the server brokers the request and passes along the requested objects, which may be HTML documents or digital media. The Vivagraph system operates in the same fashion. Vivagraph clients (referred to here as user agents) interact with the same HTTP servers supporting the World Wide Web. A user agent requests CMIL (Contextual Media Integration Language) documents and media files, which it subsequently renders for users once they have been retrieved. Additional components of the Vivagraph system are the authoring application and the DMCD (Digital Media Capture Device) used to create and edit vivagraphic media (CMIL files, images, sounds, etc.) in situ.
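To make this exchange concrete, the following sketch shows what a request for a CMIL document might look like at the HTTP level. The host name, document path and Content-Type value are illustrative assumptions only; the Vivagraph system simply relies on standard HTTP as used by the Web.

GET /scenes/lakeside.cmil HTTP/1.1
Host: www.example.org

HTTP/1.1 200 OK
Content-Type: text/xml

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE cmil SYSTEM "http://www.oacea.com/cmil/cmil-0.9.dtd">
<cmil>…</cmil>

The user agent would then issue further HTTP requests for any media files referenced by the document before rendering the scene to the user.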

The main elements of this system are:

CMIL (Contextual Media Integration Language)
CMIL is the foundation of the Vivagraph system, allowing the exchange of information between Capture devices, authoring tools, servers, user agents, and ultimately the user.
Shakti (CMIL User Agent)
Shakti is the "browser" application which renders vivagraphs to users.
Authoring
Scripts and programs prepare vivagraph media for display by forming CMIL documents and processing component media.
DMCD (Digital Media Capture Device)
The DMCD captures media and contextual information which are used to create vivagraphs.
HTTP Server/Internet Host
Vivagraphs are made available over the Internet via HTTP, the transport protocol of the World Wide Web.

The relationship between these elements of the system and the human end user are shown in the following figure:

Figure 3.1 System Diagram


The elements unique to the vivagraph system are the language, the user agent and the media authoring/capture device. Each of these components is described in the following sections beginning with CMIL, Contextual Media Integration Language. First, however, we introduce some terminology describing the Vivagraph system.

3.1.1 Vivagraph Terminology

The advent of the Internet and the Web has been accompanied by related terminology entering common usage. Presented here are a number of terms used when discussing the operation of the Vivagraph system, by way of analogy to the Web.

Internet Application            | World Wide Web | Metaspace
Content                         | hypertext      | vivagraphs/contextual hypermedia
Language                        | HTML           | CMIL
Document Unit                   | Web page       | scene
Root Document                   | home page      | home scene
Aggregation of Linked Documents | Web site       | metascene
Table 3.1 World Wide Web and Vivagraphs Terminology Comparison

The terms presented in Table 3.1 are defined as follows:

CMIL
Contextual Media Integration Language is an XML-derived structured markup language for creating documents which specify how digital media and contextual metadata are combined to form vivagraphs.
Home Scene
A home scene is a scene which serves as the root document, a table of contents or navigational locus for metascene content.
Metascene
A metascene is an interconnected group of scenes usually hosted on the same server and typically entered via the metascene's home scene.
Metaspace
Metaspace is the Internet application supporting vivagraphs. It may also be thought of as the aggregation of all vivagraphic media accessible through the Internet. A metaspace is a multidimensional space arising from a collection of digital media objects (media, files, folders, etc.) which have associated with them various forms of contextual and generic metadata. Plotting these objects spatially along axes corresponding to each object's metadata attributes carves out the object's position in this multidimensional metadata space.
Scene
Any rendered CMIL document is a scene.
Vivagraph
1) A vivagraph is a constituent of metaspace; it is an aggregation of digital media and contextual metadata specified in a CMIL document and rendered in a metaspace scene. 2) The system as a whole.

3.2 Contextual Media Integration Language

3.2.1 Introduction

Contextual Media Integration Language (CMIL)[28, 29] is an XML 1.0[30] tag language used to describe the relationships between digital multimedia files and contextual information so that they may be presented to users via a CMIL "browser." XML is essentially a set of rules for creating languages which describe data in the form of structured text files. As a text format, XML-derived documents can be created and read with a simple text editor without regard to platform. Another advantage of XML is that it is a widely accepted standard promulgated by the World Wide Web Consortium (W3C)[31]. As such, XML also boasts a substantial and growing set of free and commercial software tools and libraries for creating, parsing and rendering XML.

Before proceeding any further onto the specifics of CMIL, it would be helpful to define a few XML-related terms which will be used in this and subsequent sections:

Attribute
Attributes provide a way to annotate elements with additional information. An attribute is an element parameter specified in an associated DTD which takes the form: name="value". In the following example, href is an attribute of the a element: <a href="http://www.genericorp.com">hyperlink</a>
DTD
A document type definition defines the overall logical structure of a language. It consists of XML declarations which specify the elements and attributes legal for documents complying with the DTD.
Document
A document is a data stream which, in conjunction with any referenced streams such as external stylesheets or scripts, contains elements conforming to rules specified by an associated DTD.
Element
Elements are the building blocks of documents. Elements give structure to information by following rules specified in an associated DTD. For example, head, title, body and p are a few elements available to authors of HTML, which are nested to create Web pages.
Render
Documents are rendered (presented to the user in some fashion) by a user agent.
Tag
A tag is an instance of an element in a document. All tags must be closed or self-terminated like so: <tag>stuff between tags</tag> or <tag/>.
User Agent
A user agent retrieves, processes and renders documents. User agents are often referred to as clients or informally as "browsers."
Valid
A document is valid if it is well-formed and has been verified to conform to all of the rules specified by an associated DTD.
Well-formed
A document is well-formed if it conforms to proper XML syntax, i.e., the elements are nested properly, terminated and attribute values are quote enclosed, etc.

The DTD for CMIL 0.9, the version of the language described here, is available at http://www.oacea.com/cmil/cmil-0.9.dtd and is included in appendix A of this thesis.
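As an illustration of what such a DTD contains, the fragment below sketches how the root element's content model might be declared. It is a simplified example based on the document structure described in the next section, not an excerpt from the actual CMIL 0.9 DTD; see Appendix A for the authoritative declarations.

<!-- simplified sketch: a cmil document consists of a head followed by a body -->
<!ELEMENT cmil (head, body)>
<!-- head and body in turn declare their own legal child elements (see Appendix A) -->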

3.2.2 Structure of CMIL

CMIL is a relatively simple, human-readable XML-derived language designed to provide a robust and extensible framework for contextual media Internet applications. CMIL has numerous facets, but it is only one (albeit critical) component of the overall Vivagraph architecture.

XML prescribes rules which determine the ways that information may be structured in XML-derived languages like CMIL. The naming and rules for the nesting of elements, their attribute names and values, and included character data are the primary means by which this is accomplished. Information can be conveyed by the names and placement of the elements themselves, including the relationships of child and parent elements as seen here:

<parent> <child> <grandchild/> </child> </parent>

Information is also conveyed by the attributes of an element. Attributes function much like adjectives describing an element[32]. This is a general rule of thumb, as the determination of element names, attributes and arrangement of child elements is often more of an art than a science. The following example shows two elements and their attribute values:

<elephant size="big" wrinkliness="high"/>
<mouse size="small" wrinkliness="low"/>

Information can also be conveyed by placing character data (text) in between the tags of an element as seen in the title, author and isbn elements in the following example:

<book type="inspirational"> <title>The Lost Soul Companion</title> <author>Susan M. Brackney</author> <isbn>0967632307</isbn> </book>

CMIL makes use of the structural rules of XML to create documents a CMIL user agent needs to render scenes to the user. Valid CMIL documents are composed of three sections:

  1. CMIL version information (document type declaration) before the cmil element
  2. a header section delimited by the head element
  3. a body section delimited by the body element

A valid CMIL document firstly declares the version of its Document Type Definition (DTD). The DTD described here is version 0.9; for documents that use this DTD, the following prolog and document type declaration should precede the cmil element:

<?xml version="1.0"encoding="UTF-8"?> <!DOCTYPE cmil SYSTEM "http://www.oacea.com/cmil/cmil-0.9.dtd">

The processing instruction in the first line of the declaration states the XML version used and that the document encoding is compatible with the UTF-8 standard character set. The second line of the declaration contains a URL for the CMIL DTD so that it may be retrieved and used to validate the document.

The root element of a CMIL document is the cmil element. All elements in a CMIL document are contained within a single cmil element placed below the document's document type declaration. The head and body elements are the only legal child elements of the document root. The header contains elements relating to the document as a document: the title, document metadata, references to style information, scripts, etc. The body contains the actual content—references to the digital media and contextual metadata that form vivagraphs and construct each scene. For additional information about the syntax of these elements please consult the CMIL 0.9 Specification[28]. A CMIL document therefore must take the following form, combining the document type declaration, head and body elements:

<?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE cmil SYSTEM "http://www.oacea.com/cmil/cmil-0.9.dtd"> <cmil> <head>…head content…</head> <body>…body content…</body> </cmil>

Figure 3.2 CMIL Document Structure

3.2.2.1 The CMIL Node

The raison d'être for CMIL is to describe vivagraphs for rendering to users by a CMIL user agent. The most critical part of CMIL is the description of vivagraphs; thus CMIL must describe the interrelationship of digital media resources and contextual metadata. In CMIL this integration occurs in the node element of the document body. Each node element holds an attributes child element (containing contextual and content metadata) and a media child element (holding digital media resources).

Figure 3.3 Diagram of a CMIL node Element.


The attributes element holds the metadata associated with each node. These metadata may include location and time properties as well as user-defined attributes such as a description of the subject matter, media quality, the temperature, etc.

In Figure 3.4, taken from a live vivagraph capture session, there are three attributes defined as elements by the DTD (loc, orient and time) and three author-defined attributes (velocity, GPSstatus and EPH). The elements loc, orient and time are the most commonly used node attributes. The att element allows CMIL authors to add their own attributes in the form of property-value pairs, providing a way to attach additional types of metadata to their nodes.

<attributes>
  <loc coords="39.16892,-86.53685,+00259" datum="WGS 84"/>
  <orient bearing="296.56" incline="0.00"/>
  <time begin="20000506T153854"/>
  <att name="velocity" content="0.45" metric="meter/second"/>
  <att name="GPSstatus" content="2D GPS position"/>
  <att name="EPH" content="013" metric="meter"/>
</attributes>

Figure 3.4 A CMIL attributes Element

All attribute values such as those for loc[33] and time[34] elements should be formatted in accordance with ISO (International Organization for Standardization) standards where applicable. The XML Schema specification for data types will also prove useful here as it evolves and is adopted.[35] The att element can also support dynamic attributes by allowing the value to be resolved from a linked dynamic resource or an internal script (using the fragment identifier, #). In the following example, a remote perl script is executed to provide the value for the att element:

<att name="temperature" metric="celcius" src="http://www.temperature.com/cgi-bin/temp.pl"/>

The media element holds URL references to media associated with the CMIL node. Certain media types such as plain text or HTML may be contained within the node itself. The media element houses the following media object elements: animation, audio, image, model, ref, speech, text, textstream, vector, video, and www. The media element combines multi-modal media, including audio, text and video files which constitute a given vivagraph's media. Like the attributes element, any number of media object child elements may be present within the media element, including duplicates—multiple audio elements for example. Below is an example media element.

<media> <image src="dive.jpg"/> <audio src="splash.wav"/> <text>A late summer swim at the lake</text> </media>

Figure 3.5 A CMIL media Element

Ultimately the media and the metadata come together in the node element, which is rendered by CMIL user agents into a vivagraph.

<node id="node01" title="vivagraph1"> <attributes> <loc coords="39.16,-86.53,768.23" datum="WGS 84"/> <time begin="19990612T102000"/> </attributes> <media> <image src="media/images/image.png"/> <audio src="media/audio/audio.aiff"/> <text>A bit of inline annotation</text> </media> </node>

Figure 3.6 A CMIL node Element

Thus far we've explored the general structure of CMIL and how vivagraphs are specified by the CMIL node element. CMIL has been designed to allow authors a considerable degree of flexibility, even to the extent of moving far beyond the Vivagraph application presented here. CMIL treats data and metadata as part and parcel of the same entity—an approach of utility in many applications, not only Vivagraphs.

3.2.3 Language Features

There are a number of language features which bear mentioning here in some form due to their influence on the character of the language. As mentioned before, the specification document[29] describes the entirety of the language in greater detail. Presented here are three important aspects of the language meriting further discussion. They are content negotiation, attribute inheritance, and hyperlinking.

3.2.3.1 Content Negotiation

Oftentimes a given media resource may be available in several different formats, languages, file sizes, etc., so that it may be disseminated as widely as possible. A negotiable resource allows a user agent to select among particular variants of the resource which best match the system configuration and capabilities of the user agent and the preferences of the user. Content negotiation in CMIL is based on the switch element of SMIL 1.0[23]. SMIL is Synchronized Multimedia Integration Language, a W3C (World Wide Web Consortium) recommendation for linear time-based hypermedia and interactive video presentations. CMIL features a switch element which behaves in the same fashion as that in SMIL, but CMIL also offers another content negotiation element, the sift element. Content negotiation may be used to choose between and among a variety of CMIL elements, including node and media object elements, to find the most appropriate media resource for the user.

3.2.3.1.1 Switch

The switch element works by evaluating in descending order the selection parameter attributes of its child elements. Selection parameters are attributes the user agent must evaluate such as language, screen resolution, etc. The first child element where all of the selection parameters evaluate to "true" is returned. If none evaluate to "true," the last child element is returned by default. In Figure 3.7, the CMIL user agent is configured for French (fr) display and the second image element is selected. If the user agent were configured for neither French nor English (en), the last element would be returned by default.

<switch title="logo"> <image src="english.png" system-language="en"/> <image src="french.png" system-language="fr"/> <image src="esperonto.png" system-language="en,fr"/> </switch>

Figure 3.7 A CMIL switch Element

3.2.3.1.2 Sift

The sift element is unique to CMIL and works by evaluating the selection parameter attributes of its child elements and returning all elements which evaluate to "true." If none evaluate to "true," none are returned. In the following example, the CMIL user agent is configured for French display, resulting in the latter three image elements being returned. If the user agent were configured for neither French nor English, none of the child image elements would be returned.

<sift> <image src="blossom.png" system-language="en"/> <image src="bubbles.png" system-language="en"/> <image src="buttercup.png" system-language="en"/> <image src="blossom_fr.png" system-language="fr"/> <image src="bubbles_fr.png" system-language="fr"/> <image src="buttercup_fr.png" system-language="fr"/> </sift>

Figure 3.8 A CMIL sift Element

3.2.3.2 Attribute Inheritance

The node elements within the body of a CMIL document can be nested within group elements as well as switch and sift content negotiation elements. The group element is a generic container element which aggregates node and other CMIL body elements. These elements, as well as the body element, may have an attributes element as a child element. This allows node element attribute values to be inherited from parent elements. Attribute values assigned by parent elements may also be overridden if specified at a level more proximal to the node element itself. In Figure 3.9 we find a CMIL node which both inherits and overrides contextual metadata values from parent group and body elements.

<body>
  <attributes>
    <a href="http://www.genericwebsite.com/index.html"/>
    <orient bearing="ne"/>
  </attributes>
  <group>
    <attributes>
      <loc coords="39.16,-86.53,766.72" datum="WGS 84"/>
      <orient bearing="sw"/>
      <time begin="19990612T102200"/>
    </attributes>
    <node class="circle" id="node03" title="node title">
      <attributes>
        <time begin="20000224T140800"/>
        <att name="temperature" content="20" metric="celcius"/>
      </attributes>
      <media>
        ...media content...
      </media>
    </node>
  </group>
</body>

Figure 3.9 CMIL document body with multiple inherited attribute values

The attribute values for the CMIL node element ultimately resolve to:

Element | Attributes and Values                             | Origin
a       | href="http://www.genericwebsite.com/index.html"   | inherited from body
loc     | coords="39.16,-86.53,766.72" datum="WGS 84"       | inherited from group
orient  | bearing="sw"                                       | inherited from group
time    | begin="20000224T140800"                            | overridden by node
att     | name="temperature" content="20" metric="celcius"   | directly from node
Table 3.2 Inherited Attribute Values

3.2.3.3 Hyperlinking

The ability to link digital resources across disparate computer networks is the essence of hypermedia and the foundation of the World Wide Web. As a hypermedia language, linking is an essential part of CMIL as well. Hyperlinks in CMIL can be specified in several ways. The a and anchor elements are the primary hyperlinking elements in CMIL: the a element specifies links to whole resources, while the anchor element specifies links to partial resources such as a specific area of an image or a temporal segment of a video file. Several examples follow…
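As a rough sketch of these two forms, the example below links an entire node to another scene with the a element and attaches a region-level link to an image with an anchor element. The anchor attributes shown here (href and coords) are patterned after SMIL and are illustrative assumptions only; the authoritative syntax is given in the CMIL 0.9 specification[28].

<node id="node05" title="boathouse">
  <attributes>
    <!-- whole-node link: selecting this vivagraph opens another scene -->
    <a href="http://www.example.org/scenes/boathouse.cmil"/>
    <loc coords="39.17,-86.54,+00251" datum="WGS 84"/>
  </attributes>
  <media>
    <image src="dock.jpg">
      <!-- partial-resource link on a rectangular region of the image (SMIL-style, illustrative) -->
      <anchor href="sailboat.cmil" coords="40,60,180,140"/>
    </image>
    <text>Select the sailboat to visit its own scene</text>
  </media>
</node>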

3.2.4 CMIL Summary

CMIL is the foundation of the Vivagraph system. It allows authors to bring together media resources and contextual metadata in an easily editable and transmissible, standards-compliant and platform-agnostic structured text file. CMIL documents are rendered by a CMIL user agent, presenting scenes containing vivagraphs to users in what is, in essence, a static data-mining internet application, metaspace. Additional information about CMIL is available at http://www.oacea.com/cmil, including a draft specification of the language. The DTD for the CMIL 0.9 language accompanies this thesis in Appendix A.

3.3 CMIL User Agent

A user agent, or "browser," is an application which retrieves media content and renders it to users. A CMIL user agent may take on many forms. It may focus on three-dimensional visualization or audio presentation, concentrate on specific kinds of metadata, or be specifically designed to interface with assistive technologies. It may be built for the desktop or mobile devices; regardless of the form, the primary duties of a CMIL user agent are…

The prototype CMIL user agent, Shakti, is described in the following section. It adheres to the criteria described above, although limited to a primitive level in some cases. Nevertheless, Shakti serves as a functional prototype user agent for exploring the dynamics of metaspace and user interaction with vivagraphs.

3.3.1 Shakti

The user agent for the Vivagraph system renders CMIL documents, presenting users with scenes and their constituent vivagraphs, and allowing interaction with these media. The user agent described here is called Shakti after the Hindu goddess representing feminine power and abstractly, creative energy and potentiality. Shakti, the user agent, is a proof-of-concept prototype; at this stage only a subset of the CMIL specification is supported, just that necessary to render metascenes and present vivagraphs to users. Although vivagraph presentation and most hyperlinking behavior is supported, attribute inheritance, content selection and other more complex elements present in the CMIL specification are not at this stage.

Shakti is written in Macromedia Director, an integrated development environment (IDE) primarily used for authoring multimedia CD-ROMs and Shockwave interactive media for the World Wide Web. Author familiarity with Director at the outset of this project was the primary reason for its selection as the development platform. The user agent consists of roughly 5500 lines of Lingo code and numerous media resources embedded in the Director executable file. Shakti runs on MacOS and Windows operating systems.

3.3.1.1 User Agent User Interface

The Shakti user interface is primitive, but functional. It draws conceptually from elements of World Wide Web browsers, VRML world navigation, GIS applications and information visualization tools. Within the context of a CMIL user agent such as Shakti, there are two kinds of user interface to consider: first, the interface provided by the application itself, and second, that constructed by authors of the media displayed within it. The application user interface is the concern of this section; the next deals with content.

As mentioned earlier, Shakti is a freestanding executable application written in Director. Unlike desktop applications written in C++, Java, Visual Basic or REALbasic, Director applications cannot utilize the common set of interface widgets provided by the underlying operating system. In other words, Director applications have no access to the interface elements that provide the overall consistent "look and feel" of the platform, be it MacOS, Windows, a Unix variant, Linux, etc. There are tradeoffs here; while this is often desirable from the standpoint of developing customized multimedia CD-ROMs and interactive presentations, it also means that any required functionality provided by native interface widgets must be replicated in Director, where it will likely never quite match the appearance and patterns of interaction expected from such elements in native applications. On the one hand it is good practice to be mindful of the extent to which users have become accustomed to the behavior of native interfaces, and indeed application developers are admonished to follow guidelines to that effect[36, 37]. But, on the other hand, such stricture can also constrain the exploration of new patterns of interaction.

Figure 3.10 Shakti user agent user interface


The user interface of the Shakti application is divided horizontally, with elements residing primarily at the top and bottom of the viewable screen area. At the top left of the screen are buttons to open scenes and request help, accompanied by an area for tabbed listings of open scenes along the left-hand side. The open button opens a pop-up dialog that allows users to select CMIL documents from a local drive or a URL, which the application loads and renders. The help button opens a CMIL document accompanying the Shakti application itself which provides information about how to use the interface. When a scene is loaded, a tab is placed along the uppermost left side of the interface to represent that scene. As its vivagraphs load, their icons are given the same unique color as the tab representing the scene. As additional scenes are rendered they are also added to the tab queue and may be removed from the user agent's display area (along with their constituent vivagraphs) by clicking on the tab.

In the upper right of the user interface is the name and current version number of the user agent. There is also a temporary reset button that purges the application of loaded documents, and an animated logo in the upper right corner. The animating logo serves as a status indicator of ongoing network activity, in the same fashion as those in WWW browsers.

In the bottom left of the interface are buttons which allow users to change the lens view of the current set of vivagraphs. Currently, location and time are the only available preset lens settings; they toggle the lens view between a spatial map display and a timeline-based display. There is also a pull-down menu which allows users to select a map server from which to retrieve the underlying map imagery for the map display.

The bottom center region of the user interface holds rudimentary navigation tools allowing users to zoom in and out of the present lens view as well as move vertically and laterally within the view. One may also return the display to its default viewable area using the reset button.

On the bottom right-hand side of the interface are the last two features. One is an object that hides or deletes vivagraphs from the currently rendered scene(s): the user drags the offending vivagraph onto the hide object and it disappears from the display area. The second object allows users to add scenes and/or vivagraphs they like to their own personal media archive (PMA) by dragging either the scene tab or the vivagraph of interest onto the PMA button. This is analogous to bookmarking scenes, although individual vivagraphs can be "bookmarked" as well, through transclusion[38, 39] of the scene's content of interest.

3.3.1.2 Content User Interface

At this stage the ability of authors to create navigable interfaces for complex metascenes is limited both by the nature of CMIL and the current display limitations of the browser. However, even the present rudimentary browser support for hyperlinking allows the creation of interconnected scenes and thus a rough interface for a metascene can be constructed from hyperlinked vivagraph nodes.

Figure 3.11 Shakti renders a help document with hyperlinks (represented by blue triangles).

For example, in Figure 3.11 above, the user agent's "help" scene has been opened. The scene contains five vivagraphs (represented by the orange circles), two of which are hyperlink anchors, indicated by the upward-pointing blue triangles immediately next to them. Hyperlinks allow users to navigate to new documents within metaspace and serve as a form of navigational interface. In addition to the five vivagraphs, the scene contains an optional background image with additional information. Eventual support for statically-positioned media, particularly images, will allow for the creation of more usable hyperlinked content navigation interfaces than are currently supported.

3.3.1.3 Interaction with Vivagraphs

In the previous two sections, I outlined the function of the Shakti user agent's user interface and some of the content interface elements in the scene author's palette. In this section I synthesize the two and discuss user interaction via Shakti with metaspace scenes and vivagraphs. The application user interface and scene content work together to provide the user experience of navigating and interacting with scenes and experiencing vivagraphs. This is not totally unlike "browsing" the World Wide Web. However, vivagraphs serve a different purpose. Vivagraphs place captured multimedia moments in context; thus the interaction paradigm for contextual media differs significantly in a number of areas from that of the World Wide Web.

3.3.1.3.1 Basic Navigation, Vivagraph Presentation and Lenses

Instead of a singular page-oriented view of hypertext on the World Wide Web, the organization of vivagraph media in metaspace is spatial, variable and driven largely by the interests of the user. Vivagraphs are rendered to the screen area and positioned spatially along axes corresponding to the values of their contextual metadata. For example, a loaded scene which contains vivagraphs with location, time and temperature metadata can be rendered according to these metadata (or a combination thereof) by selecting the corresponding lens settings. This allows the user to control not only the superficial presentation of content (available through the use of stylesheets) but to significantly transform the information content conveyed by the display.
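
To make this concrete, a single node of the kind a lens plots might carry location, time and a user-defined temperature attribute. The fragment below is a sketch (all values and file names are hypothetical) whose elements follow the CMIL 0.9 DTD reproduced in the appendix:

	<node id="market01" title="Saturday morning at the market">
		<attributes>
			<loc coords="+39.1653-086.5264" datum="WGS84"/>
			<time time="2000-07-15T09:30:00" timezone="-0500"/>
			<att name="temperature" content="24" metric="celsius"/>
		</attributes>
		<media>
			<image src="market01.jpg" type="image/jpeg" alt="Produce stand"/>
		</media>
	</node>

The location lens plots this node by its loc coordinates, the time lens by its time value, and a temperature lens (were one defined) by the author-supplied att element.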

Figure 3.12 Vivagraphs arranged by location (left) and time (right).

Figure 3.12 shows two views of the same metaspace scene containing vivagraphs captured from a recent trip to New Orleans. Although based on the very same CMIL document, the two presentations are very different. In the left panel, the vivagraphs are plotted by latitude and longitude atop a dynamically generated map; in the right, they are arranged temporally by time of capture. Users click the loc or time buttons to alternate between these two lens views.

Shakti allows multiple CMIL documents to be simultaneously rendered to the screen, with vivagraphs from each document intermingling. The vivagraphs from each scene, however, differ in color and icon shape, so that the contents of each scene can be distinguished and related back to the appropriately colored scene tab identifier. The tabs themselves allow the user to interact with supersets of vivagraphs at the level of the document. Although CMIL supports creating custom and even nested subsets of vivagraphs through the group element, Shakti does not yet support interacting with vivagraphs at this aggregate level.

Figure 3.13 Shakti renders 3 separate scene documents as indicated by the scene tabs.

Scene tabs allow users to unload a scene or hide scene content from the display area and serve to mitigate a bit of the complexity introduced by the simultaneous rendering of multiple documents. In the example figure above, three scenes have been loaded by the user agent, but only two of them are rendered to the screen area. The vivagraphs in the second scene have been hidden by de-selecting the visibility checkbox on its scene tab. Selecting the checkbox again reveals the contents, and clicking on the tab itself unloads the document from the user agent's memory.

3.3.1.3.2 Hyperlinking

Hyperlinks allow resources to be associated with one another. Shakti supports both simple and extended (multi-destination) forms of hyperlinking as defined by CMIL. Hyperlinks rendered in Shakti may link to one or more destination resources and are conceptually considered an attribute of the node element which represents a vivagraph.

Figure 3.14 Hyperlinks (blue and purple triangles)

Link anchors (the typically blue underlined characters in HTML) are currently represented in Shakti as blue upward- and purple downward-pointing triangles. Links are traversed by clicking the anchor or by right- or shift-clicking the vivagraph itself. A blue triangle represents a link that hasn't been actuated by the user and traversed, a path not (yet) taken. Purple triangles indicate links that have already been traversed. In Figure 3.14, the traversed link at the bottom has opened two new scenes (which surround it). Thus far we are in somewhat analogous territory to the WWW, but Shakti also allows traversed links to be "un-traversed": clicking a traversed link indicator will close all of the destination documents specified by that link which are currently rendered to the screen. This allows the link anchor to take on some of the functionality provided by the forward and back buttons of WWW browsers. In addition, links may have different forms of actuation behavior. When a link is traversed from its source to its destination, authors may specify whether the destination resource should replace the source of the link, whether the entire source document should unload, or whether the destination resource(s) should simply load atop the existing rendered documents.
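
In CMIL terms, a simple link is expressed with the a element wrapping a node, and an extended link adds locator children for additional destinations; the show and actuate attributes govern the traversal behaviors just described. The fragment below is a sketch with hypothetical document names, using the element and attribute definitions from the CMIL 0.9 DTD in the appendix:

	<a href="help_basics.cmil" title="Basic help" show="present" actuate="user">
		<locator href="help_lenses.cmil" title="Lens views"/>
		<node id="helpAnchor" title="Getting started">
			<attributes>
				<time time="2000-08-01T12:00:00"/>
			</attributes>
			<media>
				<text>Open the help scenes from here.</text>
			</media>
		</node>
	</a>

Here traversal opens both help_basics.cmil and help_lenses.cmil; specifying show="replace" instead would cause the destination to replace the link's source.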

3.3.1.3.3 Rendering Vivagraph Media

Apart from how vivagraphs are presented in context vis-à-vis one another, there remains the question of how exactly to present the media constituents of individual vivagraphs. Vivagraphs are restricted neither to any particular type of media nor to any particular number of media objects. This variety makes the process of rendering a vivagraph to the user more of an art than a science. Shakti currently supports common image (JPEG, GIF, PNG), audio (AIFF, WAV, MP3) and plaintext media formats.

Figure 3.15 Two views of the same Vivagraph scene, geophysical (left) and temporal (right), and a constituent Vivagraph rendered (audio, image and text components) to a popup window.

Vivagraphs can be experienced by clicking on the vivagraph icon or thumbnail within the display area. At this point the vivagraph is rendered to the whole screen of the user agent or presented in a pop-up window. Any image and text components are presented on screen and audio is played, set to cross-fade and loop so long as the vivagraph-presentation screen or popup is open.

3.3.1.4 User Agent Summary

Shakti is a proof-of-concept user agent built to render vivagraphs and to serve as a test-bed for refining the Vivagraph concept and the CMIL language. It is built in Macromedia Director and renders vivagraphs hosted on the Internet or on local drives. Shakti does not yet support the full CMIL specification, but supports it to a degree sufficient to render scenes and vivagraphs, navigate among multiple scenes and change lens views.

3.4 Authoring Vivagraph Scenes

Before being able to interact with vivagraphs in the Shakti user agent, they must first be captured and created. The raw materials needed are digital media files and a text document written in CMIL specifying how they are to be arranged. The CMIL document describes where on the network to find these media, states the elements of context associated with them and specifies how they should be displayed by the user agent. During the development of this work the process of authoring vivagraphs has progressed from being entirely manual (out of necessity) to utilizing increasingly automated methods, eliminating much tedium in the process.
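
A minimal scene document of the sort described above might look like the following sketch (file names, values and titles are hypothetical; the structure follows the CMIL 0.9 DTD reproduced in the appendix, whose system identifier appears in the DOCTYPE declaration):

	<?xml version="1.0"?>
	<!DOCTYPE cmil SYSTEM "http://www.oacea.com/cmil/cmil-0.9.dtd">
	<cmil>
		<head>
			<title>Farmers' Market - 15 July 2000</title>
			<meta name="creator" content="R. Dietz"/>
		</head>
		<body bgcolor="#FFFFFF">
			<node id="v01" title="Sweet corn stand">
				<attributes>
					<loc coords="+39.1653-086.5264"/>
					<time time="2000-07-15T09:42:00" timezone="-0500"/>
				</attributes>
				<media>
					<image src="media/v01.jpg" type="image/jpeg" alt="Sweet corn stand"/>
					<audio src="media/v01.mp3" type="audio/mpeg" dur="12s"/>
					<text>Vendors calling out prices over the morning crowd.</text>
				</media>
			</node>
		</body>
	</cmil>

Each additional vivagraph in the scene is simply another node in the body, pairing its contextual attributes with its image, audio and text media.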

Both the manual and the automated approaches are valuable; however, automation places the onus on the programmer/designer to shield the user from unnecessary labor. Ultimately, authoring contextual media should require as little effort as experiencing it, with little or no user intervention. For more complex scenes, though, manual authoring and more advanced tools will still be needed. The following sections describe the authoring experience using both the manual and the more automated approaches employed during the development of the system.

3.4.1 Early Authoring

The first set of Vivagraph scenes was created, just as the browser became workable, with the assistance of my brother, Rob Dietz. On two occasions we went to the local farmers' market in Bloomington, Indiana. I captured images with a Nikon CoolPix digital camera, while he captured audio samples with a Sharp MiniDisc digital audio recorder. I marked the position of individual captures on an overhead topographical map as we strolled around the market. After completing the capture sessions, we compiled and optimized the media: Rob selected 10-15 second segments of audio from the digital soundtrack and I resized and compressed the images for display in the user agent. I then wrote the CMIL documents in a text editor, using location data derived from the map and times taken from the creation dates of the image files captured by the camera. Along with these media, the CMIL documents provided the initial scenes used to test the browser as it evolved from its earliest incarnations.

3.4.2 Late Authoring - Toward Automation

The authoring process currently used is significantly less manually intensive. I use a Kodak DC290 digital camera in tandem with a Garmin GPSIII+ global positioning unit, a serial cable to connect the two, a Motorola GPS antenna and modified aluminum mounting brackets. This apparatus I call the "Vivacam," short for vivagraph camera. Kodak offers a package called the FIS (Field Imaging System), which bundles its DC265 camera with the Garmin GPSIII+ and software for use with the ArcView professional GIS package. The FIS offers similar functionality, but neither the compact form factor of the Vivacam nor the custom software integral to this project. The Vivacam is shown below with its carrying strap.

Figure 3.16 Vivacam contextual media capture apparatus

The DC290 digital camera uses the Digita operating system, which allows one to write scripts and software that the camera can run from its removable memory card. I used a slightly modified freeware script written by IMAIZUMI Osamu, available on the Digita Camera website6, to gather GPS data into the camera when a picture is captured. After capturing an image, I change camera modes and asynchronously capture ambient audio with the camera, repeating the process for each vivagraph.

After completing a capture session I run a second custom Digita script, CMILDUMP, to extract the context data stored in each image into a text file. I then use a custom Java application, MakeCMIL, on the desktop to construct a well-formed CMIL document from this text file. I use Java at this stage because of the weakness of Digita Script for the string manipulation needed to generate well-formed CMIL from the raw context data. I then extract the sound data from the images and optimize all the media (dimensions and compression) for display in the browser.

Although still a somewhat complicated process, it is remarkably more efficient than the entirely manual alternative. Using the Vivacam and associated programs, I have successfully captured vivagraph scenes of Bloomington, San Francisco, New Orleans, Amsterdam, Brussels, Paris and London over the course of the past year.

3.5 Vivagraph Media Projects

Over the course of this work several projects were undertaken in order to provide material to test the overall system, explore alternative uses and to a certain extent provide diversion. The following sections describe a number of these projects.

3.5.1 Farmers' Market: Community Documentary

The very first capture session was performed at the Bloomington Community Farmers' Market. Since that first session, I have continued to capture vivagraphs there, and in the surrounding Bloomington community, throughout the course of this project. The Farmers' Market vivagraphs were created manually at the outset and automatically as the project evolved.

The collection of media captured at the market over the course of the last year serves as a growing contextual documentary of this community event. The Farmers' Market is open on Saturdays during the summer and draws broadly across the spectrum of the community, bringing together musicians, merchants and passers-by. The vivagraphs captured there create a multimedia documentary of a characteristic place and time in the life of the community.

3.5.2 Europa: Travelogue

A recent trip to the WWW9[29] conference in Europe provided an opportunity to capture the sights and sounds of several of its major cities in vivagraphs. It also provided an opportunity to document, over the course of nearly three weeks, the challenges and limitations of the Vivagraph model, in particular those of the capture device. A few observations follow:

I captured vivagraphs in Amsterdam, Brussels, Paris and London. Despite the inconveniences itemized above, capturing vivagraphs during the trip was not difficult with a bit of patience and again, plenty of batteries.

3.5.3 Soundscape

This effort was undertaken to further explore the element of sound in vivagraph media as well as server-side generation of CMIL documents with PHP (Hypertext Preprocessor), a scripting language used for generating dynamic website content. Rob Dietz, my brother and a skilled electronic composer, assembled a collection of digital audio samples and imagery which we then compiled into a rudimentary dynamic metascene. The scenes themselves are dynamically generated by PHP on the server. The links within each scene point to another PHP-generated dynamic scene to be rendered in Shakti, creating a new audio-visual environment.
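The only structural difference from a static scene is that the link destinations point at the generating script rather than at a .cmil file on disk; the script simply returns another CMIL document. The fragment below is a sketch of such a link (the script name, parameters and media names are hypothetical):

	<a href="scene.php?room=2" title="Next room" show="present" actuate="user">
		<node id="doorway" title="Enter the next soundscape">
			<attributes>
				<time time="2000-06-10T22:00:00"/>
			</attributes>
			<media>
				<audio src="loop02.mp3" type="audio/mpeg"/>
				<image src="doorway.png" type="image/png" alt="Doorway"/>
			</media>
		</node>
	</a>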

3.5.4 Project SOHO: Vivagraph Narrative

Project SOHO is a collaboration between myself and artist/author, Susan Brackney. The goal here is to create a narrative vivagraph environment, utilizing the elements of space and time as an integral part of the narrative. Work has already begun on the back story, the character sketches and composition of the story environment. Although incomplete at this time, the use of the Vivagraph system for narrative, even interactive, storytelling has great appeal and will be a continuing element of this work.

4 Conclusions and Future Work

"A hundredth of a second here, a hundredth of a second there—even if you put them end to end, they still only add up to one, two, perhaps three seconds, snatched from eternity." Robert Doisneau

4.1 Challenges

Vivagraphs provide a way to enhance our personal media with a rich sense of context and to share this media easily using the Internet. User experience with the browser has been encouraging and suggests that the Internet is likely to be a welcoming place for contextual multimedia applications such as Vivagraphs. The increasing ease of use and the quality of captured experience that Vivagraphs allow have made the development of this system satisfying, stimulating and enjoyable. However, there is much work to be done to improve and expand many areas of the project. A brief discussion of some of these elements follows.

4.1.1 Bandwidth

One challenge users face using Vivagraphs is that the size of vivagraphic media can greatly exceed that of the average Web page. The included image, sound and video files make many vivagraph scenes too large to download acceptably over most modem connections without compressing the media to unacceptably poor levels of quality.

I typically capture one scene per city I've recently visited, averaging 35 vivagraphs per scene. Each vivagraph (including an image, a sound file and text) is approximately 35KB, making the total file size of the average scene about 1.2MB. Downloading such a scene takes about five minutes and 41 seconds over a 28.8Kbps modem. Even over faster modem connections of up to 56Kbps, the download time is excessive. One mitigating factor is that Shakti renders vivagraphs from each scene progressively, displaying each vivagraph as soon as all of its media have downloaded. This allows users to interact with early-loading vivagraphs without waiting for the entire content of the scene to arrive. Even so, because of the amount of media specified in the typical scene, vivagraphs are much better suited to a broadband, high-bandwidth environment than to the typical modem connection.
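
For reference, the download figure above follows directly from the average scene size and the modem's nominal line rate, ignoring protocol overhead:

	35 vivagraphs x 35 KB = 1,225 KB, or roughly 1.2 MB
	1.2 MB x 1024 x 8 bits/byte = approximately 9,830 kbit
	9,830 kbit / 28.8 kbit/s = approximately 341 s, or 5 minutes 41 seconds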

4.1.2 CMIL

CMIL provides a structured way for authors, be they human or computer programs, to create vivagraphs. There are a number of elements yet to be addressed in CMIL, some of which revolve around the continual evolution of XML and its many companion technologies. Others reflect the expansion of the language in response to the demand for additional functionality. The following sections describe a number of anticipated changes and additions.

4.1.2.1 Schema

XML Schema[35, 40] is a relatively new way to describe the syntax and semantics of XML documents. Schemas are essentially a replacement for DTDs, but differ in that schemas are themselves constructed in XML. Through schemas, XML authors establish the rules for their languages, where the rules themselves are XML documents7. The rules for CMIL have been reformulated as an XML Schema in accordance with the W3C Schema specification, but at present DTDs remain the more commonly supported method for describing XML language structure.
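
As an illustration, the meta element declared in the appendix DTD might be expressed in schema form roughly as follows (the schema fragment is a sketch against the evolving W3C syntax, not an excerpt of the actual CMIL schema):

	<!-- DTD form (from the CMIL 0.9 DTD) -->
	<!ELEMENT meta EMPTY>
	<!ATTLIST meta
		name	CDATA	#REQUIRED
		content	CDATA	#REQUIRED>

	<!-- Approximate XML Schema equivalent -->
	<xsd:element name="meta">
		<xsd:complexType>
			<xsd:attribute name="name" type="xsd:string" use="required"/>
			<xsd:attribute name="content" type="xsd:string" use="required"/>
		</xsd:complexType>
	</xsd:element>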

4.1.2.2 Stylesheets

XSL[44] is the eXtensible Stylesheet Language, a W3C standard for expressing stylesheets in XML. One of the intrinsic features of XML is that it separates information content from presentation semantics. Elements should not contain information telling user agents how they are to be presented; that is the purview of associated stylesheets. This separation allows presentation to be controlled independently of content, making the same information accessible across numerous viewing modalities and preventing the pollution of information with formatting syntax. Currently the rules governing the presentation of CMIL rest within the CMIL user agent itself. Ultimately, CMIL user agents will use companion XSL stylesheets to describe the presentation semantics of vivagraphs.
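
Such a stylesheet might, for instance, choose which media child of a node to surface as its thumbnail. The following XSLT 1.0 fragment is a purely illustrative sketch; the thumbnail output element is invented for the example and is not part of CMIL or Shakti:

	<xsl:stylesheet version="1.0"
		xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

		<!-- surface the first image of each vivagraph node as its thumbnail -->
		<xsl:template match="node">
			<thumbnail src="{media/image[1]/@src}" title="{@title}"/>
		</xsl:template>

	</xsl:stylesheet>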

4.1.2.3 Namespaces

Namespaces[41] allow authors to embed elements from other XML languages within an XML document. In the case of CMIL, the ability to embed HTML, SMIL or SVG (Scalable Vector Graphics) fragments will prove useful.
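
A namespace declaration would let a scene author drop such a fragment directly into a node's media, along the lines of the sketch below. It is not valid against the current DTD and is shown only to illustrate the intent; all values are hypothetical:

	<media>
		<vector title="Route walked through the market">
			<svg:svg xmlns:svg="http://www.w3.org/2000/svg" width="200" height="120">
				<svg:polyline points="10,110 60,80 120,60 190,20"
					style="fill:none;stroke:blue"/>
			</svg:svg>
		</vector>
	</media>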

4.1.2.4 Hyperlinking

XML offers a more powerful syntax for hyperlinking than is currently described in the CMIL specification. Migration to the XLink[42] and XPointer syntax as a replacement for the current hyperlinking syntax is likely. In addition, some mechanism supporting multidirectional linking and annotation of vivagraph scenes should be explored.
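
Such a migration would largely amount to re-expressing the attributes of the existing a and locator elements in XLink terms, roughly as in the following sketch (document names hypothetical; not part of CMIL 0.9):

	<a xmlns:xlink="http://www.w3.org/1999/xlink" xlink:type="extended">
		<locator xlink:type="locator" xlink:href="help_basics.cmil" xlink:title="Basic help"/>
		<locator xlink:type="locator" xlink:href="help_lenses.cmil" xlink:title="Lens views"/>
	</a>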

4.1.2.5 Streams

Beyond static nodes, there is a need for more robust support in CMIL for temporal media such as video. With such media, the attribute values of each contextual metadata element may change over the course of the media's playing span. CMIL should provide the means to describe such changes over time so that linear contextual media such as video can be integrated into the Vivagraph framework.
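
One possible notation, offered purely as a discussion point and not part of CMIL 0.9, would reuse the existing begin/end anchor attributes on repeated loc elements so that a clip's position changes over its playing span (all values hypothetical):

	<attributes>
		<!-- hypothetical: the video's location at successive offsets into the clip -->
		<loc coords="+48.8584+002.2945" begin="0s"/>
		<loc coords="+48.8606+002.3376" begin="45s"/>
		<loc coords="+48.8530+002.3499" begin="90s"/>
	</attributes>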

4.1.3 User Agent

There are several areas in which the user agent needs improvement, the foremost being that it should be re-implemented in a more robust form in either C++ or Java. The other problem areas are CMIL support and the user interface.

4.1.3.1 CMIL Support

The user agent should fully support the CMIL specification and all relevant and associated standards promulgated by the W3C. This has already proven something of a challenge due to changes in XML standards as they have evolved over the course of this project and to new developments within CMIL itself. The user agent should also support XSL and user selected stylesheets.

4.1.3.2 User Interface

The accessibility and usability of the Shakti user interface are concerns which have not yet been dealt with sufficiently. Shakti is based largely on design idioms informally derived from World Wide Web browsers, GIS suites and information visualization applications. The efficacy of this synthesis needs greater attention. In addition, the idioms employed for hyperlinking and multiple-document navigation are exploratory and untested.

There are several interface improvements on the horizon, some of which follow:

4.1.4 Authoring and Media Transfer from DMCD

The main developments forthcoming in the authoring area are the consolidation and simplification of the scripts and programs currently used to move from capture to browser-ready CMIL files and media. A desktop authoring interface is also planned, as are simple editing facilities for captured media.

4.1.5 DMCD

The inaccuracy and intermittent resolution of GPS signals are significant hurdles to making vivagraph capture as transparent as is necessary. Likewise, asynchronous image and sound capture is tedious, inhibits use and violates the "capture the moment" aesthetic vision for vivagraphic media.

4.2 Future of Contextual Media and Vivagraphs

This thesis presents a particular vision for the evolution of digital media, a vision which posits that media can and will become much more than they are today, and that they will do so in part by setting their own stage in the mind. Media will accomplish this by communicating a sense of their own contextual environment and meaning, whether factual or fictional. No longer inert and divested of contextual grounding, these new media, contextual media, lay the foundation for vivagraphs.

4.2.1 0101011001101001011101100110000100100001

The binary number which serves as the heading for this last section is, to most (including myself), inscrutable, alien and seemingly random. But what does it mean, if anything, this string of zeroes and ones? With your favorite secret decoder ring in hand, this five-byte number can be translated into the ASCII character codes for "V", "i", "v", "a" and "!". Translated further from Spanish to English (if needed), "Viva!" is a celebratory expression of good will, the admonition of "long life!" But as to the meaning of "life," well, I can only recommend the movie.

It is in this spirit that vivagraphs add further layers of meaning to digital media. The purpose of Vivagraphs is to make the media which so greatly impact our lives more meaningful, not just to ourselves, but to others we may never know. Just as a rendered image file communicates much more than the base 2 number it actually is, Vivagraphs make the leap from common media file to meaningful communicative media. The communication of meaning, with its requisite contextual accompaniments, makes a more intimate connection with viewers possible. By making the extrinsic, the contextual, intrinsic to captured digital media, we state that understanding and empathy, more than mere transmission, are our goals. No longer inert and divested of context, these media reach out with a sense of their own significance and share it freely; with each additive step we further transform them. As the message and the meaning become all the more clear, these media may become more a part of our lives and more a reflection of ourselves.

Footnotes

1 MP3's are digital audio files often accompanied by an ".mp3" suffix. MP3 is an MPEG-1 Layer 3 audio format which allows digital audio files to be compressed to relatively small sizes but retains high fidelity compared to the original source file. Because of the small file size (around 1/10th of the uncompressed source) and their relatively high quality sound, MP3's have flourished as an audio file format for the exchange of music over the Internet.

2 The veracity of the inference is another story entirely.

3 Support for multimedia, contextual media and hypermedia is the only formal requirement of vivagraphic media. The personal media aspect is largely an aesthetic consideration, but one considered important enough to codify here.

4 http://www.cs.indiana.edu/~rawlins/website/extras/people.html

5 Although possible to take for granted, it cannot be. The state of the Internet as a forum for free expression is constantly under fire in some form in jurisdictions across the globe, from Iran to China to the United States.

6 http://www.digitacamera.com

7 This leads to the circular and somewhat perplexing observation that schema themselves, as XML documents, may in turn have schema of their very own.

5 Glossary

Attribute
Attributes provide a way to annotate elements with additional information. An attribute is an element parameter specified in an associated DTD which takes the form: name="value". In the following example, href is an attribute of the a element: <a href="http://www.genericorp.com">hyperlink</a>
CMIL
Contextual Media Integration Language is an XML-derived structured markup language for creating documents which specify how digital media and contextual metadata are combined to form vivagraphs.
DTD
A document type definition defines the overall logical structure of a language. It consists of XML declarations which specify the elements and attributes legal for documents complying with the DTD.
Document
A document is a data stream which, in conjunction with any referenced streams such as external stylesheets or scripts, contains elements conforming to rules specified by an associated DTD.
Element
Elements are the building blocks of documents. Elements give structure to information by following rules specified in an associated DTD. For example, head, title, body and p are a few elements available to authors of HTML, which are nested to create Web pages.
Home Scene
A home scene is a scene which serves as the root document, a table of contents or navigational locus for metascene content.
Metascene
A metascene is an interconnected group of scenes usually hosted on the same server and typically entered via the metascene's home scene.
Metaspace
Metaspace is the Internet application supporting vivagraphs. It may also be thought of as the aggregation of all vivagraphic media accessible through the Internet. A metaspace is a multidimensional space arising from a collection of digital media objects (media files, folders, etc.) which have associated with them various forms of contextual and generic metadata. Plotting these objects spatially along axes corresponding to each object's metadata attributes carves out the object's position in this multidimensional metadata space.
Render
Documents are rendered (presented to the user in some fashion) by a user agent.
Scene
Any rendered CMIL document is a scene.
Tag
A tag is an instance of an element in a document. All tags must be closed or self-terminated like so: <tag>stuff between tags</tag> or <tag/>.
User Agent
A user agent retrieves, processes and renders documents. User agents are often referred to as clients or informally as "browsers."
Valid
A document is valid if it is well-formed and has been verified to conform to all of the rules specified by an associated DTD.
Vivagraph
1) A vivagraph is a constituent of metaspace; it is an aggregation of digital media and contextual metadata specified in a CMIL document and rendered in a metaspace scene. 2) The system as a whole.
Well-formed
A document is well-formed if it conforms to proper XML syntax, i.e., elements are properly nested and terminated, attribute values are quote-enclosed, etc.

6 Bibliography

[1] N. Negroponte, Being Digital, 1st ed. New York: Knopf, 1995.

[2] J. M. Jacobson, B. Comiskey, P. Anderson, and L. Hasan, "Electronic Paper," MIT Media Lab Micromedia Group, 2000. Last Accessed on April 1, 2000. Available at http://www.media.mit.edu/micromedia/elecpaper.html

[3] Xerox, "Electronic Paper," Xerox Palo Alto Research Center, 2000. Last Accessed on April 1, 2000. Available at http://www.parc.xerox.com/dhl/projects/epaper/

[4] Cieva Logic, "Cieva Homepage," Cieva Logic, LLC, 2000. Last Accessed on April 1, 2000. Available at http://www.ceiva.com/

[5] Sony Digital Imaging, "CyberFrame Models," Sony Digital Imaging, 2000. Last Accessed on April 1, 2000. Available at http://www.sel.sony.com/SEL/consumer/dimaging/browse_the_products/cyberframe_viewer/index.html

[6] J. C. Spohrer, "Information in Places," IBM Systems Journal, vol. 38, pp. 602-628, 1999.

[7] S. Mann, "Wearable, Tetherless Computer-Mediated Reality: Wearcam as a Wearable Face-Recognizer, and Other Applications for the Disabled.," presented at AAAI Fall Symposium on Developing Assistive Technology for People with Disabilities, Cambridge, MA, 1996.

[8] R. Barthes, Camera Lucida : Reflections on Photography, 1st American ed. New York: Hill and Wang, 1981.

[9] H. Cartier-Bresson, The Decisive Moment. New York: Simon and Schuster, 1952.

[10] Merriam-Webster Inc., Webster's Tenth New Collegiate Dictionary. Springfield, MA: Merriam-Webster, 1993.

[11] D. Mercer, "Personal Digital Devices," Strategy Analytics Research, Industry Forecast 2000.

[12] D. Frohlich and E. Tallyn, "Audiophotography: Practice and Prospects," presented at CHI99 Human Factors in Computing Systems, Pittsburgh, PA, 1999.

[13] N. G. JPO, "NAVSTAR GPS Joint Program Office (SMC/CZ) Homepage," 2000. Last Accessed on May 2, 2000. Available at http://gps.laafb.af.mil/

[14] W. J. Clinton, "Statement by the President Regarding the United States' Decision to Stop Degrading Global Positioning System Accuracy," Office of the Press Secretary, Washington, D.C. 2000. Available at http://www.whitehouse.gov/library/PressReleases.cgi?date=1&briefing=0

[15] Knownspace Group, "Knownspace Project Homepage," Knownspace Group, 1999. Last Accessed on May 2, 2000. Available at http://hydrogen.knownspace.org/

[16] V. Bush, "As We May Think," in Atlantic Monthly, vol. 176, 1945, pp. 101-108.

[17] T. H. Nelson, "The Hypertext," presented at World Documentation Federation, 1965.

[18] J. Abbate, Inventing the Internet. Cambridge, Mass: MIT Press, 1999.

[19] T. Berners-Lee, "Information Management: A Proposal," CERN, 1989. Last Accessed on April 1, 2000. Available at http://www.w3.org/History/1989/proposal.html

[20] Napster, Inc., "Napster Homepage," 2000. Last Accessed on July 2000. Available at http://www.napster.com

[21] J. Frankel and T. Pepper, "Gnutella," 2000. Last Accessed on July 2000. Available at http://gnutella.wego.com/

[22] I. Clarke, "A Distributed Decentralized Information Storage and Retrieval System," in Division of Informatics. Edinburgh, Scotland: Edinburgh University, 1999.

[23] P. Hoschka, "Synchronized Multimedia Integration Language (SMIL) 1.0 Specification," World Wide Web Consortium, Recommendation 1998. Available at http://www.w3.org/TR/REC-smil/

[24] J. C. Spohrer, "Worldboard: What Comes After the World Wide Web?," Apple Learning Communities Group, ATG, 1996. Last Accessed on April 1, 2000. Available at http://worldboard.org/pub/spohrer/wbconcept/default.html

[25] H. D. Wactlar, M. G. Christel, A. G. Hauptmann, and Y. Gong, "Informedia Experience-on-Demand," Informedia: Digital Video Library Research Group, 2000. Last Accessed on April 1, 2000. Available at http://www.informedia.cs.cmu.edu/eod/

[26] S. Mann, "`Eudaemonic Eye': 'Personal Imaging' and wearable computing as result of deconstructing HCI; towards greater creativity and self-determination.," presented at CHI97 Conference on Human Factors in Computing, Atlanta, GA, 1997.

[27] T. Berners-Lee, R. T. Fielding, and H. F. Nielsen, "Hypertext Transfer Protocol — HTTP/1.0," World Wide Web Consortium, 1996. Last Accessed on May 6, 2000. Available at http://info.internet.isi.edu:80/in-notes/rfc/files/rfc1945.txt

[28] R. B. Dietz, "CMIL 0.9 Specification," Oacea 2000. Available at http://www.oacea.com/cmil/cmil_0.9_20000310.html

[29] R. B. Dietz, "CMIL and Metaspace: Visualizing Hypermedia with Contextual Metadata," presented at 9th International World Wide Web Conference, Amsterdam, Netherlands, 2000.

[30] T. Bray, J. Paoli, and C. M. Sperberg-McQueen, "Extensible Markup Language (XML) 1.0," World Wide Web Consortium, Recommendation 1998. Available at http://www.w3.org/TR/REC-xml

[31] W3C, "World Wide Web Consortium Homepage," World Wide Web Consortium, 1994. Last Accessed on May 6, 2000. Available at http://www.w3.org/

[32] S. St. Laurent, XML : A Primer, 2nd ed. Foster City, CA: M&T Books, 1999.

[33] International Organization for Standardization, "ISO 6709:1983. Standard Representation of Latitude, Longitude and Altitude for Geographic Point Locations," ISO (International Organization for Standardization), Geneva 1983. Available at http://www.iso.ch/cate/d13152.html

[34] International Organization for Standardization, "ISO 8601:1988. Data Elements and Interchange Formats — Information Interchange — Representations of Dates and Times," ISO (International Organization for Standardization), Geneva 1988. Available at http://www.iso.ch/cate/d15903.html

[35] P. V. Biron and A. Malhotra, "XML Schema Part 2: Datatypes," World Wide Web Consortium (W3C) 2000. Available at http://www.w3.org/TR/xmlschema-2/

[36] Apple Computer, Macintosh Human Interface Guidelines. Reading, MA: Addison-Wesley, 1993.

[37] Microsoft Corporation, The Windows Interface: An Application Design Guide. Redmond, WA: Microsoft Press, 1992.

[38] T. H. Nelson, "The Heart of Connection: Hypermedia Unified by Transclusion," Communications of the ACM, vol. 38, pp. 31-33, 1995.

[39] Jonathan Marsh and D. Orchard, "XML Inclusions (XInclude)," W3C 2000. Available at http://www.w3.org/TR/xinclude

[40] H. S. Thompson, D. Beech, M. Maloney, and N. Mendelsohn, "XML Schema Part 1: Structures," World Wide Web Consortium (W3C) 2000. Available at http://www.w3.org/TR/xmlschema-1

[41] T. Bray, D. Hollander, and A. Layman, "Namespaces in XML," World Wide Web Consortium (W3C) 1999. Available at http://www.w3.org/TR/1999/REC-xml-names-19990114/

[42] S. DeRose, E. Maler, D. Orchard, and B. Trafford, "XML Linking Language (XLink)," World Wide Web Consortium (W3C) 2000. Available at http://www.w3.org/TR/xlink/

[43] J. Gunderson and I. Jacobs, "User Agent Accessibility Guidelines 1.0," World Wide Web Consortium 2000. Available at http://www.w3.org/TR/UAAG/

[44] J.-D. Fekete and C. Plaisant, "Excentric Labeling: Dynamic Neighborhood Labeling for Data Visualization," presented at Conference on Human Factors in Computing Systems (CHI'99), Pittsburgh, PA, 1999.

[45] G. Furnas and B. Bederson, "Space-Scale Diagrams: Understanding Multiscale Interfaces," presented at 1995 Conference on Human Factors in Computing Systems, Denver, 1995.

7 Appendices

7.1 CMIL 0.9 DTD

<!-- 
	=========== CMIL DTD =============================================

	This is CMIL 0.9 DTD for Contextual Media Integration Language
	Draft: Date 2000/02/29 
	
	Author: Rick Dietz <rick@oacea.com>

	This DTD is available at the system identifier:

		http://www.oacea.com/cmil/cmil-0.9.dtd
     
	Further Information about CMIL can be found at:
     
		http://www.oacea.com/cmil/
     
	=========== CMIL DTD =============================================
-->
				
<!--=========== Misc Entities ========================================-->

<!ENTITY % id-attribute  
	"id  		ID    		#IMPLIED"
	>

<!ENTITY % basic-attributes  
	"%id-attribute;
	title  		CDATA    	#IMPLIED 
	desc  		CDATA    	#IMPLIED"
	>

<!ENTITY % style-attributes  
	"class		CDATA		#IMPLIED
  	style       CDATA		#IMPLIED"
	>

<!ENTITY % core-attributes  
	"%basic-attributes;
	%style-attributes;
	abstract	CDATA		#IMPLIED		
  	creator   	CDATA		#IMPLIED
  	rights      CDATA		#IMPLIED"
  	>

<!--Applications of CMIL incorporating location data should specify
	the particular datum used for subsequent location data provided 
	in the document.-->
<!ENTITY % datum-attribute  
	"datum      CDATA		#IMPLIED" >

<!--The timezone is specified as a numerical offset from the 
	Greenwich Mean.  For example, Eastern Standard Time is "-0500".-->
<!ENTITY % timezone-attribute  
	"timezone 	CDATA		#IMPLIED" >

<!ENTITY % shape-attribute  
	"shape (cone|cylinder|sphere) 'sphere'" >

<!-- ========== Hyperlinking =========================================-->
<!ENTITY % link-attributes  
	"%id-attribute;
	title		CDATA			#IMPLIED
	href		CDATA			#REQUIRED
	actuate		(auto|user)		'user'			
	show		(present|replace|transform) 'present'"
	>

<!ENTITY % anchor-attributes  
	"begin  	CDATA    		#IMPLIED 
	end  		CDATA    		#IMPLIED 
	coords  	CDATA    		#IMPLIED"
	>

<!--=========== Selection Parameter Attributes ==========================-->
<!ENTITY % selection-parameters
	"system-bitrate				CDATA				#IMPLIED
	system-gps-error			CDATA				#IMPLIED
	system-language				CDATA				#IMPLIED
	system-mode  				CDATA				#IMPLIED     
	system-required				NMTOKEN				#IMPLIED
	system-screen-size			CDATA				#IMPLIED
	system-screen-depth			CDATA				#IMPLIED
	system-captions				(on|off)			#IMPLIED
	system-overdub-or-caption	(caption|overdub)	#IMPLIED"
	>

<!--=========== MEDIA Attributes =====================================-->
<!ENTITY % media-attributes  
	"region		IDREF		#IMPLIED
	alt			CDATA		#IMPLIED
	src			CDATA		#IMPLIED
	type		CDATA		#IMPLIED
	dur			CDATA 		#IMPLIED
	repeat		CDATA		'1'
	%core-attributes;
	%selection-parameters;" 
	>

<!ENTITY % container-elements  "(group | node | a | sift | switch)" >

<!ENTITY % media-elements  "(animation | audio | image | model | 
	ref | speech | text | textstream | vector | video | www)" >


<!--=========== CMIL Element =========================================-->
<!ELEMENT cmil	(head,body) >
<!ATTLIST cmil
	%id-attribute;
    >
				
<!--=========== HEAD Element =========================================-->
<!ENTITY % head.misc "(script|noscript|style|meta|link|prox|attributes)*">

<!-- The content model of the head element is %head.misc; with a 
	 title and an optional base element in any order -->

<!ELEMENT head (%head.misc;,
     ((title, %head.misc;, (base, %head.misc;)?) |
      (base, %head.misc;, (title, %head.misc;))))>

<!ATTLIST head
	%id-attribute;
    >

<!--=========== BODY Element =========================================-->
<!ELEMENT body  (%container-elements; | attributes)* >

<!ATTLIST body
	%id-attribute;
	onload      CDATA   	#IMPLIED
    onunload 	CDATA   	#IMPLIED
	bgcolor		CDATA		#IMPLIED 
	mgcolor		CDATA		#IMPLIED 
	fgcolor		CDATA		#IMPLIED 
	bgimage		CDATA		#IMPLIED 
	nwcoord		CDATA		#IMPLIED 
	secoord		CDATA		#IMPLIED
	bgaudio		CDATA		#IMPLIED
	>

<!--=========== TITLE Element ========================================-->
<!ELEMENT title	(#PCDATA) >

<!--=========== LINK Element =========================================
	Link is used primarily to connect to external scripts and
	style documents-->
<!ELEMENT link   EMPTY  >

<!ATTLIST link
	%core-attributes;
 	charset		CDATA		#IMPLIED 
	href		CDATA		#IMPLIED 
	hreflang	CDATA		#IMPLIED 
	type		CDATA		#IMPLIED 
	rel			CDATA		#IMPLIED 
	rev			CDATA		#IMPLIED 
	media		CDATA		#IMPLIED
    >

<!--=========== META Element =========================================
	Meta is a property-value pair for specifying metainformation such
	as "keywords", "description" and "creator"-->
<!ELEMENT meta   EMPTY  >

<!ATTLIST meta
	name		CDATA		#REQUIRED 
	content		CDATA		#REQUIRED
    >

<!--=========== STYLE Element ========================================
	XSL Style information. May include CDATA sections-->
<!ELEMENT style  (#PCDATA) >

<!ATTLIST style
	type  		CDATA    	#REQUIRED 
	media  		CDATA    	#IMPLIED 
	title  		CDATA    	#IMPLIED
	>

<!--=========== SCRIPT Element =======================================
	Scripting language container. May include CDATA sections-->
<!ELEMENT script  (#PCDATA) >

<!ATTLIST script
	charset  	CDATA    	#IMPLIED 
	type  		CDATA    	#REQUIRED 
	src  		CDATA    	#IMPLIED 
	defer 		(defer)   	#IMPLIED
	>

<!--=========== NOSCRIPT Element =====================================-->
<!ELEMENT noscript  ANY >

<!ATTLIST noscript
	%id-attribute;
	>


<!--=========== BASE Element =========================================
	The base element can optionally specify the root address for relative 
	URI's.  In like fashion it can optionally specify base values for 
	relative location, time and orientation attributes throughout the 
	document.  Due to the number of attributes needed for base to 
	serve in this capacity, child elements may also be used instead of
	analogous attributes.  This duplication needs to be resolved as 
	soon as possible.-->
<!ELEMENT base  (att | a | loc | orient | time)* >

<!ATTLIST base
	href  		CDATA    	#IMPLIED 
	loc  		CDATA    	#IMPLIED 
	time 		CDATA    	#IMPLIED 
	orient 		CDATA    	#IMPLIED 
	height  	CDATA    	#IMPLIED 
	radius  	CDATA    	#IMPLIED 
	%datum-attribute;
	%timezone-attribute;
	%shape-attribute;
	>

<!--=========== PROX Element =========================================
	The prox (proximity) element is a container for tag elements which 
	associate coded physical objects with an internet document via 
	URI.-->
<!ELEMENT prox  (tag | sift | switch)* >

<!ATTLIST prox
	%id-attribute;
	>
		
<!--The Tag element is essentially a hypertext anchor. It requires
	that its id and href attributes be valid. The id of the Tag element
	corresponds to the value of an external tag. It may contain any 
	number of Locator elements allowing some forms of extended 
	linking. It is otherwise empty.-->
<!ELEMENT tag  (locator | sift | switch)* >

<!ATTLIST tag
	%link-attributes;
	%selection-parameters;
	>
	
<!--=========== LOCATOR Element ======================================
	The Locator element provides tag and a elements a means of 
	support for extended linking.-->
<!ELEMENT locator   EMPTY  >

<!ATTLIST locator
	%link-attributes;
	role  	CDATA    	#IMPLIED
	%style-attributes;
	%anchor-attributes;
	%selection-parameters;
    >	
				
<!--=========== ATTRIBUTES Element ===================================
	The attributes element is a container for the individual attribute 
	elements of a CMIL node when found within the body.  When present 
	in the head of a CMIL document, the attributes element allows
	authors to define document level attributes such as channel and 
	coverage as well as define the parameters of attributes to be used 
	within nodes in the body; one may set the units, title, and 
	description of an author-defined attribute using this approach.-->
<!ELEMENT attributes  
	(att | a | channel | coverage | loc | orient | time)* >

<!--Any number of att elements may exist at the same level, providing 
	each is unique.-->
<!ATTLIST attributes
	%basic-attributes;
	>

<!--=========== ATT Element ==========================================
	The att element consists of a property-value pair allowing for 
	user-specified attributes beyond the few specified here such as 
	loc, orient, etc.  Dynamic attributes need support here through
	an href or reference to an id stamped piece of code.  -->
<!ELEMENT att   EMPTY  >

<!ATTLIST att
	name  		CDATA    	#REQUIRED 
	content  	CDATA    	#REQUIRED
	metric		CDATA		#IMPLIED
	src			CDATA		#IMPLIED
	%basic-attributes;
	>

<!--=========== A Element ============================================
	The a element allows for inline linking behavior.  Extended links
	will contain a number of locator elements as child elements.  An
	a element may exist as an attribute of a node or encapsulate 
	other container elements.-->
<!ELEMENT a  (locator | sift | switch | group | node)* >

<!ATTLIST a
    %link-attributes;
    %style-attributes;
    %selection-parameters;
    >

<!--=========== CHANNEL Element ======================================
	Content description.-->
<!ELEMENT channel   EMPTY  >

<!ATTLIST channel
	name  		CDATA    	#IMPLIED
	%basic-attributes;
	>

<!--=========== COVERAGE Element =====================================
	Space where the given document is valid for display by augmented 
	reality applications....-->
<!ELEMENT coverage   EMPTY  >

<!ATTLIST coverage
	%shape-attribute;
	radius  	CDATA    	#IMPLIED 
	height  	CDATA    	#IMPLIED
	>

<!--=========== LOC Element ==========================================-->
<!ELEMENT loc   EMPTY  >

<!ATTLIST loc
	%datum-attribute;
    coords  	CDATA    	#REQUIRED 
	mode 	(reltobase | reltoviewer | absolute)	"absolute"
	%basic-attributes;
	>

<!--=========== ORIENT Element =======================================-->
<!ELEMENT orient   EMPTY  >

<!ATTLIST orient
	bearing 	(n | nne | ne | ene | e | ese | se | sse | s | ssw | 
		sw | wsw | w | wnw | nw | nnw | CDATA)   #REQUIRED
	incline		CDATA		#IMPLIED
	mode 	(reltobase | reltoviewer | absolute)  	"absolute"
    %basic-attributes;
    >

<!--=========== TIME Element =========================================-->
<!ELEMENT time   EMPTY  >

<!ATTLIST time
	time  		CDATA    	#IMPLIED 
	begin  		CDATA    	#IMPLIED 
	end  		CDATA    	#IMPLIED
    %timezone-attribute;
	%basic-attributes;
	>
	
<!--=========== SIFT Element =======================================
	The purpose of the sift element is to allow for non-mutually 
	exclusive content selection among media or container elements 
	in CMIL documents based on system settings or user preferences.-->
<!ELEMENT sift  (group | node | a | sift | switch | locator | tag | attributes)* >

<!ATTLIST sift
     %basic-attributes;
     %selection-parameters;
     >
			
<!--=========== SWITCH Element =======================================
	The purpose of the switch element is to allow for content 
	negotiation between elements with targeted media.  The switch 
	syntax is borrowed largely intact from the W3C's Synchronized
	Multimedia Integration Language 1.0 recommendation.-->
<!ELEMENT switch  (group | node | a | sift | switch | locator | tag | attributes)* >

<!ATTLIST switch
     %basic-attributes;
     %selection-parameters;
     >
				
<!--=========== GROUP Element ========================================
	The group element is a generic container element.-->
<!ELEMENT group  (%container-elements; | attributes)* >

<!ATTLIST group
	%core-attributes;
	%selection-parameters;
    >
				
<!--========== NODE Element ==========================================
	The node element is the kernel of contextual media, uniting 
	digital media and attributes.-->
<!ELEMENT node  ((attributes,media) | (media,attributes)) >

<!ATTLIST node           
	%core-attributes;
	%selection-parameters;
	>	

<!--=========== MEDIA Element ========================================
	The media element is a container for media-elements which follow.-->
<!ELEMENT media  (%media-elements;)* >

<!ATTLIST media
	%basic-attributes;
	>

<!ELEMENT animation  (anchor?) >
<!ATTLIST animation
	%media-attributes;
	>

<!ELEMENT audio  (anchor?) >
<!ATTLIST audio
	%media-attributes;
	>
	
<!ELEMENT image  (anchor?) >
<!ATTLIST image
	%media-attributes;
	>
	
<!ELEMENT model  (anchor?) >
<!ATTLIST model
	%media-attributes;
	>
	
<!ELEMENT ref  (anchor?) >
<!ATTLIST ref
	%media-attributes;
	>
	
<!ELEMENT speech  (anchor?) >
<!ATTLIST speech
 	%media-attributes;
	>
	
<!ELEMENT text  (#PCDATA ) >
<!ATTLIST text
	%media-attributes;
	>
	
<!ELEMENT textstream  (anchor?) >
<!ATTLIST textstream
	%media-attributes;
	>
	
<!ELEMENT vector  (anchor?) >
<!ATTLIST vector
 	%media-attributes;
	>
	
<!ELEMENT video  (anchor?) >
<!ATTLIST video
	%media-attributes;
	>
	
<!--=========== WWW element ==========================================
	The www element allows World Wide Web content to exist as a CMIL
	media type.  The www element may link to an external HTML 
	resource.  This should also support inclusion of HTML as a 
	namespace but doesn't at this time.
	
	removed %media-attributes; to avoid duplicate id and title 
	attributes -->
<!ELEMENT www  (#PCDATA ) >
<!ATTLIST www
	%link-attributes;
	>
	
<!--=========== Associated Link Element ==============================
	Anchor serves as an analog of the HTML imagemap with the additional
	enhancement of temporal linking.  Anchor is borrowed from the W3C 
	SMIL1.0 recommendation.-->
<!ELEMENT anchor   (locator | sift | switch)*  >

<!ATTLIST anchor
	%link-attributes;
	%anchor-attributes;
	>
