Multiple Uses of Data in the Museum Environment

Douglas MacKenzie, Sandy Kydd & Morvyn Myles, DMC Ltd.

1 INTRODUCTION

The fondness museum professionals have for standards committees and for data interchange standards is well known. From the Getty thesaurus to the Dublin Core, and on to standards committees debating the 'best' graphics format, the quest for the universal language which allows museum to talk unto museum goes on. Standards in computing have common problems: technology renders them obsolete before they are finally ratified; everyone, whether through benevolence, competitive advantage or the 'not invented here' syndrome, introduces an extension or enhancement to the standard; and too much time is spent discussing the standard instead of finding something useful to do with the results. Apart from these, there is surely an opportunity being missed as a result of the mindset which urges standardisation.

The aim of allowing a universal search of every museum database is a laudable one but what does it really offer? If I search a ship database for vessels between certain tonnages, are those values based on gross tonnage or on displacement? If I want to look at voyages to modern-day Kaliningrad in the seventeenth century do I look for the modern port, or the earlier German name of Königsberg, or, if the data refers to trading records from Scottish mercenaries in Russian service, Korolovets? The way in which any historical question is framed will depend to a great extent on the questioner's background and intellectual baggage. To make data available to a wide audience either the data needs to be standardised, and user thinking modified, or the users left with their modes of thought and professional practices intact, and the data varied. If we can accept that 'a museum can be said to offer intellectual access to its resources if it enables people to think, for purposes they have defined themselves, about the objects in its collection' (Orna, 1994) why should we try to constrain users in the potentially far more flexible world of the virtual museum or archive? Data is infinitely more malleable than people.

Before we begin to think of universal access to museum data we need to solve a more local problem. How can data sources within one organisation meet the needs of different users within that institution and the needs of visitors, physical or virtual?

2 THE COMMON DATA WELL

Every institution has multiple uses for its data. That is why many museum Web sites are on-line versions of brochures or ugly extensions to collections management data. Several collections management software packages boast of their public-access add-ons. This, unfortunately, ignores the rather obvious fact that most members of the public want to ask questions quite different from those asked by the curators responsible for museum collections. One collections management package offering this add-on feature illustrated its advertisement with a screen shot of an entry for an image of a Scarlet Macaw with a button enticingly labelled Related. What would a related entry be? Another picture of a macaw, a parakeet, other animals, other preliminary oil sketches, other works by the same artist, other paintings by 19th century Belgians, other paintings from the same museum collection or purchased with the same bequest, other paintings on paper or with similar brush strokes or similar colours? We can think of limitless relations: the ones we think of first reflect our own interests. Potentially, almost all the relations we can conceive of are in the underlying database somewhere, in that it would be possible to construct an SQL statement (assuming the database supported this) to reveal them. The Related button is, however, too coarse a tool to achieve this.
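To make the point concrete, the sketch below holds a handful of invented catalogue rows in an in-memory SQLite database and shows that each reading of Related is simply a different SQL statement over the same, unchanged table. The table name, column names and sample rows are all assumptions made for illustration; none is taken from any real collections management package.

```python
import sqlite3

# Hypothetical catalogue table; every name and row here is invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE works (id INTEGER PRIMARY KEY, title TEXT, subject TEXT,"
    " artist TEXT, medium TEXT)"
)
conn.executemany(
    "INSERT INTO works VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Scarlet Macaw", "macaw", "Artist A", "oil sketch"),
        (2, "Macaw on a Branch", "macaw", "Artist B", "watercolour"),
        (3, "Harbour at Dusk", "harbour", "Artist A", "oil sketch"),
        (4, "Parakeet", "parakeet", "Artist C", "engraving"),
    ],
)

# Each notion of "related" is one more query over the same data.
RELATIONS = {
    "same subject": "subject",
    "same artist": "artist",
    "same medium": "medium",
}


def related(work_id, relation):
    """Return titles of other works sharing the chosen attribute with work_id."""
    column = RELATIONS[relation]  # column names come from a fixed whitelist
    sql = (
        f"SELECT title FROM works WHERE {column} = "
        f"(SELECT {column} FROM works WHERE id = ?) AND id != ? ORDER BY id"
    )
    return [row[0] for row in conn.execute(sql, (work_id, work_id))]


for name in RELATIONS:
    print(name, "->", related(1, name))
```

Adding another relation means adding another entry to the mapping; the data itself is left alone, which is the argument of this section.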

If the answers are in the data, leave the data well alone, adjust the buttons instead. Develop an interface appropriate to the user. Better still, if we are 'enabling people to think for purposes they have defined themselves', let us, where possible, do away with the button altogether. Wendy Hall has written about ending the 'tyranny of the button' in multimedia systems generally (Hall, 1994). The button in the 'standard' collections management package may be the thing to make the museum professional protest, 'I don't see why we should have to conform our registration needs/practices to the computer's needs/limitations' (recent MCN-L posting). In the public access system it takes on the role of, in Peter Walsh's memorable phrase, 'the unassailable voice' (Walsh, 1997): this is the connection between this artefact and that which will interest you, do not stray from the tightly policed corridors of the virtual museum.

3 FILTERS NOT DATA STANDARDS

The authors of this paper have already described some of the mechanisms in moving a database from a kiosk application to the Web (Kydd and MacKenzie, 1997). The general principles learned from that exercise are applicable to the porting of data between any electronic delivery platforms in the museum: concentrate on the needs of the users in the various environments, the data will look after itself.

So, for example, the local newspaper, which uses the system for managing its photographic archive, is interested in an image's accession number because it identifies the location of the original negative or print, but this is irrelevant to the Web tourist. The Webmaster needs a database which is fast, cheap and compatible with the server software but will do little or no editing work. The curator creating and maintaining the database needs tools for data entry and editing, for comparing entries, for searches and for generating reports. In both these instances the needs of the user are met by interface design. In the first scenario, one interface simply has (at least) one more field than the other. In the second, creating two interfaces to the one database may not be practicable: an affordable server-ready database may not have the features taken for granted in most common PC database packages; the curator may not wish all entries to appear on the Web; the software which needs to run with the two applications will be quite different and may require different operating environments (a Unix server, Windows spreadsheets and word processing facilities, for example); the computing skills of the two individuals are likely to be quite different and, dare I suggest, the curator is less willing to change working practices to compensate for the lack of features, or the difficulty of using them, in a database system with which the Webmaster is comfortable. The compromises made in choosing one system to satisfy both needs are likely to satisfy neither, so why try? Use two separate systems, two different databases. Provided one can quickly transfer its data to the other it really does not matter. Transferring data between modern databases is a question of knowing what the tables and fields concerned are, and this is essentially a header definition. No matter what the subject area, one field plus this header is probably sufficient to allow two systems to share data in a meaningful way.
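The idea that a header definition is enough to share data can be sketched as follows. A small mapping stands in for the header: it names the source and target tables and says which source field feeds which target field, and the field that matters only to the archive user (the accession number) is simply left out. The table and field names are hypothetical, and two in-memory SQLite databases stand in for the curator's and the Webmaster's separate systems.

```python
import sqlite3

# Hypothetical "header": source table, target table, and source -> target field names.
HEADER = {
    "source_table": "archive",
    "target_table": "web_images",
    "fields": {"ref": "id", "caption": "title", "taken": "year"},
}


def transfer(src, dst, header):
    """Copy the mapped fields of every source row into the target database."""
    src_cols = list(header["fields"])
    dst_cols = [header["fields"][c] for c in src_cols]
    rows = src.execute(
        f"SELECT {', '.join(src_cols)} FROM {header['source_table']}"
    ).fetchall()
    placeholders = ", ".join("?" for _ in dst_cols)
    dst.executemany(
        f"INSERT INTO {header['target_table']} ({', '.join(dst_cols)}) "
        f"VALUES ({placeholders})",
        rows,
    )
    dst.commit()
    return len(rows)


# Curator's database: includes an accession number the Web visitor never sees.
curator = sqlite3.connect(":memory:")
curator.execute(
    "CREATE TABLE archive (ref INTEGER, caption TEXT, taken INTEGER, accession TEXT)"
)
curator.executemany(
    "INSERT INTO archive VALUES (?, ?, ?, ?)",
    [(1, "Harbour view", 1890, "A-001"), (2, "Lifeboat crew", 1905, "A-002")],
)

# Webmaster's database: a different system with its own field names.
web = sqlite3.connect(":memory:")
web.execute("CREATE TABLE web_images (id INTEGER, title TEXT, year INTEGER)")

copied = transfer(curator, web, HEADER)
print(copied, web.execute("SELECT * FROM web_images ORDER BY id").fetchall())
```

Neither database has to adopt the other's schema; only the mapping has to be agreed.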

4 WHAT IS EASY AND WHAT IS DIFFICULT

The issue of database selection for core data is essentially a non-issue. Whatever the initial data source is, it is the content which is important not the format. Modern databases all come with a range of import filters or one can be populated from another by the construction of an SQL query. Most development languages (VB, Visual C++, Smalltalk etc.) can use ODBC drivers to talk to a wide range of databases. As has already been argued it is making sense of the content where the challenge lies. That is an interface design question not a database standardisation one.

Similarly, images can easily and automatically be converted from one format to another: perhaps they exist as TIFF images for internal work in a networked archive and are distributed on CD-ROM to internal and external users as 24-bit JPEG images. Moving them to the Web may mean there is a mixture of 8 and 24 bit GIF and JPEG images. The standard is again not the issue but bandwidth is. Users on dial-up lines may be less than enthusiastic about waiting for images of several hundred kilobytes to load, yet there will be some pictures where they consider the wait worthwhile. To give them the choice the obvious solution is thumbnails. There are certainly automatic thumbnail generation packages, but what happens if we decide that all the thumbnails are to be the same size, to fit in a pre-defined search results matrix, when the images come in a wide range of shapes and sizes? And what about large images where shrinking the whole image produces a thumbnail in which the key features can no longer be identified? Manually creating thumbnails is not an option for something like the TAMH project, where around 4000 images are involved. Creating a toolkit of small applications to ease tasks like this is an ongoing process and a route worth following for anyone trying to get more value from their data.

The really difficult thing, as in any computer application, is giving the users what they want (or need, the two are not usually the same), complicated here by the fact that multiple use of data implies multiple audiences for that data with very different interests, backgrounds and working practices, as described above. This has basic technical implications. The kiosk-based TAMH implementation concentrated on letting users, ranging from young schoolchildren to post-doctoral researchers, ask the questions they wanted to ask in the way they wanted to ask them, and so avoided fixed hypertext links as much as possible (MacKenzie, 1995, 1996). Current Web technology prevents a straight emulation of this on the Web, which is, after all, built on such links. An earlier paper (Kydd and MacKenzie, 1997) described some of the strategies we employed to implement this design philosophy on the Web in an ad hoc way. Work since then has concentrated on developing tools to allow the migration of data between print, collections management, kiosk and Web applications to follow a more generic model. We would not try to argue that the tools we have developed are the only mechanism by which such a task can be carried out, but we would argue that the general approach is the only valid one: that it is not about standardisation of data but rather about developing interface tools and filters which present that data in ways appropriate to the different user groups.

5 EARLY EXAMPLES

We came to the multiple use of data on an ad hoc basis. Doing some work on archiving material on Joseph Beuys in Scotland, we had word-processed catalogue entries on the Strategy: Get Arts exhibition of 1970. However, in addition to printing these, we required a formatted view of the pages in a touch-screen kiosk application. It would have been possible to have prepared SGML versions of the files, as several museums do with their catalogues (Light, 1995), which would produce both on-screen and printed output. However, time was short and the person responsible for the files was not skilled in SGML, so we produced a very simple in-house formatting tool, called BLURB, to serve the same function. It proved adequate for its intended task and also allowed the catalogue entries to be included in a CD-ROM on Joseph Beuys. Because it is simple to use, fully integrated with in-house word-processing and has little adverse effect on the typing speeds of secretarial staff, it also became the method used for formatting newspaper articles in the TAMH project. BLURB will never go before a standards committee and is quite limited in what it can do, but it was selected for its ease of use rather than any standardisation considerations. The relatively simple syntax means that it is simple to run it through a filter to produce, for example, Web pages of the newspaper articles in TAMH (www.dmcsoft.com/tamh/).
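As an indication of how small such a filter can be, the sketch below converts a tiny line-based markup into HTML. The markup is invented purely for this example, since BLURB's actual syntax is not given here: a line beginning `.h ` is assumed to be a heading, a blank line ends a paragraph, and anything else is paragraph text.

```python
import html


def to_html(text):
    """Convert a tiny invented line-based markup to HTML.

    Assumed syntax (hypothetical, not BLURB's real syntax):
      '.h Title'     -> heading
      blank line     -> paragraph break
      anything else  -> paragraph text
    """
    out, para = [], []

    def flush():
        # Emit the paragraph collected so far, if any.
        if para:
            out.append("<p>" + html.escape(" ".join(para)) + "</p>")
            para.clear()

    for line in text.splitlines():
        line = line.strip()
        if line.startswith(".h "):
            flush()
            out.append("<h2>" + html.escape(line[3:]) + "</h2>")
        elif not line:
            flush()
        else:
            para.append(line)
    flush()
    return "\n".join(out)


sample = ".h Launch\nThe ship was\nlaunched today.\n\nCrowds & bands."
print(to_html(sample))
```

A second filter over the same source text could just as easily produce print or kiosk output, which is why the simplicity of the markup matters more than its status as a standard.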

Similarly our Antique Golf site (www.dmcsoft.com/antiquegolf/) is simply a Web-based extension of an existing database. The clubs are listed in an Access database for off-line searching and for inventory purposes. Access was chosen because the person responsible for maintaining the inventory was familiar with the package and, again, moving this database to the MySQL database version which drives the website is simple, just running an SQL query. Similarly DDE/OLE allows it to be the engine for a print-based catalogue.

The point in choosing these tools is not that the data is standardised but that it is adaptable and portable. The Access golf club database meets the needs of the person maintaining the inventory and who is sitting with the real clubs; the Web-based version meets the needs of the browser who wants to see what the clubs look like.

6 MUSDEV: A GENERIC APPROACH TO THE PROBLEM

The early version of the TAMH data entry module was really just the display version of the software with the database write enabled. As the entries grew it became apparent that this was not adequate for cross-referencing articles or checking what was already there or for maintaining the fields which were not seen by most users, accession numbers, the relationship between main table entries and tables of short biographical entries, visit sites, keywords, thesaurus entries and the like. We also suffered from good ideas: new fields and tables to add, and these meant individual changes to the database query and write parts of the code on a woefully uncontrolled basis. Another realisation was that our first thought, to drive the system from a database was the right one, but whereas we initially held around eighty-five per cent of the data in databases we should have, and ultimately did, move everything to one database. This meant moving map data, graphing and icon-display functions into the database and this forced the decision to go for a more generic solution to the problem.

The result is what we call MusDev, an interface to a database which defines the relationship between the data elements but does not attempt to define what happens with that relationship. Nor does it care what the database is or what it contains. Our TAMH data source was an Access database, so we stuck with that, but it would work equally well with any other database, local or remote, for which ODBC drivers exist.

Figure 1 below shows a Main Table (in effect, a short article) entry on Admiral Greig. The buttons along the bottom of the screen indicate the other tables in the MusDev source for this project. If the images button is pressed, the images currently associated with this entry are shown (Figure 2). If I wish to associate new images or longer articles with this short entry, going to the Add Links screen (Figure 3) offers the facility of browsing and selecting other table entries and dragging and dropping them onto the link diagram. The entry can be completed on Figure 1 by adding descriptive abstracts and assigning it to time periods, sources and subject area categories, either those already in the database or ones added interactively from this screen. Keywords are used as descriptors; those appearing in lowercase are generated automatically by the system from its thesaurus, which may also be edited on-screen. In this example the alternative spelling Cronstadt and the city for which it is the port, St Petersburg, are generated for Kronstadt.

Figures 1, 2 and 3: Main tables, images, and adding links

This has defined the entry and its relationship to other types of entry (images, articles, places to visit, artefacts etc) and gathered some clues as to search terms which are relevant, but it makes no reference to how a user of the kiosk-based system will see the relationship between the entry and, say, the image. This is essential if we are to leave the option of multiple use of the data open.

MusDev as Toolkit

MusDev is also an ever expanding toolkit for the administrative tasks identified in Section 4 above. The tools we have incorporated in it so far include the following.

Automating thumbnail generation

The Thumbnail Generator is illustrated in Figure 9. With Autocapture it can generate thumbnails for whole sets of images. Where an image has an awkward shape or content, a draggable box can specify the area to be thumbnailed, as in the portrait of the lady in Figure 9, where the important detail is that she has a characteristic head-dress and is reading a bible.
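The arithmetic behind such a tool can be sketched briefly. The function below is an illustration under stated assumptions, not the Thumbnail Generator's own code: given an image size, a fixed cell size for the search matrix and an optional operator-chosen box, it returns the region to crop and the size to which that region should be scaled. The function name and parameters are hypothetical.

```python
def thumbnail_plan(image_size, cell_size, box=None):
    """Return (crop_box, scaled_size) for fitting an image region into a fixed cell.

    image_size and cell_size are (width, height) in pixels. box is an optional
    (left, top, right, bottom) region chosen by the operator; without it the
    whole image is used. The region is scaled to fit inside the cell, keeping
    its aspect ratio and never enlarging it.
    """
    width, height = image_size
    if box is None:
        box = (0, 0, width, height)
    # Clamp the chosen region to the image bounds.
    left, top = max(0, box[0]), max(0, box[1])
    right, bottom = min(width, box[2]), min(height, box[3])
    if right <= left or bottom <= top:
        raise ValueError("crop box lies outside the image")
    region_w, region_h = right - left, bottom - top
    scale = min(cell_size[0] / region_w, cell_size[1] / region_h, 1.0)
    scaled = (max(1, round(region_w * scale)), max(1, round(region_h * scale)))
    return (left, top, right, bottom), scaled


# Whole image: a tall 2000 x 3000 portrait squeezed into a 100 x 100 cell.
print(thumbnail_plan((2000, 3000), (100, 100)))
# Operator-selected detail: a 400 x 400 region fills the same cell.
print(thumbnail_plan((2000, 3000), (100, 100), box=(500, 200, 900, 600)))
```

The returned crop box and size can then be handed to whatever imaging library is in use; the second call shows how a selected detail occupies the whole cell where the full portrait would shrink to an unreadable sliver.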

Figures 9 and 10: Thumbnail generator and image search matrix

The advantage of generating meaningful thumbnails is apparent from Figure 10 which shows one of the image search matrices. Like many other developers we have struggled with using keywords to describe images and their inherent limitations (MacKenzie, 1995) and followed similarity algorithms such as QBIC (Holt, Hartwick and Vetter, 1995) and ARTISAN (Eakins, Shields and Boardman, 1996) with interest and reached the conclusion that, for now, the most efficient way of allowing people to find images of interest to them is just to let them look. Even with very fast flicking through pages such as the one illustrated here, users can pick out the images they want. This is the justification for "content-rich" thumbnails.

The Keyword Hierarchy Tool

The problems of using keyword or even free-text searching to identify topics of interest are well documented. Michelle Kaufman (Kaufman, 1996) gives examples of a search for Passover which results in matches on items about dietary restrictions or identity, or where Passover is mentioned just as a reference to a point in time. Similarly the search would miss references to Passover in the Hebrew form, Pesach. Examples in this paper have already referred to the changing port names in the TAMH project, where the form used will often depend on either the nationality of the searcher or the period of history in which he or she is interested.

Figure 11: Keyword hierarchy

We decided on a very simple approach: a screen which allows entry of one keyword followed by a list of acceptable synonyms (Figure 11). The range of items for which we needed to specify synonyms was so wide (geographical names, Kurzeme for Courland; different English idioms, railroad for railway; different terminology in cargoes, flax, hemp and codilla used interchangeably; abbreviations, RNLI for Royal National Lifeboat Institution) that a simple approach seemed to be the best, and no catch-all thesaurus would serve our purpose.

The point is, however, that in using this tool in a different subject area we could, where appropriate, import any commonly used thesaurus or controlled vocabulary, such as the one Michelle Kaufman describes at the Shoah Foundation. Again this is an example of concentrating on the interface not the standard. Any controlled vocabulary can ultimately be decomposed to a one-to-many relationship. Different projects will call for different thesauri, so rather than deliberate as to which is generally the best one, allow the possibility of using any. If we can agree a simple way of relating one controlled vocabulary to another where there is some degree of overlap, so much the better.
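The one-to-many decomposition is easy to sketch. In the example below each preferred keyword maps to a list of accepted synonyms, using pairs mentioned in this paper; the dictionary layout and function names are assumptions made for illustration and do not describe MusDev's internal storage.

```python
# Hypothetical one-to-many table: one preferred keyword, many accepted synonyms.
SYNONYMS = {
    "Kronstadt": ["Cronstadt"],
    "Courland": ["Kurzeme"],
    "railway": ["railroad"],
    "Royal National Lifeboat Institution": ["RNLI"],
}


def build_index(synonyms):
    """Map every term, lower-cased, to its preferred keyword."""
    index = {}
    for preferred, alternatives in synonyms.items():
        for term in [preferred, *alternatives]:
            index[term.lower()] = preferred
    return index


def expand(term, synonyms, index):
    """Return all search terms equivalent to term, preferred keyword first."""
    preferred = index.get(term.lower())
    if preferred is None:
        return [term]  # unknown terms are searched exactly as typed
    return [preferred, *synonyms[preferred]]


INDEX = build_index(SYNONYMS)
print(expand("rnli", SYNONYMS, INDEX))
print(expand("Kurzeme", SYNONYMS, INDEX))
```

Importing another controlled vocabulary then amounts to merging its term lists into the same one-to-many table and rebuilding the index.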

7 COMPONENT APPROACH

The kiosk-based version of TAMH uses the database created by MusDev with no changes whatsoever. It merely adds the interface layers and search tools. Figure 4 shows the act of searching for St Petersburg (one of the system-generated keywords) and Figure 5 shows how the entry appears to the kiosk user. Figure 6 shows what the user sees by touching the small image: a larger view with magnification and other options. Returning to the main entry display and touching (or dragging the mouse) over Battle of Hogland brings up a Link button (Figure 7) to start searching for all references to that phrase. Any word, or series of words, can be selected in this way; everything is 'linkable', not just the terms the system authors specify.

Figures 4, 5 and 6: Search results, entry record, and image viewer

All of these display, link and search options are separate components which use the database content as their input. New components can be added, and existing ones modified, without a requirement to change the underlying database. Components themselves may be edited (or modified from an administrator screen, Figure 8) so that not all displayable fields from a particular component are on-screen, simplifying the mariner database search and display for a school audience, for example. MusDev is, in effect, our collections management package: the kiosk-based version of TAMH is another department which seeks to use some of the core data for completely different purposes and, therefore, wishes to have something other than the collections management view of the data.

Figures 7 and 8: Text link and administrator screen

The component-based approach extends the life of the data enormously. We can add new components to reflect changes in technology (a user has a machine fast enough to display the archived TIFF images rather than the usual JPEG representations), changes in emphasis (exhibits associated with Admiral Duncan, last year being the bicentenary of his most famous victory at Camperdown) or changes in use (a school asks for a facility for students to cut and paste images, multimedia elements and text into multimedia essays). What we really wanted, though, was a button marked Web which would take the database and publish it to the Web, not exactly the way it appears in the kiosk but in a way which takes account of Web browsers' needs and the technology limitations and strengths of that particular environment. To say we are not quite there yet is something of an understatement, but the component-based approach has certainly been a step in the right direction.

8 COMPONENTS AND THE WEB

Many current website production and management tools allow site designers to produce component-based sites. DMC uses its own tool, WebDev (www.dmcsoft.com/webdev/) to achieve this, allowing the integration of dynamic content from databases into web pages. WebDev uses a component-based architecture for sites, allowing the separation of content-producing components from interface appearance components such as headers, footers, menus and graphics. This approach works well with the MusDev component model, allowing site templates for the overall look and feel of the website to be distinguished from the components which MusDev has to generate.

Websites and their constituent web pages can be broken down into a series of components. A web page which displayed an article record from the TAMH database might consist of standard header and footer components for the page, and a database query component which ran a database search for the record and formatted the data for output. The header and footer components would be reusable on other pages in the website, removing the need to change all the pages on the site when these standard elements had to be altered. An image collection component might point to the location of an image archive on the web server, and could use an extra parameter to refer to an individual image within that collection for display on a web page. If the image collection were moved to a different location, the only reference which would need to be changed would be that in the image collection component itself. Other components could encapsulate web page elements such as search input forms and website menus. Breaking the web page, and consequently the website down into a series of such components, makes it more manageable.
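The breakdown described above can be sketched in a few functions. This is a minimal illustration and not WebDev's implementation: the component names, the table layout, the placeholder article text and the archive location are all hypothetical.

```python
import html
import sqlite3

# All component names, the table layout and the archive URL are hypothetical.


def header(site_title):
    """Standard header component, reusable on every page."""
    return f"<html><head><title>{html.escape(site_title)}</title></head><body>"


def footer():
    """Standard footer component, reusable on every page."""
    return "</body></html>"


class ImageCollection:
    """Knows where an image archive lives; pages refer only to image names."""

    def __init__(self, base_url):
        self.base_url = base_url.rstrip("/")

    def img(self, name):
        return f'<img src="{self.base_url}/{html.escape(name)}">'


def article_query(conn, article_id, images):
    """Database query component: fetch one record and format it for output."""
    row = conn.execute(
        "SELECT title, body, image FROM articles WHERE id = ?", (article_id,)
    ).fetchone()
    if row is None:
        return "<p>No such article.</p>"
    title, body, image = row
    return f"<h1>{html.escape(title)}</h1>{images.img(image)}<p>{html.escape(body)}</p>"


def page(conn, article_id, images):
    """Assemble a page from the reusable components."""
    return header("Example site") + article_query(conn, article_id, images) + footer()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER, title TEXT, body TEXT, image TEXT)")
conn.execute(
    "INSERT INTO articles VALUES (1, 'Admiral Greig', 'Short article text.', 'greig.jpg')"
)

images = ImageCollection("/archive/images/")
print(page(conn, 1, images))
```

If the image archive moves, only the `ImageCollection` instance changes; every page that uses it picks up the new location without being edited.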

The examples given in the previous section have web equivalents. The keyword search form in Figure 4 would translate into an HTML form. Clicking on the button would link to a page with the keyword query component, producing a record display equivalent to Figure 5, with the image being an inline image thumbnail from an image collection component. This image would be linked to a full size image, so that the user could inspect it as in Figure 6. Because of the limitations of HTML, the equivalent of the text selection and link searching in Figure 7 would simply be a form with text input box for searching on a user-specified term.

9 WEBSITE PUBLISHING

WebDev allows a website to be structured so that components which produce content (from image archives or database queries) can be separated from other components which control site structure or page layout. A site structure and layout components for page style could be used in conjunction with templates of content production components to build a generic website where all that was missing to complete the system was the equivalent of the kiosk system's display, link and search components.

Publishing to the web involves taking the interface layers and search tools developed for the standalone kiosk system and creating web-based versions of them. MusDev has provided a number of tools for managing both database and image information, and separating this data from the actual interface used in a user-based kiosk system or an administration-oriented collections management system. The next stage in this evolution is to take the user system and map this onto a web-based user interface.

MusDev makes the transfer of the underlying data a straightforward task. The database can be migrated from the standalone system to a web-accessible database server using a combination of MusDev tools and standard software such as Microsoft Access. For the TAMH project, DMC currently uses a simple set of Access queries and an ODBC link to update the server database from the standalone version, although this task may eventually be integrated into the MusDev toolset. The image archive can be copied to a webserver, or shared between standalone and web versions in a networked environment, and necessary conversions of image formats to web-friendly versions can be achieved. Image references in the database can be converted to point to the new locations on the server as part of the database import routine. Thumbnail generation can produce images which allow web users to preview images before selecting those of interest to them to download at a better resolution and quality, without wasting download time on images they have no interest in.
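Rewriting image references during import might look something like the sketch below. It is not the TAMH import routine: the table layout, the example paths and the choice of `.jpg` as the web-friendly extension are assumptions, and converting the image files themselves is a separate step not shown.

```python
import posixpath
import sqlite3


def web_reference(local_path, web_root, web_ext=".jpg"):
    """Turn a standalone image path into a server path with a web-friendly extension."""
    # Accept Windows-style separators from the standalone system.
    name = local_path.replace("\\", "/").rsplit("/", 1)[-1]
    stem, _ = posixpath.splitext(name)
    return posixpath.join(web_root, stem + web_ext)


def import_images(src, dst, web_root):
    """Copy image rows to the server database, rewriting each file reference."""
    rows = src.execute("SELECT id, path FROM images ORDER BY id").fetchall()
    dst.executemany(
        "INSERT INTO images (id, path) VALUES (?, ?)",
        [(image_id, web_reference(path, web_root)) for image_id, path in rows],
    )
    dst.commit()
    return len(rows)


# Hypothetical standalone database with local file references.
standalone = sqlite3.connect(":memory:")
standalone.execute("CREATE TABLE images (id INTEGER, path TEXT)")
standalone.executemany(
    "INSERT INTO images VALUES (?, ?)",
    [(1, "C:\\TAMH\\IMAGES\\ship01.tif"), (2, "C:\\TAMH\\IMAGES\\quay02.tif")],
)

# Hypothetical server database with the same table.
server = sqlite3.connect(":memory:")
server.execute("CREATE TABLE images (id INTEGER, path TEXT)")

import_images(standalone, server, "/tamh/images")
print(server.execute("SELECT * FROM images ORDER BY id").fetchall())
```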

As mentioned previously, the kiosk-based version of TAMH uses display, link and search components to build its interface. The aim is to have MusDev map these components onto web-based equivalents. Display components can be treated as generic HTML formatting of database records and images, link components implement searches on the database from user-selected input criteria, and search components have HTML input forms and SQL queries to search the database. Using this component-based approach, website templates equivalent to the kiosk system components can be constructed and output by MusDev. Parameters for the components in MusDev are combined with the templates to output the website equivalents of the kiosk system.

It should be noted that because of some of the limitations in current web-based interfaces, these components are described as equivalent, rather than the same. For instance, the example in the previous section of selecting text in a record and having it automatically appear as search text is not possible in standard HTML text, so the equivalent web component allows manual entry of the selected phrase into a text box in an HTML form. Similar compromises can be made with other types of components. The advantage of the component architecture is that if a new solution is produced for a particular component, then the application can have its display templates for that component type changed to produce the new model for all such components.
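Combining component parameters with templates can be illustrated with Python's standard `string.Template`. The template texts, component types and parameter names below are invented for the example and are not MusDev's or WebDev's actual formats.

```python
from string import Template

# Hypothetical templates, one per component type.
TEMPLATES = {
    "search": Template(
        '<form action="$action"><input name="q">'
        '<input type="submit" value="$label"></form>'
    ),
    "display": Template("<h1>$title</h1><p>$body</p>"),
}


def render(component, templates=TEMPLATES):
    """Combine a component's parameters with the template for its type."""
    return templates[component["type"]].substitute(component["params"])


# Components as they might be exported from the kiosk design (illustrative values).
components = [
    {"type": "search", "params": {"action": "/search", "label": "Find"}},
    {"type": "display", "params": {"title": "Admiral Greig", "body": "Short article text."}},
]

for component in components:
    print(render(component))
```

Replacing the `search` entry in `TEMPLATES` changes every search component on the site at once, which is the advantage of the component architecture noted above.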

Websites in general differ from kiosk systems in that they need to provide the user with more context information: a user at a kiosk system usually knows where they are (i.e. they have walked into the museum and have a physical reference for their location). Users on the Internet do not have the same frame of reference, as they may have linked from anywhere on the web to reach the website location. Therefore they need information to identify what site they are viewing, and to help them navigate within that site. A site designer could choose or produce templates for the overall look of the website, provide the extra context information that the website needs (information about the website, news and external links sections, etc) and generate the content-specific components based on the kiosk system design in MusDev.

The eventual goal is to have a straightforward route to publishing the same data and similar interface and search tools on both the kiosk and web systems. MusDev currently produces the component-based interface for the kiosk system. WebDev currently provides website design and management functionality. Up until now the move from kiosk to web has involved a manual process of mapping one set of components onto the other. The task now is to produce a flexible template-based system to allow MusDev to export its components to a WebDev project.

10 FUTURE DIRECTIONS

The intention of MusDev is to produce a generic set of tools for integrating museum data with a number of administration- and user-centred systems. Currently it has been used with our own TAMH project, and we are actively looking for partners within the museum community to test the portability of the system to other subject areas. One of the ways in which MusDev will be enhanced in its web integration capabilities is by building a library of museum-like components for users to choose from, so that we get ever closer to the "Press button for Web version" situation.

MusDev is a set of tools which allow a single set of data sources to be used for multiple purposes within an organisation. The emphasis on the portability of the data and the independence of data from the interfaces and search mechanisms which use it means that it can be used with one interface for data management and another interface for a kiosk system. The final step of integrating an interface to the web into the system is well under way.

REFERENCES

Eakins, J.P., Shields, K. and Boardman, J. (1996) ARTISAN - a shape retrieval system based on boundary family indexing, In Sethi, I.K. and Jain, R.C. (eds) Storage and Retrieval for Still Image and Video Databases IV, 210-215

Hall, W. (1994) Ending the tyranny of the button. IEEE Multimedia, 1(1), 60-68

Holt, B., Hartwick, L. and Vetter, S. (1995) Query by Image Content: the QBIC Project's Application in the University of California at Davis's Art and Art History Departments. Visual Resources Association Bulletin, 22(2), 61-65

Kaufman, M. (1996) Memory and Rediscovery: Using a Controlled Vocabulary to Provide Access to Holocaust Survivor Visual Histories. Spectra 24(2), 26-29

Kydd, S. and MacKenzie, D. (1997) Going On-line: Moving Multimedia Exhibits onto the Web, In Bearman D. and Trant J. (eds) Museums and the Web 97: Selected Papers, AMI, Pittsburgh, 299-313

Light, R. (1995) Getting a handle on exhibition catalogues, the Project CHIO DTD. In Bearman D. (ed) Multimedia Computing and Museums, AMI, Pittsburgh, 368-381

MacKenzie, D. (1996) Beyond Hypertext: Adaptive Interfaces for Virtual Museums, Proceedings of EVA'96, Vasari Enterprises, Aldershot

MacKenzie, D. (1995) Using Archives for Education, Journal of Educational Multimedia and Hypermedia, 5(2), 113-128

Orna, E. (1994) In the know. Museums Journal, 94(11)

Walsh, P. (1997) The Web and the Unassailable Voice, Archives and Museum Informatics, 11(2)