
4 OASIS Commands

4.1 FEWER, SUMMARIZE and SORT

The OASIS commands FEWER, SUMMARIZE and SORT help searchers deal effectively with large retrieved sets. They yield four distinct benefits:

Convenient reduction of excessively large retrieved sets
Improved ordering for display
Statistical analysis of retrieved sets
Comparative collection analysis in specific subject areas

A search on MELVYL that retrieves an unmanageably large set leaves the searcher with the choice of either reading through all the records or applying some type of limit. Even experienced searchers find learning the syntax of new and different systems daunting. FEWER is designed to limit sets progressively in a logical sequence, reducing this burden to a simple, iterative operation.

OASIS offers the repeatable FEWER command, pre-set to reduce large retrieved sets using the aspects of the MARC records most likely to reflect a typical searcher's preferences. Applied to a large set of over 1,000 records, repeated FEWER commands usually narrow the set dramatically, based on parameters that reflect the searcher's needs.

OASIS assumes default preferences, both for the purposes of modelling an expert searcher's approach and providing a starting example of possible defaults that can be easily revised by the user. The defaults can be customized by re-setting the preferences list with any legitimate MELVYL command in any sequential order.

For example, a searcher on the Berkeley campus would presumably prefer to see records held locally, in English, and of recent publication. The FEWER command can be set to reflect these preferences so that they can be invoked for any search result. Setting the FEWER defaults list to MELVYL commands that first limit the set to the Berkeley campus, then to the English language, and finally to the last ten years usually results in a set much more likely to be considered useful by that searcher. This may seem self-evident, but transaction log studies have shown that few users apply these basic limiting commands [Borgman 86, Larson 91a, Larson 91b, Tonta 92].
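The progressive narrowing described above can be sketched as follows. This is a minimal illustration in Python, not the OASIS implementation: the record layout, the location code "UCB", the function names, and the fixed reference year are all assumptions made for the example. In OASIS the preferences are MELVYL commands rather than local predicates.

```python
# Hypothetical record layout and location code ("UCB"); not MELVYL's format.
def held_at_berkeley(record):
    return "UCB" in record["locations"]

def in_english(record):
    return record["language"] == "eng"

def published_recently(record, current_year=1993, span=10):
    # Assumed reading of "within the last ten years".
    return record["year"] >= current_year - span

# Ordered preference list; the user could reorder or replace these entries.
DEFAULT_PREFERENCES = [held_at_berkeley, in_english, published_recently]

def fewer(records, step, preferences=DEFAULT_PREFERENCES):
    """One FEWER invocation: apply preference number `step`.

    Returns the narrowed set and the index of the next preference, so the
    command can simply be repeated until the set is small enough.
    """
    if step >= len(preferences):
        return records, step  # nothing left to apply
    narrowed = [r for r in records if preferences[step](r)]
    return narrowed, step + 1

sample = [
    {"id": "A", "locations": {"UCB"}, "language": "eng", "year": 1990},
    {"id": "B", "locations": {"UCLA"}, "language": "eng", "year": 1991},
    {"id": "C", "locations": {"UCB"}, "language": "ger", "year": 1992},
    {"id": "D", "locations": {"UCB", "UCLA"}, "language": "eng", "year": 1975},
]
```

Calling `fewer(sample, 0)` keeps the three Berkeley records. Feeding the result back in with the returned step applies the language limit, and a third call applies the date limit.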

SUMMARIZE and SORT

The OASIS summarize options are the focal point for record analysis functions. Retrieved sets are post-processed by downloading them and summarizing parts of the MARC record. For instance, a subject summary for a set of records can be viewed in a browse window by selecting the SUMMARIZE SUBJECTS option.

Figure 1 shows an author search on cognitive linguist George Lakoff that yields 21 records in MELVYL (the search issued was find pn lakoff, g). The SUMMARIZE SUBJECTS option presents a list of the subject headings assigned to the 21 retrieved records, ranked by frequency and then alphabetically, which can be viewed in a browse window.
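The ranking rule used by SUMMARIZE SUBJECTS (most frequent first, ties broken alphabetically) can be sketched in a few lines of Python. The record layout and the sample headings below are invented for illustration and are not the 21 records retrieved in the Lakoff search.

```python
from collections import Counter

def summarize(records, field):
    """Count every value of `field` across the records and rank the result
    by descending frequency, then alphabetically."""
    counts = Counter(value for r in records for value in r.get(field, []))
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

# Invented sample data; each record carries a list of subject headings.
records = [
    {"subjects": ["Semantics", "Metaphor"]},
    {"subjects": ["Metaphor", "Cognition"]},
    {"subjects": ["Metaphor", "Semantics"]},
    {"subjects": ["Categorization"]},
]

ranked = summarize(records, "subjects")
```

Because the field name is a parameter, the same function could serve a libraries or call-number summary, provided those values are stored as lists in the record.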

Figure 1: Scrolling list box to display subjects from a search on author "Lakoff, G" (f pn lakoff, g).

Figure 2: To get an idea of which libraries on the Berkeley campus hold books by G. Lakoff, the SUMMARIZE LIBRARIES option can be selected.

Figure 3: Number of books held by various libraries.

Figure 4: Selecting the SUMMARIZE CALL NUMBERS option summons a browse window that summarizes how the record set from the search on author Lakoff, G is distributed across the Library of Congress (LC) classification scheme.

Analysis of retrieved sets by any searchable field in the MARC record, or by a combination of fields, can reveal new information about the nature of the records retrieved. This information can help the user make a more informed decision about how to narrow a large set. For example, a large set of records retrieved from a search on the subject information science can be analyzed by language and arranged by date. The analysis reveals a predominance of English titles on the subject in this collection, with more German, French, and Russian titles than titles in other languages. See Figure 6.
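An analysis by a combination of fields amounts to a cross-tabulation. The sketch below counts records by decade and language. The record layout, the language codes, and the choice of decades as the date grouping are assumptions made for the example, not details taken from OASIS.

```python
from collections import defaultdict

def crosstab(records, row_key, col_key):
    """Count records in each (row, column) cell, e.g. decade by language."""
    table = defaultdict(lambda: defaultdict(int))
    for r in records:
        table[row_key(r)][col_key(r)] += 1
    # Convert to plain dicts so the result is easy to print or compare.
    return {row: dict(cols) for row, cols in table.items()}

def decade(record):
    return (record["year"] // 10) * 10

def language(record):
    return record["language"]

# Invented sample data.
records = [
    {"language": "eng", "year": 1985},
    {"language": "eng", "year": 1989},
    {"language": "ger", "year": 1985},
    {"language": "eng", "year": 1991},
]

by_decade_and_language = crosstab(records, decade, language)
```

Each row of the resulting table is one date group, and each column is one of the datasets that a plot like Figure 6 would draw.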

Figure 5: The original search and aggregation of xs information science.

Figure 6 is a plot of 5 separate datasets on one graph.

Figure 6: Summary of results from search on xs information science.

Local Control of Displayed Data

The presence or absence of subject subdivisions can lead to very different impressions of a retrieved set. In the results of a SUMMARIZE SUBJECTS command, whether or not the subject subdivisions are displayed is now a user-controlled preference. Consider a search for books on cartography. Without the subdivisions, the results look like Figure 7. The display of the same search results with the subdivisions gives a rather different impression, as seen in Figure 8.

Figure 7: Major subject headings only for the search f (xs cartography # and lang eng) and date recent.

Figure 8: Subject headings and subheadings for the search f (xs cartography # and lang eng) and date recent.
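The effect of the subdivision preference on a summary can be sketched as below. The sketch assumes that subdivisions are joined to the main heading with a double hyphen, which is a common display convention but may not match how OASIS stores them. The headings are invented sample data.

```python
from collections import Counter

def main_heading(heading, separator="--"):
    """Drop subdivisions, keeping only the part before the first separator."""
    return heading.split(separator, 1)[0].strip()

def summarize_subjects(headings, show_subdivisions=True):
    """Rank headings by frequency, then alphabetically, with or without
    their subdivisions."""
    if not show_subdivisions:
        headings = [main_heading(h) for h in headings]
    counts = Counter(headings)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

# Invented sample headings.
headings = [
    "Cartography--History",
    "Cartography--Data processing",
    "Cartography",
    "Maps--Symbols",
]

without_subdivisions = summarize_subjects(headings, show_subdivisions=False)
with_subdivisions = summarize_subjects(headings, show_subdivisions=True)
```

With subdivisions hidden, three of the four headings collapse into a single entry, which is the kind of change in impression that Figures 7 and 8 contrast.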

Large Result Set Visualization

Another example of the payoff of a manipulable representation is the following sequence. Figure 9 shows the results of a search and aggregation for books on Confucianism. It also tells us something about the collection development habits of the University of California over the past century.

Figure 9: The original search and aggregation of xs confucianism.

Notice here that Berkeley (``BG'') has only recently been collecting large numbers of non-English titles on this subject.

Figure 10 shows the effect of resetting preferences to show only the most common languages while disregarding where the titles are held, which gives us a different view of the set. We see that the major languages of the campus-wide UC collection are a mixture of eastern and western languages.

Figure 10: "xs confucianism" reorganized by 6 major languages.

But this sort of display is not the only choice. We are now experimenting with an interface that uses X Windows for some of the display chores. Figure 11 shows the number of books collected by UC over the last 30 years, plotted by year.

Figure 11 and Figure 12 were generated from the same data as the preceding aggregate tables. The table of numbers, instead of being sent to the screen, is simply sent off to Gnuplot [Stevens 93, Morin 93], which is run as a subprocess of the Lisp interpreter. We are now exploring different kinds of plots to see which are effective in helping users to cope with large result sets.
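The hand-off to Gnuplot can be illustrated with a short sketch. OASIS drives Gnuplot from Lisp; the version below uses Python instead and is only an approximation of the idea. The function names, titles, and sample counts are hypothetical. It builds a Gnuplot script with one inline data block per language and, optionally, pipes the script to a `gnuplot` executable assumed to be on the PATH.

```python
import subprocess

def gnuplot_script(series, title):
    """Build a Gnuplot script that plots one line per series.

    `series` maps a label (e.g. a language code) to (year, count) pairs.
    Data is passed inline: each "'-'" in the plot command reads one data
    block, and each block is terminated by a line containing "e".
    """
    labels = sorted(series)
    lines = [
        f'set title "{title}"',
        'set xlabel "Year"',
        'set ylabel "Number of books"',
        "plot " + ", ".join(
            f"'-' using 1:2 with lines title '{label}'" for label in labels
        ),
    ]
    for label in labels:
        lines.extend(f"{year} {count}" for year, count in sorted(series[label]))
        lines.append("e")
    return "\n".join(lines) + "\n"

def send_to_gnuplot(script):
    """Run Gnuplot as a subprocess and feed it the script on standard input.
    Assumes a `gnuplot` executable is installed and on the PATH."""
    subprocess.run(["gnuplot", "-persist"], input=script, text=True, check=True)

# Invented sample counts.
script = gnuplot_script(
    {"eng": [(1990, 3), (1991, 5)], "chi": [(1990, 2)]},
    "xs confucianism by language",
)
```

Keeping script generation separate from the subprocess call makes it possible to inspect the script, or to try other plot styles, without Gnuplot being installed.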

Figure 11: Each language plotted year by number of books for the f xs confucianism results.

Figure 12: English plotted over the last 130 years for the f xs confucianism results.

Figure 12 is another view of the "confucianism" data, this time focusing on English, Chinese and Korean, but tracing their numbers over the full range of years in MELVYL.

In a union catalog, or after combining data from different catalogs, comparative collection analyses in specific subject areas can be generated. The table in Figure 13 shows unique holdings by date of general works on water conservation on University of California campuses. Each cell counts the titles not held on the campuses to its left; for example, the UCLA campus holds 2 works dated 1983 that are not held by the Water Resources Archives or Berkeley.

Figure 13: Results of the subject search f su water conservation arranged by location. "Other" represents all other UC campus cataloging agencies.
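The counting rule behind such a table, in which each title is credited to the leftmost column that holds it, can be sketched as follows. The location codes, the data layout, and the sample holdings are assumptions made for the example. The sketch also assumes each title appears only once in the input, as it would in a union catalog.

```python
def unique_holdings(holdings, column_order, other="Other"):
    """Build a year-by-location table of unique holdings.

    `holdings` is a list of (title_id, year, set_of_locations). Each title
    is counted once, under the first location in `column_order` that holds
    it, or under `other` if none of the named locations does. A cell thus
    counts titles not held by any location to its left.
    """
    columns = list(column_order) + [other]
    table = {}
    for _title_id, year, locations in holdings:
        row = table.setdefault(year, {c: 0 for c in columns})
        first = next((c for c in column_order if c in locations), other)
        row[first] += 1
    return table

# Invented sample holdings with hypothetical location codes.
holdings = [
    ("t1", 1983, {"WRA", "UCB"}),
    ("t2", 1983, {"UCB", "UCLA"}),
    ("t3", 1983, {"UCLA"}),
    ("t4", 1983, {"UCLA", "UCD"}),
    ("t5", 1984, {"UCD"}),
]

table = unique_holdings(holdings, ["WRA", "UCB", "UCLA"])
```

In this sample the UCLA cell for 1983 is 2, counting the two titles held by neither of the locations to its left. Reordering `column_order` changes which location is credited, so the column order is itself an analytical choice.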

Here follows another view of part of the data in Figure 13, which offers a year by year comparison of collection strengths.

4.2 Extensions

With generalization of the software to provide access to all of the catalog record, many other analyses are possible.

To find the major publishers of books on tennis, a search can be issued on the subject of tennis. Using subfield "b" of MARC tag 260, a list of publishers of books on tennis can be created to provide the user with a different perspective on the retrieved set of records. See Figure 14.

Figure 14: Scrolling list box to display publishers from a search on the subject "tennis" (f su tennis).
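Summarizing by an arbitrary tag and subfield can be sketched as below. The in-memory record representation (a list of tag and subfield pairs) is a simplification chosen for the example, not the format OASIS downloads. The stripping of trailing punctuation is an assumed normalization, and the publisher names are invented.

```python
from collections import Counter

def subfield_values(record, tag, code):
    """Return all values of subfield `code` in fields with the given tag.

    A record is modeled as a list of (tag, [(subfield_code, value), ...]).
    """
    return [
        value
        for field_tag, subfields in record
        if field_tag == tag
        for sub_code, value in subfields
        if sub_code == code
    ]

def summarize_field(records, tag, code):
    """Rank the values of one tag and subfield by frequency, then
    alphabetically. Trailing cataloging punctuation is stripped so that
    "Example Press," and "Example Press" count as the same publisher."""
    counts = Counter(
        value.strip(" ,;:/")
        for record in records
        for value in subfield_values(record, tag, code)
    )
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

# Invented sample records.
records = [
    [("245", [("a", "Tennis basics")]),
     ("260", [("a", "New York :"), ("b", "Example Press,"), ("c", "1980.")])],
    [("260", [("a", "London :"), ("b", "Sample House,"), ("c", "1985.")])],
    [("260", [("b", "Example Press,")])],
]

publishers = summarize_field(records, "260", "b")
```

Since the tag and subfield are parameters, the same function could be pointed at any other identifiable part of the record.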

This is just one example of how more effective use can be made of existing catalog records. In principle, any MARC field or identifiable part of a field could be used.