As a check on the validity of this approach we describe three different selection systems in the terms we have been using.
The illustrative model in Figure 2 was, itself, based on the typical form of current first and second generation online library catalogs. The Representations (catalog records) are derived, via Representation Making Rules, from the Source Objects (books, periodicals, microforms, etc. in the collection). Some of this is the direct copying of fragments (e.g. titles, call numbers); some is a more complex intellectual representation derived from External Knowledge Sources (e.g. assignment of subject headings and classification numbers). In practice the representations are largely indirect copies, being derived directly from external sources, especially in the form of other catalogers' previously created catalog records (copy cataloging), rather than directly from the source object. The Searchable Index is limited in practice to a small number of the fields of the catalog records (Representations).
User Queries are accepted into the system from users either as well-formed and normalized Formal Queries, in the case where the searcher is experienced and uses the ``command line interface'', or in some less well-formed format that must go through a Query Development process before searching.
The Query Development process commonly has an option (or requirement) that it be a two-stage process. Two of the primary examples of such transformations in library catalogs are index browsing and transformations which rely on External Knowledge Sources. In the former, the User Query is used to select a subset of the terms from the Searchable Index; the user scans this subset and selects from it one or more terms, which are then used to retrieve the Representations associated with those terms. Such selection by ``browsing'' is, in effect, a two-stage retrieval process.
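The two stages of index browsing can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, the subject headings, and the function names (build_index, browse, retrieve) are invented here and do not describe any particular catalog.

```python
from collections import defaultdict

# Hypothetical catalog records (Representations); ids and fields are invented.
RECORDS = {
    1: {"title": "Adventures of Huckleberry Finn",
        "subjects": ["Mississippi River -- Fiction"]},
    2: {"title": "Life on the Mississippi",
        "subjects": ["Mississippi River -- Description and travel"]},
    3: {"title": "Missouri Folklore",
        "subjects": ["Missouri -- Folklore"]},
}


def build_index(records):
    """Map each subject heading (index term) to the ids of records carrying it."""
    index = defaultdict(set)
    for rec_id, rec in records.items():
        for heading in rec["subjects"]:
            index[heading].add(rec_id)
    return index


def browse(index, user_query):
    """Stage 1: select the index terms that begin with the user's query."""
    prefix = user_query.lower()
    return sorted(term for term in index if term.lower().startswith(prefix))


def retrieve(index, chosen_terms):
    """Stage 2: retrieve the records attached to the terms the user picked."""
    ids = set()
    for term in chosen_terms:
        ids |= index.get(term, set())
    return sorted(ids)


INDEX = build_index(RECORDS)
candidates = browse(INDEX, "mississippi")   # terms shown to the user
selected = retrieve(INDEX, candidates[:1])  # user picks the first term
```

The user's query never touches the records directly: it first partitions the index, and only the user's choice among the displayed terms partitions the records.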
Queries may also be transformed with various sorts of External Knowledge, such as thesauri, dictionaries, controlled vocabulary lists, subject headings, etc. In online catalogs, these sources nearly always have some sort of syndetic structure (``broader term'', ``narrower term'', ``use for'', ``see also'', etc.) which can be used, either algorithmically or by hand, to harmonize the User Query with the system vocabulary (Searchable Index) for better results. For example, a syndetic structure may be in place so that one form of a query term will be represented as another, e.g. a search for Mark Twain will retrieve Samuel Clemens and vice-versa.
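As a rough sketch of algorithmic harmonization, the following Python fragment expands a query term through a small syndetic structure. The equivalence group and the narrower-term table are hypothetical stand-ins for an External Knowledge Source, and expand_query is a name introduced only for this illustration.

```python
# Hypothetical syndetic data; a real system would draw this from an
# authority file or thesaurus.
EQUIVALENTS = [
    {"mark twain", "samuel clemens"},
]
NARROWER = {
    "fiction": ["novels", "short stories"],
}


def expand_query(term, include_narrower=False):
    """Return the query term together with its equivalent (and optionally
    narrower) terms, so that any form of the term matches the index."""
    term = term.lower().strip()
    expanded = {term}
    for group in EQUIVALENTS:
        if term in group:
            expanded |= group
    if include_narrower:
        expanded.update(NARROWER.get(term, []))
    return sorted(expanded)
```

Because the equivalence is stored as a set rather than as a one-way pointer, the expansion is symmetric: either form of the name yields both.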
Retrieved sets are normally re-ordered by main entry before being output as a display or as a stream of records. This re-ordering of retrieved sets is, in effect, an automatic, ``hardwired'' partitioning instruction. Future online catalogs will probably allow the user to choose what re-ordering or aggregating (by date, by availability, etc.) to invoke.
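A user-selectable re-ordering of this kind might look like the sketch below. The records, the field names, and the SORT_KEYS table are assumptions made for illustration; the default reproduces the ``hardwired'' ordering by main entry.

```python
# Hypothetical retrieved set; field names are invented for this sketch.
RETRIEVED = [
    {"main_entry": "Twain, Mark", "year": 1884, "available": False},
    {"main_entry": "Alcott, Louisa May", "year": 1868, "available": True},
    {"main_entry": "Melville, Herman", "year": 1851, "available": True},
]

# Each partitioning instruction is just a sort key.
SORT_KEYS = {
    "main_entry": lambda r: r["main_entry"].lower(),
    "date": lambda r: r["year"],
    "availability": lambda r: not r["available"],  # available items first
}


def order_retrieved_set(records, by="main_entry"):
    """Re-order a retrieved set; the default mimics the hardwired ordering."""
    return sorted(records, key=SORT_KEYS[by])
```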
In a simple case of retrieval from full-text, electronic texts would be stored (copied into the system) to become the Representation, and all of the texts would be searchable for the occurrence of specified phrases, words or word fragments. In such a case, if all of the text can be searched, the Searchable Index is actually (or logically) co-extensive with the Representation. The Retrieved Set could consist of either partial or complete copies of those texts which satisfy the Matching Rule. The syndetic structure component of the Searchable Index is absent.
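A minimal sketch of this simple case, assuming a handful of invented texts: the stored texts are scanned directly, so there is no index separate from the Representation, and the Retrieved Set consists of partial copies (snippets) of the matching texts. The function name and the context parameter are hypothetical.

```python
# Hypothetical stored texts; here the Representation is the text itself.
TEXTS = {
    "doc1": "The river was wide and the raft drifted slowly.",
    "doc2": "Steamboat pilots learned every bend of the river.",
    "doc3": "Catalog records describe books in the collection.",
}


def search_full_text(texts, phrase, context=20):
    """Return {doc_id: snippet} for every text containing the phrase, word
    or word fragment (case-insensitive substring match)."""
    needle = phrase.lower()
    hits = {}
    for doc_id, text in texts.items():
        pos = text.lower().find(needle)
        if pos >= 0:
            start = max(0, pos - context)
            end = min(len(text), pos + len(needle) + context)
            hits[doc_id] = text[start:end]  # partial copy of the text
    return hits
```

Returning `text` instead of the slice would give complete copies; the Matching Rule is unchanged either way.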
One degree more complex would be to represent the relative location of pairs of terms or to impose some vocabulary control in the form of stop words so that the significance of a term could be represented more reliably. More sophisticated still would be information storage systems which use or include algorithmically generated representations (e.g. weighted vectors of terms) of the texts.
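The first of these refinements, recording relative term locations and dropping stop words, can be sketched as a small positional index. The stop-word list, the sample texts, and the function names are assumptions for illustration only; weighted term vectors are not attempted here.

```python
import re
from collections import defaultdict

STOP_WORDS = {"the", "of", "and", "a", "in", "was"}  # illustrative list

# Hypothetical texts.
TEXTS = {
    "d1": "the bend of the river",
    "d2": "river pilots feared the sharp bend",
}


def tokenize(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_positional_index(texts):
    """Map term -> doc_id -> word positions, skipping stop words.
    Positions still count stop words, so distances stay meaningful."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in texts.items():
        for pos, word in enumerate(tokenize(text)):
            if word not in STOP_WORDS:
                index[word][doc_id].append(pos)
    return index


def within(index, term_a, term_b, max_gap):
    """Ids of documents where the two terms occur at most max_gap
    word positions apart."""
    postings_a = index.get(term_a, {})
    postings_b = index.get(term_b, {})
    hits = set()
    for doc_id in postings_a.keys() & postings_b.keys():
        if any(abs(a - b) <= max_gap
               for a in postings_a[doc_id] for b in postings_b[doc_id]):
            hits.add(doc_id)
    return hits


index = build_positional_index(TEXTS)
```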
Systems for filtering electronic messages (or other objects) constitute an example in which objects are represented, filtered (searched) and then discarded or relegated to other storage. In this case the User Query, once developed, remains indefinitely in place as a stored instruction (Matching Rule) which is used to select messages (Representations) as soon as they have been copied into the system. A stored, ``standing'' query resembles the default alphabetizing of retrieved sets in online catalogs: both are, in effect, latent matching instructions, instrumental in partitioning whatever data may come their way. In this sense, filters with stored queries and transient data objects are symmetrical with retrieval systems with their transient queries and stored data objects.
Primitive retrieval systems based on a serial scan of searchable records, such as the mid-twentieth century ``rapid selector'' machines for scanning long spools of microfilm, can be seen as an intermediate design between typical modern filters and typical modern retrieval systems.
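The symmetry can be made concrete with a small sketch in which the queries are stored and the messages are transient. The class name MessageFilter, its methods, and the all-terms-present Matching Rule are assumptions chosen for illustration, not a description of any actual filtering system.

```python
import re


class MessageFilter:
    """Holds standing queries; each arriving message is matched once and
    is kept only if at least one query selects it."""

    def __init__(self):
        self.standing_queries = {}  # query name -> set of required terms
        self.selected = []          # (message, matching query names)

    def add_standing_query(self, name, required_terms):
        """Store a query that stays in place for all future messages."""
        self.standing_queries[name] = {t.lower() for t in required_terms}

    def receive(self, message):
        """Match one incoming message against every stored query.
        Unmatched messages are simply not retained."""
        words = set(re.findall(r"[a-z0-9]+", message.lower()))
        matched = sorted(
            name for name, terms in self.standing_queries.items()
            if terms <= words  # assumed Matching Rule: all terms present
        )
        if matched:
            self.selected.append((message, matched))
        return matched
```

A retrieval system would invert the loop: the texts would be the stored dictionary and each query would be the transient argument.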