TileBars Example

Below we show an example run on a query about efforts at technology transfer of research at Xerox and PARC, run on the TREC/TIPSTER collection of over 1 million newswire, newspaper, magazine and government articles, dating mainly from the late 1980s. The query consists of three Term Sets, where each set of terms is meant to correspond to a topic of the query:

The TileBars for the most relevant-looking cluster (from the results of a Scatter/Gather) are shown. The ranking reflects criteria specific to this interface: the documents are ranked first by overlap (how many segments have hits for all Term Sets), second by total number of hits, and third by the ranking from a similarity search. The number shown is the original similarity search ranking.
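The three-level ranking described above can be sketched as a sort on a compound key. This is a minimal illustration, not the interface's actual code: the `Doc` class, its field names, and the helper functions are hypothetical, and we assume that more overlap and more hits rank higher, with the similarity search rank (1 = best) breaking remaining ties.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Doc:
    # Hypothetical container; the field names are assumptions for illustration.
    title: str
    sim_rank: int            # rank from the original similarity search (1 = best)
    hits: List[List[int]]    # hits[t][s] = hits for Term Set t in segment s


def overlap(doc: Doc) -> int:
    """Number of segments in which every Term Set has at least one hit."""
    return sum(1 for segment in zip(*doc.hits) if all(h > 0 for h in segment))


def total_hits(doc: Doc) -> int:
    """Total number of hits over all Term Sets and all segments."""
    return sum(sum(row) for row in doc.hits)


def tilebar_order(docs: List[Doc]) -> List[Doc]:
    """Sort by overlap, then total hits (both descending), then similarity rank."""
    return sorted(docs, key=lambda d: (-overlap(d), -total_hits(d), d.sim_rank))
```

Under these assumptions, a document with many hits but no segment containing all three Term Sets still sorts below a document with a single fully overlapping segment.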

Each large rectangle indicates a document, and each square within the document represents a coherent text segment. The darker the segment, the more frequent the term: white indicates 0 hits, black indicates 8 or more hits, and the frequencies of all the terms within a Term Set are added together. The top row of each rectangle corresponds to the hits for Term Set 1, the middle row to the hits for Term Set 2, and the bottom row to the hits for Term Set 3. The first column of each rectangle corresponds to the first segment of the document, the second column to the second segment, and so on.
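The layout just described can be approximated in a few lines. The sketch below is only illustrative: the text fixes the two endpoints (0 hits is white, 8 or more is black), but the linear ramp between them, the text characters in `SHADES`, and the function names are our assumptions. The input `term_set_hits[t][s]` is taken to be the summed frequency of all terms of Term Set t in segment s.

```python
MAX_HITS = 8  # from the text: 8 or more hits is drawn black

# Nine hypothetical text shades, lightest (0 hits) to darkest (8 or more hits).
SHADES = " .:-=+*#@"


def gray_level(hits: int) -> float:
    """Map a hit count to darkness: 0.0 = white, 1.0 = black.

    The linear ramp between the two endpoints is an assumption.
    """
    return min(hits, MAX_HITS) / MAX_HITS


def render_tilebar(term_set_hits):
    """Render one document as text: one row per Term Set, one column per segment."""
    rows = []
    for row in term_set_hits:
        rows.append("".join(SHADES[min(h, MAX_HITS)] for h in row))
    return "\n".join(rows)
```

For instance, `render_tilebar([[0, 8], [1, 12]])` yields two rows of two squares each, where both squares in the second column are fully dark because counts above 8 are clamped.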

In this example we can see at a glance that all three topics are discussed in at least one segment in each of the first 16 documents, but that the last four documents discuss only Xerox and research, with no discussion of business or technology transfer. We can also see the relative lengths of the documents and how strongly the three topics overlap within the documents. The score next to the title shows what a standard ranking algorithm would produce.
