Taggers versus Linkers: Comparing Tags and Anchor Text of Web Pages
Skills Inventory
- Information Retrieval
- Natural Language Processing
- Information Organization
- Python, MySQL
This study compares the properties of tags and anchor-text metadata, motivated by the goal of automatically suggesting tags for web pages.
Dataset
- Delicious - ~11,000 URLs and their tagging metadata
- Technorati - inbound anchor text for the above URLs
Findings
- Tags and anchor text show some degree of conceptual similarity, and anchor text can contribute to tag recommendations.
- It may be possible to extract semantic groups of tags based on existing usage, from which tags can be suggested.
- To assess how tags and anchor terms pertain to subtopics within a given document, we propose a window-based text processing method that can be used to discover subtopic tags.
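The summary does not describe how the window-based method works, so the following is only a minimal sketch of one plausible reading, not the study's actual procedure. It slides a fixed-size token window over a document and records the fraction of windows in which each candidate tag or anchor term appears. Terms found in only a small share of windows are treated as subtopic candidates, while terms found throughout are treated as document-level. The function names, the window_size and step defaults, and the max_coverage threshold are all assumptions made for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def window_coverage(text, terms, window_size=50, step=25):
    """Return, for each term, the fraction of windows that contain it.

    Assumption: terms are single tokens; multi-word tags are not matched.
    """
    tokens = tokenize(text)
    terms = {t.lower() for t in terms}
    # Window start offsets, chosen so the last window reaches the final token.
    last_start = max(len(tokens) - window_size, 0)
    starts = range(0, last_start + step, step)
    hits = defaultdict(int)
    for start in starts:
        window = set(tokens[start:start + window_size])
        for term in terms & window:
            hits[term] += 1
    return {term: hits[term] / len(starts) for term in terms}


def subtopic_terms(text, terms, max_coverage=0.5, **kwargs):
    """Return terms that occur in the document but only in a limited
    share of windows (hypothetical threshold: max_coverage)."""
    coverage = window_coverage(text, terms, **kwargs)
    return sorted(t for t, c in coverage.items() if 0 < c <= max_coverage)
```

Calling subtopic_terms(page_text, candidate_terms) with the tags and anchor terms of a page would return the terms concentrated in a limited part of the text. The threshold would need tuning against real data before the output could be used for tag suggestion.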
The groundwork for this project was done during the research seminar class, and a paper was subsequently published in the ISD Symposium at UC Berkeley in February 2009.



