Video information retrieval (in 15 minutes)
INFO 202 - 26 November 2007
Yiming Liu
Retrieving video: bridging the Semantic Gap
Typical use cases
- Finding specific videos
- Identifying classes of videos
- Summarizing long-form content
General approaches
- Content-based retrieval
  - low-level features
  - structural/composite features
- Metadata-based retrieval
  - semantic features
  - ...provided that metadata exists
  - type and quality of metadata
Content-based methods
- low-level features
  - color/brightness - histograms of pixel values
  - motion - analysis of consecutive frames
  - text - closed captioning, overlay text; reduce problem to OCR, IR, NLP
  - speech - audio extraction; reduce problem to speech and audio recognition
- composite and structural features
  - scene change detection - changes in low-level features
  - face detection - identifying human faces; not quite recognition
  - object recognition - for a constrained set of objects
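
As a rough illustration of how a structural feature can be built from a low-level one, the sketch below flags a scene change whenever the brightness histogram of a frame differs sharply from that of the previous frame. The frame representation (a flat list of 0-255 grayscale values), the bin count, and the threshold are all assumptions made for this example and are not taken from the slides.

```python
from typing import List, Sequence

# Assumption: one grayscale frame is a flat sequence of 0-255 pixel values.
Frame = Sequence[int]


def brightness_histogram(frame: Frame, bins: int = 8) -> List[float]:
    """Normalized histogram of pixel brightness values."""
    counts = [0] * bins
    for value in frame:
        counts[min(value * bins // 256, bins - 1)] += 1
    total = len(frame) or 1
    return [c / total for c in counts]


def histogram_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """L1 distance between two normalized histograms (0.0 to 2.0)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def detect_scene_changes(frames: Sequence[Frame],
                         threshold: float = 0.5,
                         bins: int = 8) -> List[int]:
    """Return each index i where frame i differs sharply from frame i-1."""
    cuts = []
    previous = None
    for i, frame in enumerate(frames):
        current = brightness_histogram(frame, bins)
        if previous is not None and histogram_distance(previous, current) > threshold:
            cuts.append(i)
        previous = current
    return cuts


# Toy input: two dark frames, two bright frames, one dark frame.
dark = [10] * 16
bright = [240] * 16
print(detect_scene_changes([dark, dark, bright, bright, dark]))  # [2, 4]
```

A real system would likely use color histograms and a tuned threshold. The point here is only that the higher-level feature is derived from changes in a low-level one.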
Content-based example: commercial skip
- Have to identify commercials first
- Features of commercials:
  - blank frames
  - disappearance of station logos / persistent "bugs"
  - scene change rate
  - sound levels
- Supervised methods - train a machine learning classifier (e.g. decision trees)
- Unsupervised methods - vector similarity, clustering
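
One possible reading of the supervised route is a nearest-centroid classifier over the commercial features listed above. This is a minimal sketch, not the method any particular product uses. The feature order, the scaling of each value to the range 0 to 1, and the training numbers are invented for illustration.

```python
from math import dist
from typing import Dict, List, Tuple

# Hypothetical feature order, each value scaled to [0, 1]:
# (blank_frame_ratio, logo_visible_ratio, scene_change_rate, sound_level)
Features = Tuple[float, ...]


def centroid(rows: List[Features]) -> Features:
    """Component-wise mean of a list of feature vectors."""
    n = len(rows)
    return tuple(sum(column) / n for column in zip(*rows))


def train(examples: Dict[str, List[Features]]) -> Dict[str, Features]:
    """Learn one centroid per label from labelled segments."""
    return {label: centroid(rows) for label, rows in examples.items()}


def classify(model: Dict[str, Features], segment: Features) -> str:
    """Assign the label whose centroid is closest in Euclidean distance."""
    return min(model, key=lambda label: dist(model[label], segment))


# Invented toy training data, not measurements from real broadcasts.
training = {
    "commercial": [(0.05, 0.0, 0.6, 0.9), (0.04, 0.1, 0.5, 0.8)],
    "program": [(0.0, 1.0, 0.1, 0.5), (0.01, 0.9, 0.2, 0.6)],
}
model = train(training)
print(classify(model, (0.03, 0.0, 0.7, 0.85)))  # commercial
print(classify(model, (0.0, 1.0, 0.15, 0.5)))   # program
```

The unsupervised route would skip the labels and group segments by the same vector distance instead.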
Metadata-based methods
- Semantic features
- Externally labelled (manual or automatic)
  - categories - stock collections
  - descriptions - controlled vocabulary / free-form text
  - tags - the YouTube approach
  - etc. - context/geolocation, time, usage
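
Once tags exist, retrieval reduces to ordinary text IR. The sketch below builds an inverted index from tags to video IDs and answers conjunctive tag queries. The video IDs and tags are made up, and lowercasing is the only normalization assumed.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def build_tag_index(videos: Dict[str, Iterable[str]]) -> Dict[str, Set[str]]:
    """Map each normalized tag to the set of video IDs that carry it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for video_id, tags in videos.items():
        for tag in tags:
            index[tag.strip().lower()].add(video_id)
    return index


def search(index: Dict[str, Set[str]], query_tags: Iterable[str]) -> List[str]:
    """Return IDs of videos that carry every query tag, sorted for stable output."""
    sets = [index.get(tag.strip().lower(), set()) for tag in query_tags]
    if not sets:
        return []
    return sorted(set.intersection(*sets))


# Hypothetical collection: video ID -> user-supplied tags.
videos = {
    "v1": ["Cat", "funny"],
    "v2": ["cat", "news"],
    "v3": ["news", "berkeley"],
}
index = build_tag_index(videos)
print(search(index, ["cat"]))          # ['v1', 'v2']
print(search(index, ["cat", "news"]))  # ['v2']
```

Free-form tags have no controlled vocabulary, so a query for "kitten" finds nothing here. That is the quality-of-metadata problem in miniature.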
Metadata-based example: interestingness
- How to determine interestingness? Newsworthiness?
- Not part of the content
  - but meaning can be inferred from use
- System must support and collect this metadata
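
To make "inferred from use" concrete, here is one hypothetical scoring scheme: each usage event contributes a weight that halves after a fixed number of days. The event types, weights, and half-life are assumptions chosen for the example. The slides do not prescribe a formula.

```python
from typing import Iterable, Tuple

# Hypothetical event weights; the slides do not specify any.
EVENT_WEIGHTS = {"view": 1.0, "comment": 3.0, "favorite": 5.0}


def interestingness(events: Iterable[Tuple[str, float]],
                    half_life_days: float = 7.0) -> float:
    """Sum weighted usage events with exponential time decay.

    events: (event_type, age_in_days) pairs; unknown event types count as 0.
    """
    score = 0.0
    for event_type, age_days in events:
        weight = EVENT_WEIGHTS.get(event_type, 0.0)
        score += weight * 0.5 ** (age_days / half_life_days)
    return score


# One view today plus one favorite from a week ago.
print(interestingness([("view", 0.0), ("favorite", 7.0)]))  # 3.5
```

None of these inputs can be computed from the video itself, which is why the system has to record views, comments, and favorites in the first place.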
Mixed methods?
- Content analysis for low-level features
- Motivate user contributions for semantic-level metadata
Conclusion
- Two major sets of approaches
  - Content: from low-level features to high-level meaning
  - Metadata: how to get it?
- "Content is people too"