Learning from multi-topic web documents for contextual advertisement

Contextual advertising on web pages has become very popular recently, and it poses its own set of text mining challenges. Advertisers often wish to target (or avoid) specific content that may appear in only a small part of a web page. Learning for these targeting tasks is difficult because most training pages are multi-topic and need expensive human labeling at the sub-document level for accurate training. In this paper we investigate ways to learn sub-document classification when only page-level labels are available; these labels indicate only whether the relevant content exists somewhere in the given page. We propose applying multiple-instance learning to this task to improve the effectiveness of traditional methods. We apply sub-document classification to two different problems in contextual advertising. One is "sensitive content detection", where the advertiser wants to avoid content relating to war, violence, pornography, etc. even if t...
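To make the multiple-instance setting in the abstract concrete, here is a minimal sketch. It is not the authors' method, which the truncated abstract does not specify. It assumes the standard multiple-instance assumption (a page is positive if at least one of its blocks is positive), a bag-of-words logistic model, and a simple heuristic that updates only on each page's top-scoring block. The function names, the toy pages, and the hyperparameters are all hypothetical.

```python
import math
from collections import defaultdict

def block_score(weights, block):
    """Logistic score of one block (a list of tokens), bag-of-words model."""
    z = weights["__bias__"] + sum(weights[tok] for tok in block)
    return 1.0 / (1.0 + math.exp(-z))

def page_score(weights, page):
    """Assumed MIL rule: a page is as relevant as its most relevant block."""
    return max(block_score(weights, block) for block in page)

def train_mil(pages, labels, epochs=200, lr=0.5):
    """Train from page-level labels only (hypothetical heuristic).

    For each page, the page label is applied to the currently
    top-scoring block and a logistic-regression gradient step is taken.
    """
    weights = defaultdict(float)
    for _ in range(epochs):
        for page, y in zip(pages, labels):
            best = max(page, key=lambda b: block_score(weights, b))
            grad = y - block_score(weights, best)
            weights["__bias__"] += lr * grad
            for tok in best:
                weights[tok] += lr * grad
    return weights

# Toy multi-topic pages: each page is a list of blocks, each block a token list.
sports = ["football", "league", "match"]
finance = ["stock", "market", "shares"]
pages = [
    [sports, ["war", "troops", "attack"]],       # sensitive content in block 1
    [sports, ["recipe", "pasta", "sauce"]],      # clean page
    [finance, ["troops", "attack", "border"]],   # sensitive content in block 1
    [finance, ["beach", "hotel", "flight"]],     # clean page
]
labels = [1, 0, 1, 0]  # page-level labels only; no block is labeled

weights = train_mil(pages, labels)
for page, y in zip(pages, labels):
    top = max(range(len(page)), key=lambda i: block_score(weights, page[i]))
    print(y, round(page_score(weights, page), 3), "top block:", top)
```

Because the distractor blocks (sports, finance) also occur on clean pages, the negative page labels push their scores down, and the model is left to explain the positive pages through the blocks that never appear on clean pages. That is the intuition behind learning sub-document classifiers from page-level labels, under the assumptions stated above.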

Yi Zhang, Arun C. Surendran, John C. Platt, Mukund Narasimhan

Data Mining | KDD 2008 | Page Level Labels | Relevant Content Exists | Sensitive Content Detection |

» Finding advertising keywords on web pages

» Contextual advertising for web article printing

» Contextual advertising by combining relevance with click feedback

» Identifying Content Blocks from Web Documents

» Contextual Ranking of Keywords Using Click Data

» Sixearch.org 2.0 peer application for collaborative web search

» Extraction Techniques for Mining Services from Web Sources

» Algorithmic Detection of Computer Generated Text

Post Info

Added	30 Nov 2009
Updated	30 Nov 2009
Type	Conference
Year	2008
Where	KDD
Authors	Yi Zhang, Arun C. Surendran, John C. Platt, Mukund Narasimhan
