Learning website hierarchies for keyword enrichment in contextual advertising.

WSDM (2011)

Abstract
In contextual advertising, textual ads relevant to the content of a webpage are embedded in that page. Content keywords are extracted offline by crawling webpages and are then stored in an index for fast serving. Given a page, ad selection involves an index lookup, computing the similarity between the keywords of the page and those of candidate ads, and returning the top-k scoring ads. In this approach, ad relevance can suffer in two scenarios. First, because page-ad similarity is computed using keywords extracted only from that particular page, a few non-pertinent keywords can skew ad selection. Second, the requesting page may not be present in the index, yet relevant ads must still be served. We propose a novel mechanism that mitigates both problems within the same framework. The basic idea is to enrich the keywords of a particular page with keywords from other, "similar" pages. The scheme involves learning a website-specific hierarchy from the (page, URL) pairs of the website. Next, keywords are populated on the nodes of the hierarchy via successive top-down and bottom-up iterations. We evaluate our approach on three data sets: one small human-labeled set and two large-scale sets from Yahoo's contextual advertising system. Empirical evaluation shows that ads fetched using enriched keywords have 2-3% higher nDCG than ads fetched by a recent semantic approach, even though the index of our approach is one-seventh the size of the index of the semantic approach. Evaluation over pages that are not present in the index shows that ads fetched by our method have 6-7% higher nDCG than ads fetched by a recent approach that uses the first N bytes of the page content. Scalability is demonstrated via a MapReduce adaptation of our method and training on a large data set of 220 million pages from 95,104 websites.
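The serving step described in the abstract (index lookup, page-ad keyword similarity, top-k selection) can be sketched roughly as follows. This is a minimal illustration, not the paper's or Yahoo's implementation: it assumes keywords are stored as sparse keyword-to-weight dictionaries and scores them with cosine similarity, a common choice that the abstract does not name. The names `cosine`, `select_ads`, the URLs, and the toy data are hypothetical.

```python
import heapq
import math
from typing import Dict, List, Tuple

# Assumed representation: a sparse vector mapping keyword -> weight.
KeywordVector = Dict[str, float]


def cosine(a: KeywordVector, b: KeywordVector) -> float:
    """Cosine similarity between two sparse keyword vectors."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the shorter vector for the dot product
    dot = sum(w * b.get(kw, 0.0) for kw, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_ads(page: KeywordVector,
               ads: Dict[str, KeywordVector],
               k: int) -> List[Tuple[str, float]]:
    """Score every candidate ad against the page and keep the k best."""
    scored = ((ad_id, cosine(page, ad_kws)) for ad_id, ad_kws in ads.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Hypothetical index of page keywords, keyed by URL.
    index = {"http://example.com/sports/nba/game1":
             {"basketball": 0.9, "nba": 0.8, "tickets": 0.2}}
    ads = {
        "ad_shoes": {"basketball": 0.7, "shoes": 0.9},
        "ad_tickets": {"nba": 0.6, "tickets": 0.8},
        "ad_loans": {"mortgage": 1.0},
    }
    print(select_ads(index["http://example.com/sports/nba/game1"], ads, k=2))
```

Because every score depends only on the page's own keywords, a couple of off-topic keywords on the page shift these scores directly, which is the first weakness the abstract describes.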
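The enrichment idea can be illustrated with a deliberately simplified hierarchy. The paper learns a website-specific hierarchy from (page, URL) pairs; this sketch swaps in a plain URL-path prefix tree instead, and assumes a bottom-up pass (each internal node becomes the mean of its own and its children's keyword vectors) followed by a top-down pass (each node is blended with its parent using a hypothetical weight `alpha`). These propagation rules, `alpha`, `rounds`, and all function names are assumptions for illustration only. The `lookup` function shows how a page missing from the index could fall back to the keywords of its deepest existing ancestor.

```python
from collections import defaultdict
from typing import Dict, List
from urllib.parse import urlparse

KeywordVector = Dict[str, float]


class Node:
    """One node of an assumed URL-path hierarchy for a single website."""

    def __init__(self) -> None:
        self.children: Dict[str, "Node"] = {}
        self.keywords: KeywordVector = {}


def path_segments(url: str) -> List[str]:
    """Non-empty path segments of a URL, e.g. '/a/b/' -> ['a', 'b']."""
    return [seg for seg in urlparse(url).path.split("/") if seg]


def build_tree(pages: Dict[str, KeywordVector]) -> Node:
    """Insert each (URL, keywords) pair of one website into a prefix tree."""
    root = Node()
    for url, kws in pages.items():
        node = root
        for seg in path_segments(url):
            if seg not in node.children:
                node.children[seg] = Node()
            node = node.children[seg]
        node.keywords = dict(kws)
    return root


def mean(vectors: List[KeywordVector]) -> KeywordVector:
    """Element-wise average of sparse keyword vectors (missing keys count as 0)."""
    total: Dict[str, float] = defaultdict(float)
    for vec in vectors:
        for kw, w in vec.items():
            total[kw] += w
    return {kw: w / len(vectors) for kw, w in total.items()}


def blend(own: KeywordVector, parent: KeywordVector, alpha: float) -> KeywordVector:
    """(1 - alpha) * own + alpha * parent, over the union of keywords."""
    keys = set(own) | set(parent)
    return {kw: (1 - alpha) * own.get(kw, 0.0) + alpha * parent.get(kw, 0.0)
            for kw in keys}


def bottom_up(node: Node) -> None:
    """Post-order: an internal node becomes the mean of its own and children's vectors."""
    vectors = [node.keywords] if node.keywords else []
    for child in node.children.values():
        bottom_up(child)
        vectors.append(child.keywords)
    if node.children:
        node.keywords = mean(vectors)


def top_down(node: Node, alpha: float) -> None:
    """Pre-order: each child is blended with its already-updated parent."""
    for child in node.children.values():
        child.keywords = blend(child.keywords, node.keywords, alpha)
        top_down(child, alpha)


def enrich(root: Node, alpha: float = 0.5, rounds: int = 1) -> None:
    for _ in range(rounds):
        bottom_up(root)
        top_down(root, alpha)


def lookup(root: Node, url: str) -> KeywordVector:
    """Keywords of the deepest existing node on the URL's path."""
    node = root
    for seg in path_segments(url):
        if seg not in node.children:
            break  # unindexed page: fall back to the closest ancestor
        node = node.children[seg]
    return node.keywords


if __name__ == "__main__":
    site = {  # hypothetical pages of one website
        "http://example.com/sports/nba/game1": {"basketball": 1.0},
        "http://example.com/sports/nba/game2": {"nba": 1.0},
        "http://example.com/finance/loans": {"mortgage": 1.0},
    }
    root = build_tree(site)
    enrich(root, alpha=0.5)
    print(lookup(root, "http://example.com/sports/nba/game1"))  # indexed page
    print(lookup(root, "http://example.com/sports/nba/game3"))  # not indexed
```

In this toy run, the page about `game1` picks up some weight for `nba` from its sibling, while its own keyword stays dominant, and the unindexed `game3` URL receives the `nba` node's aggregated keywords rather than nothing.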
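The results above are reported as nDCG. For reference, one common definition of nDCG@k is sketched below. The abstract does not state which gain or discount the authors used, so the exponential gain 2^rel - 1, the log2 position discount, and computing the ideal ordering from the returned list's own labels are all assumptions; the relevance labels in the usage lines are made up.

```python
import math
from typing import Sequence


def dcg(gains: Sequence[float], k: int) -> float:
    """DCG@k with exponential gain (2**rel - 1) and log2 position discount."""
    return sum((2 ** rel - 1) / math.log2(pos + 2)
               for pos, rel in enumerate(gains[:k]))


def ndcg(gains: Sequence[float], k: int) -> float:
    """DCG@k normalized by the DCG@k of the ideal (descending) ordering."""
    ideal = dcg(sorted(gains, reverse=True), k)
    return dcg(gains, k) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    # Made-up graded relevance labels for the top ads returned by two systems.
    print(ndcg([2, 1, 0, 1], k=4))
    print(ndcg([0, 1, 2, 1], k=4))
```

Both toy lists contain the same labels, so the difference in score comes only from where the most relevant ad is placed; a relative gain such as "2-3% higher nDCG" compares averages of this quantity over many pages.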