Simultaneous Product Attribute Name and Value Extraction with Adaptively Learnt Templates

Computer Science & Service System(2012)

Abstract
Representing products as attribute name and value pairs improves the effectiveness of many applications. In this paper, we propose an adaptive template-based method that simultaneously extracts product attribute names and values from Web pages. The titles of Web pages assist the unsupervised template construction, and a template ranking strategy ensures that the correct templates are selected for every Web page. Our approach contains four key steps: 1) construct a domain attribute word bag from the titles of Web pages; 2) segment text nodes using a set of default delimiters; 3) collect candidate attribute name and value pairs; 4) learn high-quality templates with a template ranking algorithm. The experimental corpus is collected from two domains: digital cameras and mobile phones. Experiments show that our method achieves a precision of 94.68% and a recall of 90.57%.
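The four steps above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not specify the delimiters, the form of a template, or the ranking function, so everything here is an assumption made for illustration. In particular, a "template" is reduced to a single delimiter, the word bag keeps words that recur across titles, and a template's score is the number of its candidate names that overlap the word bag. All function names (`build_word_bag`, `segment`, `rank_templates`, `extract`) are hypothetical.

```python
import re
from collections import Counter

# Assumed default delimiters; the abstract does not list the real ones.
DELIMITERS = [":", "="]


def words(text):
    """Lowercase alphabetic tokens of a string."""
    return set(re.findall(r"[a-z]+", text.lower()))


def build_word_bag(titles, min_count=2):
    """Step 1 (assumed form): keep words that occur in at least min_count titles."""
    counts = Counter()
    for title in titles:
        counts.update(words(title))
    return {w for w, c in counts.items() if c >= min_count}


def segment(text_node):
    """Step 2: split a text node on the first listed delimiter it contains."""
    for delim in DELIMITERS:
        left, sep, right = text_node.partition(delim)
        if sep and left.strip() and right.strip():
            return left.strip(), delim, right.strip()
    return None


def rank_templates(text_nodes, word_bag):
    """Steps 3 and 4: collect candidate pairs per template and score each template.

    A template here is just the delimiter (a simplification); its score is the
    number of candidate names sharing a word with the word bag.
    """
    scores = Counter()
    pairs = {}
    for node in text_nodes:
        seg = segment(node)
        if seg is None:
            continue
        name, delim, value = seg
        pairs.setdefault(delim, []).append((name, value))
        if words(name) & word_bag:
            scores[delim] += 1
    return scores, pairs


def extract(text_nodes, word_bag):
    """Return the pairs matched by the highest-scoring template, if any."""
    scores, pairs = rank_templates(text_nodes, word_bag)
    if not scores:
        return []
    best = max(scores, key=scores.get)
    return pairs[best]


# Hypothetical toy data, not taken from the paper's corpus.
titles = [
    "Canon digital camera 18 megapixel 3x zoom",
    "Nikon digital camera 16 megapixel 5x zoom",
]
nodes = ["Zoom: 3x", "Megapixel: 18", "Home = Contact", "Price: $499"]
print(extract(nodes, build_word_bag(titles)))
```

In this toy run the `:` template is selected because two of its candidate names ("Zoom", "Megapixel") appear in the word bag, and it then also yields the "Price" pair, whose name never occurred in a title. The `=` template, which only matches navigation text, is discarded.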
Key words
value pair extraction, template ranking algorithm, unsupervised template construction, domain attribute word bag, world wide web, value extraction, mobile phone, retail data processing, high-quality template, digital camera, information retrieval, web pages, value pair collection, collect candidate attribute, text node segmentation, product attribute name, simultaneous product attribute name, product attribute name and value pair, adaptively learnt templates, simultaneous product attribute name extraction, domain attribute word bag construction, web data mining, internet, attribute name, candidate attribute collection, online shops, adaptive template, value pair, template construction, data mining, high-quality template learning, template ranking strategy, value pairs, text analysis, correct template, unsupervised learning, web page, adaptive template based method, html, ontologies