
Automatic Generation Of Wrapper For Data Extraction From The Web

ICWE'03: Proceedings of the 2003 International Conference on Web Engineering (2003)

Abstract
With the development of the Internet, the Web has become an invaluable information source. In order to use this information for more than human browsing, web pages in HTML must be converted into a format meaningful to software programs. Wrappers have been a useful technique for converting HTML documents into semantically meaningful XML files. In this paper, we propose a data extraction approach based on an extraction schema, which automatically generates a wrapper to extract data from an HTML document and produces an XML document conforming to a given DTD. After the user defines the extraction data schema in the form of a DTD, the wrapper is generated automatically with an induction and learning algorithm. The experiment indicates that the approach can correctly extract the required data from the source document with high accuracy.
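
To make the idea of a wrapper concrete, the following is a minimal sketch of what such a component does: it reads an HTML table and emits XML whose element names follow a target schema. It is not the induction algorithm described in the paper; here the extraction rule (one `<tr>` per record, one `<td>` per field) is written by hand. The DTD shown in the comment, the `FIELDS` tuple, the `TableWrapper` class, the `html_to_xml` function, and the sample HTML are all hypothetical names and data chosen for illustration.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Hypothetical target schema, standing in for a user-supplied DTD:
#   <!ELEMENT books (book*)>
#   <!ELEMENT book (title, price)>
#   <!ELEMENT title (#PCDATA)>
#   <!ELEMENT price (#PCDATA)>
FIELDS = ("title", "price")


class TableWrapper(HTMLParser):
    """Hand-written wrapper: collects <td> text, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # completed rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows made only of <th> cells stay empty
                self.rows.append(self._row)
            self._row = None


def html_to_xml(html):
    """Extract table rows from html and serialize them as <books> XML."""
    parser = TableWrapper()
    parser.feed(html)
    parser.close()
    root = ET.Element("books")
    for row in parser.rows:
        if len(row) != len(FIELDS):
            continue  # skip rows that do not match the expected record shape
        book = ET.SubElement(root, "book")
        for name, value in zip(FIELDS, row):
            ET.SubElement(book, name).text = value
    return ET.tostring(root, encoding="unicode")


SAMPLE_HTML = """
<table>
  <tr><th>Title</th><th>Price</th></tr>
  <tr><td>Web Engineering</td><td>30.00</td></tr>
  <tr><td>XML Basics</td><td>12.50</td></tr>
</table>
"""

xml_output = html_to_xml(SAMPLE_HTML)
print(xml_output)
```

In this sketch the mapping from page layout to schema fields is fixed in code. The approach summarized in the abstract would instead derive that mapping automatically from the user-defined DTD and the source document.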
Key words
HTML document, data extraction approach, extraction data schema, required data, XML document, source document, invaluable information source, semantically meaningful XML file, high accuracy, human browsing, automatic generation