An Approach Of Information Extraction Based On Dom Tree And Weight Value

International Journal of Grid and Distributed Computing (2016)

Abstract
Eliminating noisy information and extracting the main content from web pages has become an important research issue in the field of information retrieval. In this paper, we present an approach to information extraction based on the DOM tree and weight value calculation. It consists of the following steps: parsing the web page to construct the DOM tree, extracting the title and keywords, calculating the weight values, and obtaining the content. Experimental results show that the method achieves a higher accuracy ratio when extracting content from pages on a variety of themes.
Key words
Information extraction, DOM tree, Weight value, JSoup, Web pages
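
The four steps named in the abstract (build the DOM tree, extract the title and keywords, calculate weight values, obtain the content) can be illustrated with a minimal sketch. The abstract does not give the weight formula, so the one used here (text length, discounted by the share of link text, boosted by keyword hits) is purely an assumption, as are the names `TreeBuilder`, `text_of`, `weight`, `extract_content`, and the `BLOCK_TAGS` set. The keywords list mentions JSoup; this sketch substitutes Python's standard-library `html.parser` so that it runs without dependencies.

```python
from html.parser import HTMLParser
import re

BLOCK_TAGS = {"div", "p", "td", "li", "article", "section"}  # candidate blocks (assumed)
SKIP_TAGS = {"script", "style"}
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class Node:
    def __init__(self, tag, parent=None):
        self.tag, self.parent = tag, parent
        self.items = []  # child Nodes and text strings, in document order


class TreeBuilder(HTMLParser):
    """Step 1: parse the page into a simplified DOM tree."""

    def __init__(self):
        super().__init__()
        self.root = self.current = Node("root")
        self.meta_keywords = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "keywords":
            content = attrs.get("content") or ""
            self.meta_keywords += [k.strip().lower() for k in content.split(",") if k.strip()]
        if tag in VOID_TAGS:
            return  # void elements never hold children
        node = Node(tag, self.current)
        self.current.items.append(node)
        self.current = node

    def handle_endtag(self, tag):
        # Walk up to the matching open tag; ignore stray closing tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        text = " ".join(data.split())
        if text:
            self.current.items.append(text)


def walk(node):
    yield node
    for item in node.items:
        if isinstance(item, Node):
            yield from walk(item)


def text_of(node, only_links=False, in_link=False):
    """Text of a node and its inline descendants; nested blocks are scored separately."""
    in_link = in_link or node.tag == "a"
    parts = []
    for item in node.items:
        if isinstance(item, Node):
            if item.tag not in BLOCK_TAGS and item.tag not in SKIP_TAGS:
                parts.append(text_of(item, only_links, in_link))
        elif in_link or not only_links:
            parts.append(item)
    return " ".join(p for p in parts if p)


def weight(node, keywords):
    """Step 3 (assumed formula): length x (1 - link ratio) x (1 + keyword hits)."""
    text = text_of(node)
    if not text:
        return 0.0
    link_ratio = len(text_of(node, only_links=True)) / len(text)
    hits = sum(text.lower().count(k) for k in keywords)
    return len(text) * (1.0 - link_ratio) * (1 + hits)


def extract_content(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    nodes = list(walk(builder.root))
    # Step 2: title text plus keywords from the title and <meta name="keywords">.
    title = next((text_of(n) for n in nodes if n.tag == "title"), "")
    keywords = set(re.findall(r"[a-z]{4,}", title.lower())) | set(builder.meta_keywords)
    # Step 4: return the text of the highest-weight block.
    blocks = [n for n in nodes if n.tag in BLOCK_TAGS]
    best = max(blocks, key=lambda n: weight(n, keywords), default=None)
    return title, (text_of(best) if best is not None else "")
```

Calling `extract_content(html)` returns a `(title, content)` pair. Returning only the single best block keeps the sketch short; a fuller version would likely merge neighbouring blocks whose weights are close to the maximum.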