
Theme-Based Spider for Academic Paper.

Communications in Computer and Information Science (2017)

Abstract
Nowadays, web content multiplies every day. However, for a particular company or individual, some kinds of information have higher priority. For example, among the vast amount of information on the internet, web pages containing academic papers are more attractive to a researcher. The problem lies in how to find that kind of data. We therefore design a spider that targets only online academic papers. Besides retaining the three major parts of a traditional spider, we modify the Filter and the Parser so that the spider is capable of accomplishing this task. The essential mechanism for recognizing and extracting the expected pages relies primarily on keyword matching and Finite State Machine theory. After crawling two websites, the spider successfully collected the desired information. The results suggest that, with further optimization and modification, this theme-based spider may work more efficiently or even extend to other fields of interest.
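The abstract names keyword matching and a Finite State Machine as the core of page recognition and extraction, but gives no implementation details. The following is a minimal Python sketch under illustrative assumptions: the keyword list `THEME_KEYWORDS`, the `min_hits` threshold, and the "Abstract"/"Keywords" section markers are hypothetical choices, not the paper's actual Filter or Parser rules.

```python
import re
from enum import Enum, auto

# Hypothetical theme keywords; the paper does not publish its actual keyword set.
THEME_KEYWORDS = ("abstract", "keywords", "references", "doi")


def is_paper_page(text: str, min_hits: int = 2) -> bool:
    """Keyword-matching filter: keep a page if enough theme keywords appear as whole words."""
    lowered = text.lower()
    hits = sum(
        1 for kw in THEME_KEYWORDS
        if re.search(rf"\b{re.escape(kw)}\b", lowered)
    )
    return hits >= min_hits


class State(Enum):
    SEEK = auto()      # scanning for the "Abstract" heading
    ABSTRACT = auto()  # collecting abstract lines
    DONE = auto()      # stop marker reached; ignore the rest


def extract_abstract(text: str) -> str:
    """Finite-state extractor: collect non-empty lines between 'Abstract' and 'Keywords'."""
    state = State.SEEK
    collected = []
    for line in text.splitlines():
        stripped = line.strip()
        if state is State.SEEK:
            if stripped.lower() == "abstract":
                state = State.ABSTRACT
        elif state is State.ABSTRACT:
            # Hypothetical stop marker: a "Keywords" or "Key words" heading.
            if re.match(r"(?i)key\s*words", stripped):
                state = State.DONE
            elif stripped:
                collected.append(stripped)
        # In State.DONE, the remaining lines are ignored.
    return " ".join(collected)


if __name__ == "__main__":
    page = "Some Title\nAbstract\nWe design a spider.\nIt crawls papers.\nKeywords\nspider, paper\n"
    print(is_paper_page(page))     # True
    print(extract_abstract(page))  # We design a spider. It crawls papers.
```

Separating the cheap keyword filter from the stateful extractor mirrors the Filter/Parser split the abstract describes: pages that fail the filter never reach the more expensive parsing step.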
Key words
Theme-based, Spider, Paper