Research on Internet Corpus Collection Method

2022 IEEE Conference on Telecommunications, Optics and Computer Science (TOCS) (2022)

Abstract
With the growing popularity of websites and the emergence of large volumes of text data, the Internet has become an important channel for obtaining information resources. Internet corpora have become essential to linguistic research because of their rich resources, large scale, variety of language types, and low acquisition cost. Crawler technology is a common and effective way to collect corpora from the Internet. This paper systematically introduces the principles of Internet data transmission and uses crawlers to collect an Internet corpus. Finally, it introduces some common anti-crawling mechanisms and how they can be circumvented to crawl corpora more effectively.
Key words
internet corpus, internet data transmission, web crawler