Web Scraping Tool For Newspapers And Images Data Using Jsonify

Qingli Niu, Irfan Ali Kandhro, Anil Kumar, Shahnawaz Shah, Muhammad Hasan, Mehfooz Ahmed, Fei Liang

Journal of Applied Science and Engineering (2023)

Abstract
Web scraping is the process of extracting data from a website quickly and efficiently. In this setting, Python offers a useful set of methods that help web editors improve the quality of the service they provide. The scraper works in three steps: 1) understand the structure of the web page, 2) design a regular expression pattern, and 3) use that pattern to extract the required data. In this paper, we also use the Flask, Request, and JSONify libraries to retrieve the data; after processing, the data is transformed into JSON and made ready for CSV export with the help of an API. Once all required regex patterns have been generated, the system uses them as a set of rules. With these rules and the supporting libraries, the designed scraper tool works efficiently and achieves outstanding results in extracting and storing news and other web-based information. The proposed web scraping tool automates the process, eliminating the time and effort of manually collecting or copying data. We find that the designed scraper is an easy and direct approach to extracting data from newspapers, websites, blogs, and images.
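The three steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the sample HTML, the two regex patterns, the field names, and the helper functions `scrape`, `to_json`, and `to_csv` are all assumptions made for illustration. The sketch uses only the Python standard library so that it runs offline; in a Flask application the resulting list would typically be returned from a route with `flask.jsonify`, and the HTML would be fetched over HTTP rather than hard-coded.

```python
import csv
import io
import json
import re

# Hypothetical sample page (an assumption); a real scraper would fetch
# the HTML over HTTP instead of hard-coding it.
SAMPLE_HTML = """
<div class="story">
  <h2 class="headline"><a href="/news/1">City council approves budget</a></h2>
  <img src="/img/council.jpg" alt="Council">
</div>
<div class="story">
  <h2 class="headline"><a href="/news/2">Local team wins final</a></h2>
  <img src="/img/team.jpg" alt="Team">
</div>
"""

# Step 1 is done by hand: inspect the page and note how headlines and
# images are marked up. Step 2: encode that structure as regex patterns.
HEADLINE_RE = re.compile(
    r'<h2 class="headline"><a href="([^"]+)">([^<]+)</a></h2>'
)
IMAGE_RE = re.compile(r'<img\s+[^>]*src="([^"]+)"')


def scrape(html):
    """Step 3: apply the patterns and build one record per story."""
    headlines = HEADLINE_RE.findall(html)  # list of (url, title) tuples
    images = IMAGE_RE.findall(html)        # list of image URLs
    # Simplification: pair headlines and images by position on the page.
    return [
        {"url": url, "title": title.strip(), "image": image}
        for (url, title), image in zip(headlines, images)
    ]


def to_json(records):
    """Serialize the scraped records as a JSON string."""
    return json.dumps(records, indent=2)


def to_csv(records):
    """Serialize the scraped records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["url", "title", "image"])
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


records = scrape(SAMPLE_HTML)
print(to_json(records))
print(to_csv(records))
```

Regex patterns of this kind are tied to one page layout, which is why the abstract treats them as a per-site set of rules: a change in the markup means a change in the pattern.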
Keywords
web scraping, extracting, retrieving, Python framework, API, manually collecting data