Revolutionizing Concurrent Crawling: A Novel Approach to Enhance PHP-Python Integration using AMQP, Selenium, Celery, and RabbitMQ

2023 IEEE International Conference on Computing (ICOCO), 2023

Abstract
This research proposes a practical solution for integrating PHP with Python in web development, with a focus on efficient web crawling. Many PHP applications need to call a Python application for machine learning or crawling work. With PHP's default functions, such as PHP's exec, the Python program can be executed, but it cannot be managed reliably when it runs as a lengthy background task. By combining an AMQP (Advanced Message Queuing Protocol) library with the Selenium crawler, Celery, and RabbitMQ, we establish interoperability between PHP and Python. In our approach, PHP acts as the front end and initiates web crawling tasks when an action is triggered in a web component. These requests are queued in RabbitMQ, which serves as the message broker. RabbitMQ is connected to Celery, which schedules and executes the tasks. Integrating the Selenium crawler with Celery enables effective web crawling and concurrent data scraping. Data extracted by the crawler is saved to a database together with a status indicating whether the data collection process is done or pending. We validate the approach experimentally by varying the number of workers and the number of concurrent connections. The test scenarios show that increasing the number of workers increases memory usage only slightly. This result indicates that additional workers can help maintain response time when several users are active, but the worker count should be chosen with the number of users and the available memory and CPU in mind.
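
The flow described in the abstract (front end enqueues a crawl request, workers consume it, the database records pending or done) can be sketched with the Python standard library alone. This is not the authors' implementation: `queue.Queue` stands in for RabbitMQ, threads stand in for Celery workers, and the hypothetical `fetch_page` stands in for a Selenium crawl. The names `run_crawl`, `fetch_page`, and the `jobs` table are assumptions made for illustration.

```python
import queue
import sqlite3
import threading


def fetch_page(url):
    """Hypothetical stand-in for a Selenium crawl; performs no network access."""
    return f"<html>content of {url}</html>"


def run_crawl(urls, num_workers=2):
    """Queue crawl jobs, process them with worker threads, return url -> status.

    Assumes the URLs are unique, since the url column is the primary key.
    """
    db = sqlite3.connect(":memory:", check_same_thread=False)
    db.execute("CREATE TABLE jobs (url TEXT PRIMARY KEY, status TEXT, body TEXT)")
    db_lock = threading.Lock()  # serialize writes from the worker threads
    tasks = queue.Queue()       # stand-in for the message broker

    # "Front end" side: record each job as pending, then enqueue it.
    for url in urls:
        db.execute("INSERT INTO jobs VALUES (?, 'pending', NULL)", (url,))
        tasks.put(url)
    db.commit()

    def worker():
        # "Worker" side: consume jobs until a None sentinel arrives.
        while True:
            url = tasks.get()
            if url is None:
                tasks.task_done()
                break
            body = fetch_page(url)
            with db_lock:
                db.execute(
                    "UPDATE jobs SET status = 'done', body = ? WHERE url = ?",
                    (body, url),
                )
                db.commit()
            tasks.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for _ in threads:
        tasks.put(None)  # one shutdown sentinel per worker
    tasks.join()
    for t in threads:
        t.join()

    return dict(db.execute("SELECT url, status FROM jobs"))
```

Calling `run_crawl(["https://example.com/a", "https://example.com/b"], num_workers=2)` returns a dictionary mapping each URL to its final status. In the architecture the abstract describes, the enqueue step would be a PHP AMQP publish and the worker loop would be a Celery task, so the two sides could run as separate processes.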
Key words
AMQP, Celery, RabbitMQ, Selenium, interoperability