
Integrated Online Learning and Adaptive Control in Queueing Systems with Uncertain Payoffs

Operations Research (2022)

Abstract
Many online service platforms use dedicated algorithms to match their available resources to incoming clients to maximize client satisfaction. One of the key challenges is to balance the generation of higher payoffs from existing clients against the exploration of new clients' unknown characteristics, while at the same time satisfying the resource capacity constraints. In "Integrated Online Learning and Adaptive Control in Queueing Systems with Uncertain Payoffs," Hsu, Xu, Lin, and Bell show that traditional approaches, such as maximizing instantaneous payoffs with current knowledge or using queue-length-based controls guided by "shadow prices," lead to suboptimal long-term payoffs. Instead, they propose a novel utility-guided assignment algorithm that seamlessly integrates online learning and adaptive control to provide high system payoffs with performance guarantees. The theoretical performance bound also lends system-design insights into the impact of uncertain client dynamics, payoff learning, and backlogged clients. They further develop a decentralized version of the algorithm, which is applicable to large systems and performs well even when the service rates are random.

We study task assignment in online service platforms, where unlabeled clients arrive according to a stochastic process and each client brings a random number of tasks. As tasks are assigned to servers, they produce client/server-dependent random payoffs. The goal of the system operator is to maximize the expected payoff per unit time subject to the servers' capacity constraints. However, both the statistics of the dynamic client population and the client-specific payoff vectors are unknown to the operator. Thus, the operator must design task-assignment policies that integrate adaptive control (of the queueing system) with online learning (of the clients' payoff vectors). A key challenge in such integration is how to account for the nontrivial closed-loop interactions between the queueing process and the learning process, which may significantly degrade system performance. We propose a new utility-guided online learning and task assignment algorithm that seamlessly integrates learning with control to address this difficulty. Our analysis shows that, compared with an oracle that knows all client dynamics and payoff vectors beforehand, the gap in the expected payoff per unit time of our proposed algorithm can be analytically bounded by three terms, which separately capture the impact of client-dynamic uncertainty, client-server payoff uncertainty, and the loss incurred by backlogged clients in the system. Further, our bound holds for any finite time horizon. Through simulations, we show that our proposed algorithm significantly outperforms a myopic matching policy and a standard queue-length-based policy that does not explicitly address the closed-loop interactions between queueing and learning.
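To make the general idea concrete, the following is a minimal, illustrative Python sketch of a congestion-aware optimistic assignment rule in this spirit: UCB-style estimates of unknown payoffs combined with virtual queues that play the role of shadow prices on server capacity. This is not the authors' algorithm; the class name VirtualQueueAssigner, the trade-off weight v, and the specific update rules are hypothetical placeholders used only to show how learning and capacity control can be coupled in one assignment decision.

import numpy as np

class VirtualQueueAssigner:
    """Illustrative only: optimistic payoff estimates plus virtual queues.

    Not the paper's utility-guided policy; a generic sketch of the same
    'learn while respecting capacity' idea.
    """

    def __init__(self, n_servers, capacity, v=10.0):
        self.n = n_servers
        self.capacity = np.asarray(capacity, dtype=float)  # mean service capacity per slot
        self.v = v                      # weight trading off payoff vs. congestion
        self.q = np.zeros(n_servers)    # virtual queues tracking capacity violation

    def choose_server(self, payoff_mean, payoff_count):
        # Optimistic (UCB-style) estimate of the current client's payoff per server.
        total = max(payoff_count.sum(), 1.0)
        bonus = np.sqrt(2.0 * np.log(1.0 + total) / np.maximum(payoff_count, 1.0))
        ucb = payoff_mean + bonus
        # Assign to the server with the best congestion-adjusted optimistic payoff.
        return int(np.argmax(self.v * ucb - self.q))

    def update(self, server, load):
        # Virtual-queue (shadow-price-like) update enforcing capacity on average.
        arrivals = np.zeros(self.n)
        arrivals[server] = load
        self.q = np.maximum(self.q + arrivals - self.capacity, 0.0)

In such a sketch, increasing v favors exploiting high estimated payoffs, while the virtual queues grow whenever a server is loaded beyond its capacity and push subsequent assignments elsewhere; the paper's contribution is precisely an integrated design and analysis of this kind of closed-loop interaction between the learning and queueing dynamics.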
Key words
Stochastic models, online learning, online service platforms, convex optimization, decentralized algorithms