Asymptotically Optimal Regret for Black-Box Predict-then-Optimize
arXiv (2024)
Abstract
We consider the predict-then-optimize paradigm for decision-making in which a
practitioner (1) trains a supervised learning model on historical data of
decisions, contexts, and rewards, and then (2) uses the resulting model to make
future binary decisions for new contexts by finding the decision that maximizes
the model's predicted reward. This approach is common in industry. Past
analyses assume that rewards are observed for every action in every historical
context, which is possible only in problems with special structure. Motivated
by problems from ads targeting and recommender systems, we study new black-box
predict-then-optimize problems that lack this special structure and where we
only observe the reward from the action taken. We present a novel loss
function, which we call Empirical Soft Regret (ESR), designed to significantly
improve reward when used in training compared to classical accuracy-based
metrics like mean-squared error. This loss function targets the regret achieved
when taking a suboptimal decision; because the regret is generally not
differentiable, we propose a differentiable "soft" regret term that allows the
use of neural networks and other flexible machine learning models dependent on
gradient-based training. In the particular case of paired data, we show
theoretically that optimizing our loss function yields asymptotically optimal
regret within the class of supervised learning models. We also show that our
approach significantly outperforms state-of-the-art benchmark methods from
contextual bandits and conditional average treatment effect estimation on
real-world decision-making problems in news recommendation and personalized
healthcare.
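The abstract does not give the exact form of the Empirical Soft Regret loss, but the idea it describes, replacing the non-differentiable regret indicator with a smooth surrogate so that gradient-based models can be trained on it, can be illustrated with a minimal sketch. The sketch below assumes the paired-data, binary-decision setting mentioned above (both rewards observed per context); the function name, the sigmoid relaxation, and the temperature parameter `tau` are illustrative assumptions, not the paper's actual definition.

```python
import numpy as np

def soft_regret_loss(pred_r1, pred_r0, r1, r0, tau=1.0):
    """Illustrative 'soft regret' surrogate (not the paper's exact ESR).

    With paired data, the hard regret of a reward model is
    |r1 - r0| * 1[model ranks the two actions in the wrong order],
    which is non-differentiable in the model's predictions. Replacing
    the indicator with a sigmoid of the prediction margin gives a
    smooth surrogate usable with gradient-based training.
    """
    margin = pred_r1 - pred_r0   # model's preference for action 1 over action 0
    delta = r1 - r0              # true reward difference between the actions
    # sigmoid(-delta * margin / tau): -> 1 when the model's ranking
    # disagrees with the true ranking, -> 0 when it agrees; tau controls
    # how sharply the surrogate approximates the hard indicator.
    disagree = 1.0 / (1.0 + np.exp(delta * margin / tau))
    return np.mean(np.abs(delta) * disagree)
```

As `tau -> 0` this surrogate approaches the hard regret, while for positive `tau` it stays differentiable, which is the property that lets neural networks and other flexible models be trained against it.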