Challenges in Variable Importance Ranking Under Correlation
CoRR(2024)
摘要
Variable importance plays a pivotal role in interpretable machine learning as
it helps measure the impact of factors on the output of the prediction model.
Model agnostic methods based on the generation of "null" features via
permutation (or related approaches) can be applied. Such analysis is often
utilized in pharmaceutical applications due to its ability to interpret
black-box models, including tree-based ensembles. A major challenge and
significant confounder in variable importance estimation however is the
presence of between-feature correlation. Recently, several adjustments to
marginal permutation utilizing feature knockoffs were proposed to address this
issue, such as the variable importance measure known as conditional predictive
impact (CPI). Assessment and evaluation of such approaches is the focus of our
work. We first present a comprehensive simulation study investigating the
impact of feature correlation on the assessment of variable importance. We then
theoretically prove the limitation that highly correlated features pose for the
CPI through the knockoff construction. While we expect that there is always no
correlation between knockoff variables and its corresponding predictor
variables, we prove that the correlation increases linearly beyond a certain
correlation threshold between the predictor variables. Our findings emphasize
the absence of free lunch when dealing with high feature correlation, as well
as the necessity of understanding the utility and limitations behind methods in
variable importance estimation.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要