Proposal of a general framework to categorize continuous predictor variables
arXiv (2024)
Abstract
The use of discretized variables in the development of prediction models is a
common practice, in part because the decision-making process is more natural
when it is based on rules created from segmented models. Although this practice
is perhaps more common in medicine, it is extensible to any area of knowledge
where a predictive model helps in decision-making. Providing researchers with
a useful and valid categorization method is therefore relevant when
developing prediction models. In this paper, we propose a new
general methodology that can be applied to categorize a predictor variable in
any regression model where the distribution of the response variable belongs
to the exponential family. Furthermore, it can be applied in any multivariate
context, allowing more than one continuous covariate to be categorized
simultaneously. In addition, a computationally very efficient method is
proposed to obtain the optimal number of categories, based on a pseudo-BIC
criterion. Several simulation studies have been conducted in which the
efficiency of the method with respect to both the location and the number of
estimated cut-off points is shown. Finally, the categorization proposal has
been applied to a real data set of 543 patients with chronic obstructive
pulmonary disease from Galdakao Hospital's five outpatient respiratory clinics,
who were followed up for 10 years. We applied the proposed methodology to
jointly categorize two continuous variables, the six-minute walking test and
forced expiratory volume in one second, in a multiple Poisson generalized
additive model whose response variable is the rate of hospital admissions per
year of follow-up. The location and number of cut-off points obtained were
clinically validated as being in line with the categorizations used in the
literature.
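
To make the idea of choosing both the location and the number of cut-off points concrete, the following is a minimal sketch and not the authors' method. It assumes a single predictor, a plain Poisson model with one rate per category (rather than a generalized additive model), an exhaustive search over a small hypothetical grid of candidate cut-offs, and the ordinary BIC in place of the paper's pseudo-BIC. All function names and the toy data are illustrative assumptions.

```python
import math
from itertools import combinations


def poisson_loglik(groups):
    """Poisson log-likelihood with one rate per group, evaluated at the MLE
    (the group mean). The log(y!) term is dropped because it is the same
    for every categorization being compared."""
    ll = 0.0
    for ys in groups:
        total = sum(ys)
        if total > 0:
            mean = total / len(ys)
            ll += total * math.log(mean) - total
    return ll


def split(x, y, cuts):
    """Assign each response y[i] to the category of x[i] defined by cuts."""
    groups = [[] for _ in range(len(cuts) + 1)]
    for xi, yi in zip(x, y):
        groups[sum(xi > c for c in cuts)].append(yi)
    return groups


def best_cuts(x, y, k, candidates):
    """Best set of k cut-offs (by log-likelihood) among the candidates."""
    best = None
    for cuts in combinations(candidates, k):
        groups = split(x, y, cuts)
        if any(len(g) == 0 for g in groups):
            continue  # skip categorizations with an empty category
        ll = poisson_loglik(groups)
        if best is None or ll > best[0]:
            best = (ll, cuts)
    return best


def select_categorization(x, y, max_cuts, candidates):
    """Choose the number of cut-offs by ordinary BIC (an assumption here;
    the paper uses its own pseudo-BIC criterion)."""
    n = len(y)
    results = {}
    for k in range(max_cuts + 1):
        found = best_cuts(x, y, k, candidates)
        if found is None:
            continue
        ll, cuts = found
        results[k] = (-2.0 * ll + (k + 1) * math.log(n), cuts)
    k_best = min(results, key=lambda k: results[k][0])
    return k_best, results[k_best][1]


# Toy data (hypothetical): the count level changes at x = 20 and x = 40.
x = list(range(60))
y = [1 if xi < 20 else 5 if xi < 40 else 12 for xi in x]
candidates = [9.5, 19.5, 29.5, 39.5, 49.5]
print(select_categorization(x, y, 3, candidates))
```

The exhaustive search is only practical for a short candidate grid; it is used here to keep the sketch transparent, not because it reflects the computationally efficient procedure described in the abstract.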