Interpolation of non-random missing values in financial statements’ big data using CatBoost

JOURNAL OF COMPUTATIONAL SOCIAL SCIENCE(2022)

Cited 2|Views1
No score
Abstract
Financial statements’ big data have the characteristics of “Incompleteness” and “Nonrepresentative”. In this paper, employing the world’s largest commercial database on finance, ORBIS, we first find that the rate of missing data varies depending on the country, the type and size of financial items, and the year. Using information on missing data, we interpolate non-random missing financial variables from the previous- and/or next-year values of the same financial item, the values of other financial items, and the conditions of missing values determined by CatBoost. Because the distribution of financial values obeys Zipf’s law in the large-scale range and mean and variance diverge, we employ an inverse hyperbolic function to convert the value of a financial item as a target variable. We introduce two types of missing interpolation models according to the two types of situations involving missing objective variables. After verifying the accuracies and stabilities of these models, we describe the properties of firm-scale variables in which non-random missing values are interpolated. In the final stage of this work, we combine these two models. From our observations, we confirm that the range in which Zipf’s law is established becomes wider than before interpolation.
More
Translated text
Key words
Interpolation, Non-random missing, CatBoost, Big data, Firm financials, Machine learning
AI Read Science
Must-Reading Tree
Example
Generate MRT to find the research sequence of this paper
Chat Paper
Summary is being generated by the instructions you defined