A systematic view of information-based optimal subdata selection: algorithm development, performance evaluation, and application in financial data

Li He, William Li, Difan Song, Min Yang

Statistica Sinica (2024)

Abstract
The need to analyze large amounts of data without losing information is evidenced by the recent increase in attention for the information-based optimal subdata selection (IBOSS) approach. However, there are no systematic explorations of this framework, including characterizing the optimal subset when the model is more complex than first-order linear models. Motivated by a real finance case study on the effect of corporate attributes on firm value, we systematically explore the framework and steps required to use IBOSS for data reduction. In the context of second-order models, we develop a novel algorithm for selecting informative subdata. We also evaluate the performance of the proposed algorithm in terms of prediction and variable selection, the latter of which is important for complex models, but has not received sufficient attention in the IBOSS field. Empirical studies demonstrate that the proposed algorithm adequately addresses the trade-off between computation complexity and statistical efficiency, one of six core research directions for theoretical data science research proposed by the US National Science Foundation. The case study demonstrates the potential effect of the IBOSS strategy in scientific fields beyond statistics, particularly finance.
Key words
Algorithm, computation complexity, IBOSS, statistical efficiency