Shaping Datasets: Optimal Data Selection For Specific Target Distributions Across Dimensions

2016 IEEE International Conference on Image Processing (ICIP)

Abstract
This paper presents a method for dataset manipulation based on Mixed Integer Linear Programming (MILP). The proposed optimization can narrow down a dataset to a particular size, while enforcing specific distributions across different dimensions. It essentially leverages the redundancies of an initial dataset in order to generate more compact versions of it, with a specific target distribution across each dimension. If the desired target distribution is uniform, then the effect is balancing: all values across all different dimensions are equally represented. Other types of target distributions can also be specified, depending on the nature of the problem. The proposed approach may be used in machine learning, for shaping training and testing datasets, or in crowdsourcing, for preparing datasets of a manageable size.
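The selection problem described above can be illustrated on a toy scale. The sketch below is not the paper's formulation: it assumes the objective is the total absolute deviation between the per-value counts in the chosen subset and the target counts, and it replaces the MILP solver with exhaustive search over all subsets of the requested size, which is only feasible for very small datasets. The function name `select_subset`, the toy `items`, and the `targets` layout are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def select_subset(items, k, targets):
    """Pick k items whose per-dimension value counts best match the targets.

    items   -- list of tuples, one value per dimension
    targets -- one dict per dimension, mapping value -> desired count
    Returns (indices, deviation), where deviation is the summed absolute
    difference between achieved and desired counts (assumed objective).
    Exhaustive search stands in for an MILP solver here.
    """
    best_idx, best_dev = None, None
    for idx in combinations(range(len(items)), k):
        dev = 0
        for d, target in enumerate(targets):
            # Count how often each value of dimension d occurs in the subset.
            counts = Counter(items[i][d] for i in idx)
            dev += sum(abs(counts[v] - t) for v, t in target.items())
        if best_dev is None or dev < best_dev:
            best_idx, best_dev = idx, dev
    return best_idx, best_dev


# Toy dataset with two dimensions (colour, size); red and S are over-represented.
items = [
    ("red", "S"), ("red", "S"), ("red", "S"), ("red", "L"),
    ("blue", "S"), ("blue", "L"), ("red", "S"), ("blue", "L"),
]

# Uniform target for a subset of 4: each value appears twice in each dimension.
targets = [{"red": 2, "blue": 2}, {"S": 2, "L": 2}]

chosen, deviation = select_subset(items, 4, targets)
print(chosen, deviation)
```

With a uniform target the result is a balanced subset, matching the balancing effect the abstract describes. A non-uniform target only requires different counts in `targets`. In a real MILP each item would get a binary selection variable and the absolute deviations would be linearised with auxiliary variables, so that a solver can handle datasets far too large to enumerate.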
Keywords
Mixed Integer Linear Programming (MILP), datasets, balancing, crowdsourcing