Automating Entity Matching Model Development

2021 IEEE 37th International Conference on Data Engineering (ICDE 2021)

Abstract
This paper seeks to answer one important but unexplored question for Entity Matching (EM): can we automatically develop a good machine learning pipeline for the EM task? If so, to what extent can the process be automated? In answering this question, we find that a general-purpose AutoML tool cannot be directly applied to an EM problem, and we therefore propose AutoML-EM, an automated model pipeline development solution tailored for EM. In practice, however, another bottleneck of the EM problem is insufficient labeled data. To mitigate this issue, solutions based on active learning are widely adopted. Under this setting, we propose AutoML-EM-Active, which investigates how to maximize the benefit of AutoML-EM with automatic data labeling. We provide fundamental insights into our solutions and conduct extensive experiments to examine their performance on benchmark datasets. The results suggest that AutoML-EM not only avoids human involvement in the model development process but also matches or exceeds state-of-the-art EM performance, and that AutoML-EM-Active effectively improves model performance under the active learning setting.
Key words
Entity Matching, AutoML, Data Integration, Active Learning