Application of Big Data Analytics and Machine Learning to Large-Scale Synchrophasor Datasets: Evaluation of Dataset ‘Machine Learning-Readiness’

IEEE Open Access Journal of Power and Energy(2022)

Cited 2|Views18
No score
Abstract
This manuscript presents a data quality analysis and holistic ‘machine learning-readiness’ evaluation of a representative set of large-scale, real-world phasor measurement unit (PMU) datasets provided under the United States Department of Energy-funded FOA 1861 research program. A major focus of this study is to understand the present-day suitability of large-scale, real-world synchrophasor datasets for application of commercially-available, off-the-shelf big data and supervised or semi-supervised machine learning (ML) tools and catalogue any major obstacles to their application. To this end, dataset quality is methodically examined through an interconnect-wide quantifications of basic bad data occurrences, a summary of several harder-to-detect data quality issues that can jeopardize successful application of machine learning, and an evaluation of the adequacy of event log labeling for supervised training of models used for online event classification. A global ‘six-point’ statistical analyses of several key dataset variables is demonstrated as a means by which to identify additional hard-to-detect data quality issues, also providing an example successful application of big data technology to extract insights regarding reasonable operational bounds of the US power system. Obstacles for application of commercial ML technologies are summarized, with a particular focus on supervised and semi-supervised ML. Lessons-learned are provided regarding challenges associated with present-day event labeling practices, large spatial scope of the dataset, and dataset anonymization. Finally, insight into efficacy of employed mitigation strategies are discussed, and recommendations for future work are made.
More
Translated text
Key words
Phasor measurement units,synchrophasor datasets,statistical analysis,data quality analysis,label quality,nonymized datasets,supervised machine learning,big data,wide area monitoring system
AI Read Science
Must-Reading Tree
Example
Generate MRT to find the research sequence of this paper
Chat Paper
Summary is being generated by the instructions you defined