An Exploratory Study on Energy Consumption of Dataframe Processing Libraries.

MSR (2023)

Abstract
The energy consumption of machine learning applications and their impact on the environment have recently gained attention as a research area, with the focus on the model creation and training/inference phases. The data-oriented stages of the machine learning pipeline, which involve pre-processing, cleaning, and exploratory analysis, are critical components, yet energy consumption during these stages has received limited attention. Dataframe processing libraries play a significant role in these stages, and optimizing their energy consumption is important for reducing environmental impact and operational costs. Therefore, as a first step towards studying their energy efficiency, we investigate and compare the energy consumption of three popular dataframe processing libraries, namely Pandas, Vaex, and Dask. We perform experiments across 21 dataframe processing operations within four categories, utilizing three distinct datasets. Our results indicate that no single library is the most energy-efficient for all tasks, and that the choice of library can have a significant impact on energy consumption, depending on the types and frequencies of operations performed. The findings of this study suggest the potential for optimizing the energy consumption of data-oriented stages in the machine learning pipeline and warrant further research in this area.
Keywords
dataframe, data preprocessing, energy efficiency, machine learning pipeline