On the Energy Consumption of Different Dataframe Processing Libraries -- An Exploratory Study

CoRR(2022)

Abstract
Background: The energy consumption of machine learning and its impact on the environment have made energy-efficient ML an emerging area of research. However, most of the attention remains focused on model creation, training, and inference. Data-oriented stages such as preprocessing, cleaning, and exploratory analysis form a critical part of the machine learning workflow, yet the energy efficiency of these stages has received little attention from researchers. Aim: Our study aims to explore the energy consumption of different dataframe processing libraries as a first step towards studying the energy efficiency of the data-oriented stages of the machine learning pipeline. Method: We measure the energy consumption of 3 popular libraries used to work with dataframes, namely Pandas, Vaex, and Dask, for 21 different operations grouped under 4 categories on 2 datasets. Results: The results of our analysis show that, for a given dataframe processing operation, the choice of library can indeed influence energy consumption, with some libraries consuming 202 times less energy than others. Conclusion: The results of our study indicate that there is potential for optimizing the energy consumption of the data-oriented stages of the machine learning pipeline, and that further research is needed in this direction.
Key words
different dataframe processing libraries, energy consumption