The Inefficiency of Genetic Programming for Symbolic Regression – Extended Version
CoRR (2024)
Abstract
We analyse the search behaviour of genetic programming for symbolic
regression in practically relevant but limited settings, allowing exhaustive
enumeration of all solutions. This enables us to quantify the success
probability of finding the best possible expressions, and to compare the search
efficiency of genetic programming to random search in the space of semantically
unique expressions. This analysis is made possible by improved algorithms for
equality saturation, which we use to improve the Exhaustive Symbolic Regression
algorithm; this produces the set of semantically unique expression structures,
orders of magnitude smaller than the full symbolic regression search space. We
compare the efficiency of random search in the set of unique expressions and
genetic programming. For our experiments we use two real-world datasets where
symbolic regression has been used to produce well-fitting univariate
expressions: the Nikuradse dataset of flow in rough pipes and the Radial
Acceleration Relation of galaxy dynamics. The results show that genetic
programming in such limited settings explores only a small fraction of all
unique expressions, and repeatedly evaluates expressions that are congruent to
already visited ones.
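
As a rough illustration of the random-search baseline mentioned above, the following sketch computes the chance that uniform sampling without replacement from a finite set of unique expressions finds at least one of the best expressions within a fixed evaluation budget. The abstract does not state how success probability is defined in the paper, so the hypergeometric model, the function name `random_search_success`, and all numbers in the usage lines are assumptions made for illustration only.

```python
from math import comb


def random_search_success(n_unique: int, n_best: int, n_evals: int) -> float:
    """Probability of hitting at least one of `n_best` target expressions.

    Assumed model (not taken from the paper): expressions are drawn
    uniformly without replacement from `n_unique` semantically unique
    expressions, and `n_evals` of them are evaluated.
    """
    if not 0 <= n_best <= n_unique:
        raise ValueError("n_best must lie between 0 and n_unique")
    if n_evals < 0:
        raise ValueError("n_evals must be non-negative")
    # The budget cannot exceed the size of the enumerated set.
    n_evals = min(n_evals, n_unique)
    # P(miss every target) = C(n_unique - n_best, n_evals) / C(n_unique, n_evals)
    p_miss = comb(n_unique - n_best, n_evals) / comb(n_unique, n_evals)
    return 1.0 - p_miss


if __name__ == "__main__":
    # Hypothetical sizes, chosen only to show how the function is called.
    for budget in (100, 1_000, 10_000):
        print(budget, random_search_success(50_000, 5, budget))
```

Under this model, a search method is more efficient than random search only if its observed success rate for a given number of evaluations exceeds the value returned here for the same budget.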