The Isabelle Community Benchmark.

Workshop on Practical Aspects of Automated Reasoning (PAAR) (2022)

Abstract
Choosing hardware for theorem proving is no simple task: automated provers are highly complex, heavily optimized programs that often use a parallel computation model, and there is little prior research on how hardware affects prover performance. To alleviate the problem for Isabelle, we initiated a community benchmark that measures the build time of HOL-Analysis. Isabelle users reported a total of $669$ runs with different Isabelle configurations on $54$ distinct CPUs. Results range from $107$s to over $11$h. We found that current consumer CPUs performed best, with an optimal number of $8$ to $16$ threads, largely independent of heap memory. As for hardware parameters, CPU base clock affected multi-threaded execution most (linear correlation of $0.37$), whereas boost frequency was the most influential parameter for single-threaded runs (correlation coefficient $0.55$); cache size played no significant role. When comparing our benchmark scores with popular high-performance computing benchmarks, we found a strong linear relationship with Dolfyn ($R^2 = 0.79$) in the single-threaded scenario. Using data from the 3DMark CPU Profile consumer benchmark, we created a linear model for optimal (multi-threaded) Isabelle performance. In validation, the model achieves an average $R^2$-score of $0.87$; the mean absolute error of the final model corresponds to a wall-clock time of $46.6$s. With a dataset of true median values for 3DMark, the error improves to $37.1$s.
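As an illustration of the two quality measures quoted above ($R^2$ and mean absolute error), the following minimal sketch fits a one-predictor least-squares line and evaluates it. It is not the paper's model: the data points are made up for illustration, and the choice of a single predictor (a hypothetical benchmark score) mapped directly to build time is an assumption.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a*x + b for a single predictor."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = my - a * mx
    return a, b


def r_squared(ys, preds):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    my = mean(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


def mean_absolute_error(ys, preds):
    """Average absolute deviation between observed and predicted values."""
    return mean(abs(y - p) for y, p in zip(ys, preds))


# Hypothetical data, NOT taken from the benchmark:
# a consumer benchmark score per CPU and a build time in seconds.
scores = [2000.0, 4000.0, 6000.0, 8000.0]
times = [400.0, 310.0, 190.0, 100.0]

a, b = fit_line(scores, times)
preds = [a * s + b for s in scores]

print(f"slope={a:.4f}, intercept={b:.1f}")
print(f"R^2={r_squared(times, preds):.3f}")
print(f"MAE={mean_absolute_error(times, preds):.1f}s")
```

Because the mean absolute error is expressed in the unit of the target variable, it can be read directly as a wall-clock time, which is how the abstract reports it.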