A data‐driven support strategy for a sustainable research software repository

Concurrency and Computation: Practice and Experience (2019)

Abstract
We describe a sustainable strategy for supporting a large number of researchers with widely varying scientific software needs, a common challenge for centralized Research Computing Centers on most university campuses. Changes in systems and hardware, coupled with aging software, often necessitate re-compiling existing software. The naive approach of re-compiling every existing package is not only counterproductive but can also become unrealistic, especially for small support teams such as Georgia Tech's PACE Team. Instead, we analyze job scheduling data to identify actively used software, rank it, and distribute it across three support tiers, which define the level of support we provide. Distributing software into multiple tiers is a non-trivial problem; we use a heuristic ranking algorithm based on four metrics: the number of users, the number of groups, the number of jobs, and the jobs' collective runtime. The results revealed a surprisingly small subset of software that is sufficient to support a very large portion of the overall research computing activity on campus. This approach allows us to make data-driven strategic technical and policy decisions, provide high-quality support for the software that matters most, and sustain these services with a relatively small team over the long term.
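The abstract does not spell out the ranking heuristic, but its keywords mention Pareto ranking. The following is a minimal sketch, not the authors' implementation, assuming a plain non-dominated sort over the four stated metrics. The job-record layout `(software, user, group, runtime_hours)`, the names `SoftwareUsage`, `aggregate`, `pareto_fronts`, and `assign_tiers`, and the rule "front 1 maps to Tier 1, front 2 to Tier 2, everything else to Tier 3" are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SoftwareUsage:
    # Hypothetical per-package summary built from scheduler job records.
    name: str
    users: int
    groups: int
    jobs: int
    runtime_hours: float

    def metrics(self):
        return (self.users, self.groups, self.jobs, self.runtime_hours)


def aggregate(job_records):
    """Collapse (software, user, group, runtime_hours) records into per-package metrics."""
    users, groups = defaultdict(set), defaultdict(set)
    jobs, runtime = defaultdict(int), defaultdict(float)
    for software, user, group, hours in job_records:
        users[software].add(user)
        groups[software].add(group)
        jobs[software] += 1
        runtime[software] += hours
    # Packages keep the order in which they first appear in the job records.
    return [SoftwareUsage(name, len(users[name]), len(groups[name]), jobs[name], runtime[name])
            for name in jobs]


def dominates(a, b):
    """a dominates b if it is at least as good on every metric and strictly better on one."""
    ma, mb = a.metrics(), b.metrics()
    return all(x >= y for x, y in zip(ma, mb)) and any(x > y for x, y in zip(ma, mb))


def pareto_fronts(packages):
    """Peel off successive non-dominated fronts; fronts[0] holds the most heavily used software."""
    remaining = list(packages)
    fronts = []
    while remaining:
        front = [p for p in remaining if not any(dominates(q, p) for q in remaining)]
        fronts.append(front)
        remaining = [p for p in remaining if p not in front]
    return fronts


def assign_tiers(packages, num_tiers=3):
    """Hypothetical mapping: front i becomes tier i + 1, capped at the last tier."""
    tiers = {}
    for index, front in enumerate(pareto_fronts(packages)):
        for package in front:
            tiers[package.name] = min(index + 1, num_tiers)
    return tiers


# Made-up job records, for illustration only.
example_jobs = [
    ("python", "alice", "chem", 2.0),
    ("python", "bob", "phys", 1.0),
    ("python", "carol", "chem", 3.0),
    ("vasp", "dave", "mat", 50.0),
    ("gcc", "alice", "chem", 0.5),
    ("gcc", "erin", "bio", 0.5),
    ("oldtool", "frank", "bio", 0.1),
]

print(assign_tiers(aggregate(example_jobs)))
# {'python': 1, 'vasp': 1, 'gcc': 2, 'oldtool': 3}
```

In this toy data neither `python` (more users, groups, and jobs) nor `vasp` (more runtime) dominates the other, so both land in the top tier. A Pareto ordering avoids choosing arbitrary weights for the four metrics, at the cost of fronts that can be uneven in size.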
Keywords
compilation, optimization, Pareto ranking, repository, research software