Clustering MOOC Programming Solutions to Diversify Their Presentation to Students
arXiv (2024)
Abstract
In many MOOCs, whenever a student completes a programming task, they can see
previous solutions of other students to find potentially different ways of
solving the problem and learn new coding constructs. However, a lot of MOOCs
simply show the most recent solutions, disregarding their diversity or quality.
To solve this novel problem, we adapted the existing plagiarism detection
tool JPlag to Python submissions on Hyperskill, a popular MOOC platform.
However, due to the tool's inner algorithm, it fully processed only 46 out of
867 studied tasks. Therefore, we developed our own tool called Rhubarb. This
tool first standardizes solutions that are algorithmically the same, then
calculates the structure-aware edit distance between them, and then applies
clustering. Finally, it selects one example from each of the largest clusters,
taking into account their code quality. Rhubarb was able to handle all 867
tasks successfully.
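
The four stages described above (standardize, measure distance, cluster, select) can be pictured with a minimal sketch. It is not Rhubarb's implementation, and every name in it is hypothetical. It rests on three loud assumptions: standardization is approximated by reducing each solution to its sequence of AST node types, the structure-aware edit distance is replaced by a `difflib` sequence ratio over those types, and code quality is approximated by source length. The threshold of 0.1 is arbitrary.

```python
import ast
import difflib
from collections import defaultdict


def node_types(source: str) -> list[str]:
    """Reduce a solution to its AST node type names.

    Assumption: this stands in for standardization. Identifier names,
    literal values, and formatting are discarded.
    """
    return [type(node).__name__ for node in ast.walk(ast.parse(source))]


def distance(a: list[str], b: list[str]) -> float:
    """Stand-in for a structure-aware edit distance, scaled to [0, 1]."""
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return 1.0 - matcher.ratio()


def cluster(solutions: list[str], threshold: float = 0.1) -> list[list[int]]:
    """Single-linkage clustering: link every pair closer than the threshold.

    Returns clusters of solution indices, largest cluster first.
    """
    shapes = [node_types(s) for s in solutions]
    parent = list(range(len(solutions)))

    def find(i: int) -> int:
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(solutions)):
        for j in range(i + 1, len(solutions)):
            if distance(shapes[i], shapes[j]) < threshold:
                parent[find(i)] = find(j)

    groups: dict[int, list[int]] = defaultdict(list)
    for i in range(len(solutions)):
        groups[find(i)].append(i)
    return sorted(groups.values(), key=len, reverse=True)


def pick_examples(solutions: list[str], top_k: int = 2,
                  threshold: float = 0.1) -> list[str]:
    """Pick one solution from each of the top_k largest clusters.

    Assumption: the shortest source text is the "best quality" member.
    """
    clusters = cluster(solutions, threshold)
    return [min((solutions[i] for i in members), key=len)
            for members in clusters[:top_k]]
```

Given two generator-based solutions that differ only in a variable name and one loop-based solution, the sketch groups the first two together and returns one example per cluster. A real system would need a genuine tree edit distance and a proper code quality measure in place of these stand-ins.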
We compared approaches on a set of 59 tasks that both tools could process.
Eight experts rated the selected solutions based on diversity, code quality,
and usefulness. The default platform approach of selecting recent submissions
received on average 3.12 out of 5, JPlag received 3.77, and Rhubarb 3.50. Since in a
real MOOC it is imperative to process every task, we created a system that
uses JPlag on the 5.3% of tasks it fully processes and Rhubarb on the remaining
94.7%.
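
The combined system amounts to a per-task dispatch between the two tools. The sketch below is illustrative only: `choose_tool` and `shares` are hypothetical names, and the set of tasks JPlag fully processes is assumed to be known in advance. It also checks the arithmetic behind the split, using the 46 of 867 tasks reported above.

```python
def choose_tool(task_id: int, jplag_ok: set[int]) -> str:
    """Use JPlag where it fully processes the task, otherwise Rhubarb."""
    return "JPlag" if task_id in jplag_ok else "Rhubarb"


def shares(jplag_count: int, total: int) -> tuple[float, float]:
    """Percentage of tasks handled by each tool, rounded to one decimal."""
    jplag_pct = round(100 * jplag_count / total, 1)
    return jplag_pct, round(100 - jplag_pct, 1)


# 46 of the 867 studied tasks were fully processed by JPlag.
print(shares(46, 867))
```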