
Reassessing the Elephant, Part 1

Assessment Update (2019)

Abstract
In 2008, my first Assessment Update article (Eubanks 2008), “Assessing the General Education Elephant,” described a faculty-centered method of gathering data on general student learning outcomes like thinking and communication. The title references the parable of the blind men and the elephant, employing the idea that multiple perspectives help us understand complex outcomes. In this two-part update, I will review the methods used and summarize findings from the intervening years.

Formal assessments like standardized tests or rubric ratings of student work are often used to draw general conclusions about student abilities. Such extrapolation from a single point in time is not ideal. Measurements taken over a longer period of time benefit from error-averaging. For example, high school grades are better predictors of college grades than are standardized tests. Instead of assessments based on very limited evidence, the “assessing the elephant” method relies on two kinds of multiple perspectives. One is that instructors are asked to summarize student work holistically over an entire term, not by considering a single piece of student work. The other is that the same student is assessed by multiple instructors. This approach trusts the faculty as domain experts and teachers.

Holistic ratings of students are commonly used as data in K–12 education and in some areas of higher education, like the arts. In 1991, DuPaul, Rapport, and Perriello wrote of such methods: “Teachers are able to observe student performance on a more comprehensive sample of academic content than could be included on a standardized achievement test. Thus, their judgments provide a more representative sample of the domain of interest in academic assessment.”

The assessment office at Furman University administers a universitywide survey at the end of each term that asks the faculty to provide holistic ratings of their students on predetermined learning outcomes. These include thinking and communication skills as well as discipline-specific skills like computer programming. Faculty members are asked to rate students at the end of each course on a five-point developmental scale, but only if they have a basis for judgment. The rating scale uses a common language to describe student development over a four-year undergraduate degree: the low end of the scale is “the student is not doing college-level work,” and the high end is “the student is ready to graduate.” The reporting is done through an online form that requires only a few minutes to complete. The reports are a natural conclusion to teaching a class: a reflection on the success of each student, judged holistically and in comparison to an ideal four-year career. One faculty member recently told me she asks her students to assess their own abilities at the beginning and the end of the course and then compares their responses to her own assessments; she says her students are too hard on themselves.

Single-point assessments, such as regrading papers with a rubric, are statistically useful only with large sample sizes (Bacon and Stewart 2017). With only 20 or 30 samples, for example, it is barely possible to get a sense of reliability, and no trustworthy validity work can be done on such small amounts of data. When regulatory compliance demands frequent assessment reports for every academic program, faculty members may feel pressure to draw conclusions from statistically insignificant findings.
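To make the error-averaging point concrete, here is a minimal simulation sketch, not drawn from the article itself: it assumes each end-of-term rating behaves like a student's latent ability plus independent rater noise, and the cohort size, noise level, and scale values below are illustrative assumptions rather than figures from the survey.

# Illustrative sketch only (assumed parameters): averaging several independent
# end-of-term ratings per student reduces measurement noise compared with a
# single-point assessment.
import numpy as np

rng = np.random.default_rng(0)

n_students = 500          # hypothetical cohort size (assumption)
ratings_per_student = 7   # roughly the per-term volume described in the text
noise_sd = 1.0            # assumed rater noise on the five-point scale

# Latent "true" ability of each student on the developmental scale.
true_ability = rng.normal(loc=3.0, scale=0.7, size=n_students)

# Each rating = latent ability + independent rater noise (a strong assumption).
noise = rng.normal(0.0, noise_sd, size=(n_students, ratings_per_student))
ratings = true_ability[:, None] + noise

single_rating = ratings[:, 0]           # a single-point assessment
averaged_rating = ratings.mean(axis=1)  # mean of ~7 ratings per student

def rmse(estimate, truth):
    return float(np.sqrt(np.mean((estimate - truth) ** 2)))

print("RMSE of a single rating:", rmse(single_rating, true_ability))
print("RMSE of the 7-rating mean:", rmse(averaged_rating, true_ability))
# Under the independence assumption, the second RMSE is smaller by roughly
# sqrt(7), or about 2.6x: the statistical content of "error-averaging."

Under these assumptions the benefit of pooling ratings is purely mechanical; the interesting empirical question, taken up below, is whether instructors' holistic judgments are valid signals in the first place.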
I gave a more detailed version of this argument elsewhere (Eubanks 2017). The “assessing the elephant” survey avoids the small-sample problem by inviting all course instructors to participate in evaluating student competencies. At each of the four institutions where I have employed this type of survey, around half of the teaching faculty participated with only modest encouragement, resulting in a great volume of data and multiple perspectives on student achievement. Not only does each rating benefit from “averaging” weeks of observations of students, but the sample sizes are large enough to do interesting statistics. At Furman, we generate about seven ratings per student each term, more than enough to understand how different types of students develop over time. These data contribute to a larger project to build a research platform on student development and achievement. For example, the approximately 14,000 ratings of student writing collected over four years help us understand the relationship between high school grades and writing development in college. All of these uses depend on having sufficient sample sizes and a diversity of perspectives, and engaging and trusting the faculty to rate students based on their observations meets that need.

So why aren't such surveys routine in assessment practice? An answer can perhaps be found in the DuPaul, Rapport, and Perriello (1991) article cited earlier. After describing the usefulness of direct observation, the authors also noted, “At the present time, however, teachers typically are not asked for this information in a systematic fashion, and when available, such input is considered to be highly suspect data.” I have encountered similar prejudices. It is interesting that subjective data are gathered elsewhere in universities without such suspicion. We routinely use satisfaction surveys to make decisions, asking respondents to reflect on an academic term or year, or a whole college career. Employee evaluations are not based on a single blind-reviewed work product; no one would accept that as valid.

Data-collection methods should be judged by the usefulness of the data collected. Studies in K–12 education show good predictive validity for trust-the-faculty methods (Kettler and Albers 2013), and my own experience over 15 years and four institutions has been positive. At the first institution where we used this method, a continuous five-year history of ratings of student writing was good enough to plausibly distinguish the effect of a writing lab intervention over time. By comparison, the parallel rubric rating of student portfolios had such low rater agreement that it was abandoned. Both in the absolute quality and quantity of the resulting data and in comparison to more common methods, the trust-the-faculty “assessing the elephant” method shows its worth.

In Part 2 of this article, scheduled for publication in Assessment Update, Volume 31, Number 3, I will describe some of the characteristics and uses of the data collected at Furman University from fall 2015 through fall 2018, comprising more than 130,000 ratings of student learning outcomes. It is my hope that others will try this trust-the-faculty approach for themselves and report back.

David Eubanks is the assistant vice president for institutional effectiveness at Furman University in Greenville, South Carolina.
Keywords: elephant, part