Do low-stakes exams yield accurate results about student achievement?

Dive Brief:

A new study released by the National Bureau of Economic Research, which has not yet been peer-reviewed, indicates that a substantial share of students worldwide do not put forth their best effort on the low-stakes PISA test, which distorts some countries' rankings, Education Week reports.
The authors of the study mined keystroke data to determine whether students took time to read questions thoroughly, quit early, or avoided difficult or long questions. They used these behaviors to gauge the “seriousness” of the students and how it affected test outcomes, which many policymakers use to make key educational and economic decisions.
Students who were more likely to “goof off” because the low-stakes test had no impact on them personally included wealthier students, lower-skilled students and students from countries where students were required to sit for high-stakes exams. The lowest percentage of “non-serious students” came from Korea (14%) and the highest from Brazil (67%), while the United States fell in between at 23%.

Dive Insight:

In the debate over the relative value of high-stakes and low-stakes testing, this study highlights the impact student attitudes have on test results. Considering the influence many test results have on policy and fiscal decisions, the size of that distortion is concerning. The bias is large, according to the summary of the new study, titled “Taking PISA Seriously: How Accurate are Low Stakes Exams?”

“A country can rise up to 15 places in rankings if its students took the exam seriously. We ask where the bias is coming from and show that around half of it comes from the proportion of non-serious students, while 36% comes from their ability, with the remaining coming from the extent of non-seriousness,” the summary notes.

While high-stakes testing has received a bad reputation for the pressure it places on a snapshot in time, low-stakes testing, meaning testing that doesn't impact the student in any way, seems to leave some students apathetic about the results, thus skewing the data. For some tests, like the NAEP, which is used to gather data from across the United States, the low-stakes nature of the test discourages participation altogether. However, in that case, the students who are willing to participate seem more engaged in the test. High-stakes testing does have value in identifying the need for resources, when properly used, and low-stakes testing can have value in aiding student retention of material. But balancing the two aspects is a tricky proposition.

As noted in a 2008 study, “Recognizing that test performance is a function of both knowledge and motivation, the possibility of low student motivation raises the concern of whether data collected are a valid measure of student achievement…. According to Erwin and Wise (2002), ‘The challenge to motivate our students to give their best effort when there are few or no personal consequences is probably the most vexing assessment problem we face.’”