Marking, grading and assessment in higher education

It is disturbing to find that many people in the higher echelons of education do not seem to understand some basic aspects of marking, grading and formative versus summative assessment.

Firstly, we assess the quality of the artifact or performance, not the inherent quality of the student themselves. We are identifying that the assignment is a credit-level assignment, not that the student is a credit-level person, or has credit-level intelligence.

Secondly, when we mark an assessed piece of work, we might begin with 100% and deduct marks for each “error”, or we might start at 0 and add marks for each relevant point. However, when we grade a piece of work, we are using an ordinal scale, not an equal-interval scale. If we use a grading system, we need to understand the qualitative properties of such a scale, and the fact that it is ordinal at best in quantitative terms. I say “at best” because within the non-passing grade category there are a number of different ways to fail that are not all equal, and I would probably argue that a 45–50 based on a genuine attempt at all assessments is lower in rank than a half-baked attempt on some assessments in the 30–40 range.

Thirdly, even when we use so-called objective assessments such as multi-choice tests, not all questions are created equal. There are many methods by which we could scale or rescale scores on MCQs, including giving different weightings to questions of different levels of difficulty, or deducting points for incorrect answers. These methods may be much more robust than scoring each question as one point, but they would most likely be very unpopular with students, and would be difficult to implement at a conceptual level (how do you actually rate the difficulty of each question: by performance, by expert ranking, or by whether it differentiates between “good” and “bad” students? And how do we know who these students are, given that we are using the test to assess this?).
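To make the difference between these scoring schemes concrete, here is a rough Python sketch of three of them: one point per question, difficulty weighting, and a penalty for wrong answers. The function names, the answer key and the weights are all invented for illustration. The penalty of 1/(options − 1) is one common convention for negative marking, not something any particular institution is assumed to use.

```python
from typing import Optional, Sequence


def one_point_score(answers: Sequence[Optional[str]], key: Sequence[str]) -> float:
    """One point for every correct answer; nothing else counts."""
    return float(sum(a == k for a, k in zip(answers, key)))


def weighted_score(
    answers: Sequence[Optional[str]],
    key: Sequence[str],
    weights: Sequence[float],
) -> float:
    """Each correct answer earns that question's (hypothetical) difficulty weight."""
    return float(sum(w for a, k, w in zip(answers, key, weights) if a == k))


def negative_marking_score(
    answers: Sequence[Optional[str]],
    key: Sequence[str],
    n_options: int = 4,
) -> float:
    """+1 for a correct answer, -1/(n_options - 1) for a wrong one, 0 for a blank (None)."""
    penalty = 1.0 / (n_options - 1)
    total = 0.0
    for a, k in zip(answers, key):
        if a is None:
            continue  # unanswered questions are neither rewarded nor penalised
        total += 1.0 if a == k else -penalty
    return total


# Illustrative data only: a four-question test with assumed difficulty weights.
key = ["A", "B", "C", "D"]
weights = [1.0, 1.0, 2.0, 3.0]          # assumption: later questions are harder
student_1 = ["A", "B", "D", "A"]        # gets the two easy questions right
student_2 = ["B", "A", "C", "D"]        # gets the two hard questions right
student_3 = ["A", "B", "D", None]       # two right, one wrong, one left blank

raw_1, raw_2 = one_point_score(student_1, key), one_point_score(student_2, key)
weighted_1 = weighted_score(student_1, key, weights)
weighted_2 = weighted_score(student_2, key, weights)
penalised_3 = negative_marking_score(student_3, key)
```

Under one-point scoring the first two students are indistinguishable; under the assumed weights they are not. Note that the sketch simply takes the weights as given, which is exactly the conceptual problem: someone still has to decide where they come from.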

While I have always been a proponent of grading rather than marking, I have become accustomed to using marking pro-formas that define marking criteria and their weightings. I note that these rubrics tend to grade sub-sections of an artifact to generate numerical scores, which are then combined according to a pre-defined set of weightings. This allows for an acceptable spread of grades for each artifact and is apparently “more objective” than grading the artifact itself holistically. From where I sit, it is really a way of ensuring that there is a spread of marks that we can then turn post hoc into a distribution of grades without further reference to the actual artifacts being graded.
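The arithmetic behind such a rubric can be sketched in a few lines of Python. Everything specific here is an assumption made for illustration: the criteria, their weightings, the number assigned to each grade and the cut-offs all vary between institutions. The point to notice is the step where an ordinal grade is replaced by a number and then treated as though it sat on an equal-interval scale.

```python
from typing import Dict, List, Tuple

# Assumption: each ordinal grade is replaced by a single representative mark.
GRADE_TO_MARK: Dict[str, float] = {"F": 40.0, "P": 57.0, "C": 70.0, "D": 80.0, "HD": 92.0}

# Assumption: cut-offs used to turn the combined mark back into a grade.
CUT_OFFS: List[Tuple[float, str]] = [(85.0, "HD"), (75.0, "D"), (65.0, "C"), (50.0, "P")]


def rubric_mark(criterion_grades: Dict[str, str], weights: Dict[str, float]) -> float:
    """Weighted average of the marks standing in for each criterion's grade."""
    total_weight = sum(weights.values())
    weighted_sum = sum(GRADE_TO_MARK[criterion_grades[c]] * w for c, w in weights.items())
    return weighted_sum / total_weight


def mark_to_grade(mark: float) -> str:
    """Map a combined mark to a grade using the assumed cut-offs."""
    for cut_off, grade in CUT_OFFS:
        if mark >= cut_off:
            return grade
    return "F"


# Hypothetical rubric for a single essay.
weights = {"argument": 0.5, "evidence": 0.3, "presentation": 0.2}
grades = {"argument": "HD", "evidence": "P", "presentation": "C"}

mark = rubric_mark(grades, weights)   # 0.5 * 92 + 0.3 * 57 + 0.2 * 70
final_grade = mark_to_grade(mark)
```

In this made-up case an essay graded HD, P and C on its three criteria comes out as a D overall, without anyone having judged the essay as a whole to be distinction-level work.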

So far as I am concerned, when we have done this, we have actually abrogated our professional responsibility to our disciplines, and become part of a credentialling factory that is obsessed with quantification of performance outcomes under the umbrella of “quality assurance”, while being oblivious to the notion that quality and quantity are inherently different constructs.
