Distributions of Marks

“Quis custodiet ipsos custodes? Your Grace.”
“I know that one,” said Vimes. “Who watches the watchmen? Me, Mr. Pessimal.”
“Ah, but who watches you, Your Grace?” said the inspector with a brief little smile.
“I do that, too. All the time,” said Vimes. “Believe me.”
– Terry Pratchett, Thud!

At this time of year I tend to think, and worry, about student grades: how we combine them, how we check that we are doing so correctly, and how we know what is in and out of line.

I remember reading McLachlan and Whiten (2000) a number of years ago. Coming from a medical background, they seem to have a different underlying idea about combining and scaling marks. Theirs is that assessment marks are expected to vary between types of assessment, but that a grade can be calculated for each assessment from its distribution, using the median, the interquartile range, and knowledge of the assessment. The grades can then be combined into an average.

I, however, come from a discipline that has suggested that each assessment should be designed so that every assessment has the same average, though I personally prefer a competency-based approach.

From this, the question then becomes: what should this average value be, and how do we know if an assessment doesn’t agree with it?

What is an ideal distribution of marks?

This is actually a very complex question – one I won’t provide an answer to, as I don’t think there is one, but will just provide some thoughts on.

At the end of a degree (or something else) a number of modules/assessments are combined to give an overall mark, which is then often converted to a grade/classification. These vary depending on where we are and what the degree is. I am most familiar with 70% and above being a 1st class degree, 60-70% being an upper 2nd class degree, 50-60% being a lower 2nd class degree, and 40-50% being a third class degree (with some minor variants for making sure a certain number of credits have been passed at the first attempt).

We then need to think about how many people we want with each classification of degree. This has been a topic of much discussion as it is related to grade inflation; see, for example, Bachan (2017). It is well known that the percentage of awarded degrees that are 1st class is increasing, as is the percentage of good degrees (1st and 2.1). From Figure 1 we can see that if a skew normal distribution is fitted to the degree classification data from the Higher Education Statistics Agency, then the mean and standard deviation increase with year. Interestingly, up to about 2011 the distribution seems to be almost perfectly normal, whereas from then onwards there seems to be increasing negative skew in the distribution (and thus the fits are not as good). These three changes (in mean, standard deviation, and skew) result in the percentage of 1st class degrees increasing faster than the average. Notable exceptions to the trends are 2020 to 2022, due to many UK universities having differences in assessment in these years.

Figure 1. Variation of degree classifications for students awarded a degree (left graph) and the variation of the mean and standard deviation of a skew normal distribution fitted to UK degree awards distributions (right graph) with data taken from the Higher Education Statistics Agency. The dotted lines are just to guide the eye and ignore 2020-2022 values.

Let us therefore assume that the ideal distribution for an assessment should be a normal distribution. This would mean that the average of a number of these assessments would also be a normal distribution (assuming that the class is large enough) – generally matching the degree distributions shown above (excluding the slight skew). The mean of this normal distribution is the average of the means of each individual distribution; however, the variance is more complex. If we just take our ideal assessment distribution and say that each assessment has the same mean and variance, then the variance of the average of a number of normal distributions is (Mathematicians feel free to correct me on this!):

$$\text{Var}\left(\overline{X}\right)=n\frac{1}{n^2}\text{Var}\left(X\right)+\frac{n(n-1)}{2}\frac{2}{n^2}\text{Covar}\left(X_i , X_j\right)$$

If the assessments are completely independent (i.e. the student performance in one assessment is unrelated to that in the others) then $\text{Covar}\left(X_i , X_j\right)=0$, and if the assessments are completely ordered (i.e. the students perform identically in every assessment) then $\text{Covar}\left(X_i , X_j\right) = \text{Var}\left(X\right)$. The reality will be somewhere in the middle, as in Figure 2.
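Collecting the coefficients in the expression for the variance of the average makes these two limits easier to see. This is only a restatement under the same assumptions (a common variance for every assessment and a common covariance for every pair), not an additional result:

```latex
\text{Var}\left(\overline{X}\right)
  = \frac{1}{n}\text{Var}\left(X\right)
  + \frac{n-1}{n}\text{Covar}\left(X_i , X_j\right)

% Independent assessments: Covar(X_i, X_j) = 0
\text{Var}\left(\overline{X}\right) = \frac{1}{n}\text{Var}\left(X\right)

% Completely ordered assessments: Covar(X_i, X_j) = Var(X)
\text{Var}\left(\overline{X}\right)
  = \frac{1}{n}\text{Var}\left(X\right) + \frac{n-1}{n}\text{Var}\left(X\right)
  = \text{Var}\left(X\right)
```

So in the independent case the spread of the overall mark shrinks as more assessments are combined, while in the ordered case it does not shrink at all.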

Figure 2. Percent of degree classifications predicted for 12 assessments from the distribution for an individual assessment for different covariances between the assessments. The ordered case is identical to the distribution of the individual assessment.

This means that the variance of the average mark is generally lower than that of an individual assessment mark, and how much lower is highly dependent on the covariance between the assessments, i.e. how consistently students perform.
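One way to sanity-check the expression for the variance of the average is a small simulation. The sketch below is not the analysis behind Figure 2; it is a minimal illustration that assumes each mark is the sum of a shared per-student "ability" term and an independent per-assessment term, which is just one convenient way of producing a chosen covariance. The function names, the seed, and the values (12 assessments, mean 60, variance 144) are illustrative assumptions, and marks are not clipped to 0-100.

```python
import math
import random
import statistics


def variance_of_mean(n, var, cov):
    """Variance of the average of n assessments that share a common
    variance `var` and a common pairwise covariance `cov`."""
    return var / n + (n - 1) / n * cov


def simulate_overall_marks(n, mean, var, cov, students, seed=0):
    """Simulate each student's average mark over n assessments.

    Assumed model (illustrative only): mark = mean + ability + noise, with
    ability ~ N(0, cov) shared across a student's assessments and
    noise ~ N(0, var - cov) drawn independently for each assessment.
    This gives every assessment variance `var` and every pair covariance
    `cov`. Requires 0 <= cov <= var.
    """
    rng = random.Random(seed)
    shared_sd = math.sqrt(cov)
    own_sd = math.sqrt(var - cov)
    overall = []
    for _ in range(students):
        ability = rng.gauss(0.0, shared_sd)
        marks = [mean + ability + rng.gauss(0.0, own_sd) for _ in range(n)]
        overall.append(sum(marks) / n)
    return overall


# Compare simulated and predicted variance for the independent case,
# an intermediate case, and the completely ordered case.
for cov in (0.0, 40.0, 144.0):
    overall = simulate_overall_marks(12, 60.0, 144.0, cov, students=20000)
    simulated = statistics.pvariance(overall)
    predicted = variance_of_mean(12, 144.0, cov)
    print(f"cov={cov:6.1f}  simulated={simulated:7.2f}  predicted={predicted:7.2f}")
```

With enough simulated students the two columns should agree closely, and the predicted variance runs from Var(X)/n in the independent case up to Var(X) in the ordered case.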

This means that without the ability to predict the covariance between assessments it is not possible to predict the overall student performance. We have to rely on historic trends to look at the link between individual assessments and the overall distribution, which can be influenced by many factors.

Summary

Regardless of what distribution of marks is the best (which is actually arbitrary), the important thing is that there is some way to differentiate performance. For example, about 7% of students got a 1st class degree in 1995, while this was about 30% in 2023 (when about 7% would be estimated to have a mark above 77%). From the equation above, assuming a typical covariance of about 40 between each pair of assessments, a mean of 60% and a standard deviation of 12 for an individual assessment would give about 7% first class degrees over a whole degree.
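The last estimate can be reproduced approximately from the variance expression. The sketch below assumes the overall mark is normally distributed and that the degree is made up of 12 assessments (the number used for Figure 2; the paragraph above does not state it). The function name `fraction_above` is hypothetical.

```python
from statistics import NormalDist


def fraction_above(threshold, n, mean, sd, cov):
    """Fraction of students whose average over n assessments exceeds
    `threshold`, assuming a normally distributed overall mark, a common
    assessment mean and standard deviation, and a common pairwise
    covariance `cov` between assessments."""
    var_of_mean = sd ** 2 / n + (n - 1) / n * cov
    overall = NormalDist(mu=mean, sigma=var_of_mean ** 0.5)
    return 1.0 - overall.cdf(threshold)


# Assumed inputs: 12 assessments, mean 60, standard deviation 12, covariance 40.
firsts = fraction_above(70.0, n=12, mean=60.0, sd=12.0, cov=40.0)
print(f"Estimated fraction of 1st class degrees: {firsts:.1%}")
```

Under these assumptions the result comes out between 7% and 8%, in line with the figure quoted. Setting `cov` to 0 (independent) or to 144 (completely ordered) gives the two extremes, roughly 0.2% and 20% respectively, which shows how strongly the answer depends on the covariance.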

So what percentage of students should get this classification? Maybe it doesn’t matter, so long as we are confident that a mark of 70% is always the same level of quality. The problem is that this is the difficult bit: we would be naive to just say that 77% in 2023 is equivalent to 70% in 1995, as the quality of teaching and learning should have improved so that students can get better marks, but I doubt it has improved as much as you would expect from the data.

We need to determine what is expected of a 1st class, 2.1, 2.2, and 3rd class student; then we can design our assessments to allocate the correct number of marks and amount of work to each level. This would mean we are confident that when a student gets 70% they are “1st class” regardless of how many students achieve this classification. This needs to be agreed at an (inter)national subject level rather than at a university level.

References

John C McLachlan, Susan C Whiten; 2000; “Marks, scores and grades: scaling and aggregating student assessment outcomes” Medical Education, 34:781-881.
Ray Bachan; 2017; “Grade inflation in UK higher education”, Studies in Higher Education, 42:1580-1600.

Tom Rodgers
