TESTING MEMO 7: ASSIGNING LETTER GRADES TO TEST SCORES
by
Larry J. Weber
Virginia Polytechnic Institute and State University
In TESTING MEMO 6 we recommended recording and
averaging T-scores to determine final class rankings.
Admittedly, this approach may not be practical in
every case, especially if the T-scores would have to
be computed manually. At the same time, we discussed
a difficulty inherent in averaging number-right or
percent-right scores, namely, that differences in score
variability from one test to another result in loss of
control over the influence of each test on the average
score.
At this point, another approach needs to be considered,
namely, recording letter grades for each test and
assignment and averaging these (A=4, B=3, etc.), possibly
after weighting some grades more heavily than others.
One variation of this approach is an acceptable substitute
for the use of T-scores: applying a single, predetermined
distribution of letter grades to each test.
For example, one might decide to assign about 10% As, 30%
Bs, 45% Cs, 10% Ds, and 5% Fs on every test or assignment.
As was the case with use of T-scores, this practice tends
to assure that each test influences the final course grade
in the intended manner.
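The procedure just described can be sketched in a few lines of code. This is a minimal illustration, not a prescribed method: the percentages come from the example above, the function name is hypothetical, and the handling of rounding and ties is deliberately simple.

```python
# Sketch: assign letter grades by a fixed, predetermined distribution
# (10% A, 30% B, 45% C, 10% D, 5% F, as in the example above).
# The helper name and tie handling are illustrative assumptions.

def grades_by_distribution(scores,
                           dist=(("A", 0.10), ("B", 0.30), ("C", 0.45),
                                 ("D", 0.10), ("F", 0.05))):
    """Rank scores high to low and cut the ranking at the cumulative
    percentages given in dist. Ties at a boundary are not handled
    specially here; a real grader would inspect them by hand."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    letters = [None] * n
    cum, start = 0.0, 0
    for letter, frac in dist:
        cum += frac
        end = round(cum * n)
        for i in order[start:end]:
            letters[i] = letter
        start = end
    for i in order[start:]:          # any leftover from rounding
        letters[i] = dist[-1][0]
    return letters
```

With 20 students this yields 2 As, 6 Bs, 9 Cs, 2 Ds, and 1 F, closely matching the intended percentages.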
In the space allowed for this memo, it would be impossible
to answer all questions readers might have about this method
or to qualify its use sufficiently to prevent every
conceivable misapplication. However, some problems or
questions may be anticipated. First, there is no hard-and-
fast rule for predetermining letter grade distributions.
Certainly, it is not necessary to have an equal number of
Bs and Ds, and no one "has to fail."
Distributions may be determined empirically by examining
grade distributions for similar courses from past years.
However, current circumstances may modify them for a given
class. Second, applying this method to small classes must be
done with great care. Personal knowledge of the ability
and achievement of individual students may warrant overriding
the application of a predetermined distribution.
The approach just described is often a good one for
larger classes of known composition, but there are obviously
situations in which it would not be appropriate. In such
cases (when it has been decided to record letter grades), there
is really no way to avoid the problem of potential loss of
control over the influence of each test on the composite grade.
Nevertheless, testing and determination of course grades must
occur in these cases. Therefore, in what follows we offer
some conventional wisdom applicable to assignment of letter grades
to tests for later averaging.
A sure way to increase student anxiety is to announce that
you grade your tests on "the curve," for in the eyes of some
students this is tantamount to announcing in advance of the
test that a certain percentage of students will fail.
Alternatively, students, and even instructors, tend to feel
more secure if the criteria for letter-grade assignment
are announced in advance. However, despite the greater
popularity of the latter, it is difficult to recommend
either practice.
The notion of grading on the curve no doubt grew out of the
fact that when large numbers of examinees are administered
lengthy tests, the frequency distribution of the resulting
scores typically tends toward the shape of the normal
or "bell-shaped" curve. The mathematical formula for the
normal curve is such that the curve is symmetric about the
mean and almost the entire area under the curve is contained
within three standard deviations above and below the mean,
with roughly 68% of this area within one standard deviation
above and below the mean. If the total area is translated into
the total number of examinees, it is seen that "grading on
the curve" seems to suggest that most examinees would be
awarded grades of C and relatively few examinees would receive
As or Fs. Because of the symmetry of the distribution, it
is suggested that the number of As should equal the number of
Fs and that the number of Bs should equal the number of Ds.
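The grade proportions implied by "the curve" can be computed directly from normal-curve areas. The cutting points below (at 0.5 and 1.5 standard deviations above and below the mean) are hypothetical, chosen only to illustrate the symmetry the text describes; only the standard library is used.

```python
# Sketch: areas under the standard normal curve between hypothetical
# grade boundaries at -1.5, -0.5, +0.5, and +1.5 standard deviations.
from math import erf, sqrt

def normal_cdf(z):
    """Cumulative probability of the standard normal distribution."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

cuts = [-1.5, -0.5, 0.5, 1.5]                  # F/D, D/C, C/B, B/A
edges = [float("-inf")] + cuts + [float("inf")]
areas = {letter: normal_cdf(hi) - normal_cdf(lo)
         for letter, lo, hi in zip("FDCBA", edges, edges[1:])}
# areas["C"] is about 0.38; by symmetry areas["A"] equals areas["F"]
# and areas["B"] equals areas["D"], as the text notes.
```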
However, the degree to which the frequency distribution
of scores on classroom tests approximates the normal distribution
is probably not very great except for large classes with
lengthy tests of appropriate difficulty (see TESTING MEMO
2 regarding difficulty). But even in these cases, there is
no reason to adopt points on the normal distribution, defined,
a priori, by standard deviation units, as the basis for
establishing the cutting points between letter grades.
A far more reasonable approach would be to examine the
frequency distribution of the scores and then capitalize
on naturally occurring gaps in the score distribution by
setting the cutting points between letter grades at the mid-
point of the gaps. Although this practice may result in
awarding a few more letter grades at a particular level
than you may have intended, it will help to minimize
student quibbling over one or two points which might otherwise
make the difference between one letter grade and another.
The number of natural groupings may also suggest that there
are fewer distinct levels of performance than suggested by
the traditional five letter grades. In this case, you may
decide to award no Fs or perhaps no As, or, if there is a large
gap in between, you may elect not to award any Bs or Ds.
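Finding the gaps and their midpoints is easily mechanized. The sketch below is one plausible implementation of the idea in the text; the function name and the default of four cutting points (for five letter grades) are assumptions, and in practice you would still inspect the distribution by eye.

```python
# Sketch: locate the widest natural gaps in a score distribution and
# set cutting points at their midpoints, as suggested above.

def gap_cutpoints(scores, n_cuts=4):
    """Return up to n_cuts cutting points placed at the midpoints of
    the widest gaps between adjacent distinct scores."""
    distinct = sorted(set(scores))
    gaps = [(hi - lo, (hi + lo) / 2.0)
            for lo, hi in zip(distinct, distinct[1:])]
    widest = sorted(gaps, reverse=True)[:n_cuts]   # largest gaps first
    return sorted(mid for _, mid in widest)
```

For scores such as 95, 94, 88, 87, 75, 74, 60, 40 this places cuts at 50, 67, 81, and 91, each in the middle of an obvious gap.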
The above suggestions may appear subjective or even capricious
but, unfortunately, there is no objective procedure that can
be counted upon to establish letter grade criteria in advance
of the test. Ultimately, the assignment of letter grades is
a professional judgment that must be rendered on the basis
of fallible test scores.
Ironically, students seem to gain a false sense of security
if the criteria for letter grades on a test are announced
in advance. Typically, this announcement specifies the
percent-correct score associated with each letter grade.
Unfortunately, this requires knowledge in advance as to how
easy or difficult the test will be for a particular group.
Unless you maintain an item bank containing information as to
how difficult each item was when administered previously, it
is often the case that the test turns out to be easier or
harder than anticipated, sometimes greatly so. If it turns out
to be too easy, you will suffer the guilt of grade inflation.
If it turns out to be too difficult, irate students will try
to persuade you to change your a priori grading criteria.
Consequently, it is not recommended to announce the letter
grade criteria until you've had a chance to consider the
score distribution.
One of the most difficult decisions is determining the
cutting point between a minimally acceptable score and a
failing score. If the test contains a reasonable number of
items, you may be able to identify those addressing basic
elements of instruction which you believe should be
answered correctly even by marginal students. A separate
score could be computed based only on these items and
those students who do poorly on these items might be
prime candidates for failing grades. If a multiple-choice
test has been used, a roughly analogous procedure might be
followed whereby you identify, for each question, the single
worst answer and compute a worst answer score for each
examinee. This outcome can be accomplished quite easily
by providing your measurement service a "worst" choice key as
well as the "best" choice key and having the tests processed
twice. You might then combine the resulting two scores, perhaps
by subtracting the "worst" answer score from the "best"
answer score, or you might simply use the "worst" answer scores
independently to help you identify prime candidates for failure.
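The double-keying idea amounts to scoring each answer sheet twice. The sketch below assumes responses and keys are simple strings of choice letters; the keys, responses, and function name are invented for illustration, and the subtraction is only one of the possible composites mentioned above.

```python
# Sketch: score responses against both a "best" and a "worst" answer
# key, then combine. All names and data here are illustrative.

def key_score(responses, key):
    """Count how many responses match the given key, item by item."""
    return sum(r == k for r, k in zip(responses, key))

best_key, worst_key = "ACBDA", "DDACB"   # hypothetical 5-item keys
responses = "ACBCB"                      # one examinee's answers

best = key_score(responses, best_key)    # matches to best answers
worst = key_score(responses, worst_key)  # matches to worst answers
combined = best - worst                  # one possible composite
```

A high "worst" score, used alone or in the composite, flags examinees who are consistently drawn to the least defensible answers.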
Another strategy that might be invoked to establish a
minimally acceptable score for a multiple-choice test is to
take advantage of the standard error of measurement, which
is routinely provided by test scoring offices. If the test
is appropriately difficult, it might be reasonable to set
the minimally acceptable score at a point which is
significantly higher than the score which would be expected on
the basis of random guessing alone. For example, suppose the
mean score on a 40 item multiple-choice test composed
of four-choice items was 25 correct with a standard error
of measurement equal to 3.0. In this case, you might want to
set the minimum passing score at 16, which is two standard
errors above the expected chance level score. Though this
practice minimizes the possibility of someone passing the test
who is totally uninformed, you may wish to set the passing
score somewhat higher in light of other considerations.
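The arithmetic of this example is worth making explicit, since it generalizes to any test length and choice count:

```python
# Worked example from the text: a 40-item test of four-choice items
# with a standard error of measurement of 3.0.

n_items, n_choices, sem = 40, 4, 3.0

chance_score = n_items / n_choices    # expected score from guessing alone
min_pass = chance_score + 2 * sem     # two standard errors above chance
# chance_score is 10.0 and min_pass is 16.0, matching the text
```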
This recommendation is based on the assumption that the test was
of appropriate difficulty for maximizing the discrimination
among scores with the average score mid-way between the
chance level and a perfect score. (See TESTING MEMO 2.)
Actually, the standard error of measurement is the
standard deviation of the scores that an examinee might obtain
with repeated testing under the assumption that this
repeated testing had no effect on learning. Therefore, if the
test described in the preceding paragraph is such
that you believe 20 should be the minimum passing score,
a reasonable actual minimum passing score might be 17, one
standard error below. This would allow for the fact that
someone with an average score of 20 (over hypothetical
repeated testings) would score below 17 about 16% of the time.
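The 16% figure follows from treating the examinee's repeated-testing scores as normally distributed about the true score, with the standard error of measurement as their standard deviation. A quick check of that claim, using only the standard library:

```python
# Sketch of the final example: intended minimum pass of 20, standard
# error of measurement 3.0, actual cut set one standard error lower.
from math import erf, sqrt

def normal_cdf(z):
    """Cumulative probability of the standard normal distribution."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

true_score, sem, actual_cut = 20.0, 3.0, 17.0
p_below_cut = normal_cdf((actual_cut - true_score) / sem)
# p_below_cut is about 0.159, i.e. roughly the 16% cited in the text
```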
For more information, contact Bob Frary at
Robert B. Frary, Director of Measurement
and Research Services
Office of Measurement and Research Services
2096 Derring Hall
Virginia Polytechnic Institute and State University
Blacksburg, VA 24060
703/231-5413 (voice)
frary@vtvm1.cc.vt.edu
###