[IPAC-List] Interpretation of internal consistency reliability coefficients

Patrick McCoy Patrick.McCoy at psc-cfp.gc.ca
Wed Feb 3 13:50:51 EST 2010

If I am not mistaken, a test can have high internal consistency, under
some circumstances, even if it is tapping more than one construct.

My guess is that for this to take place, there need to be enough items
related to each construct and the answers need to be sound.
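
A minimal Python sketch of this point, assuming standardized alpha
(computed from the mean inter-item correlation) and a hypothetical
20-item test built from two uncorrelated 10-item clusters. The cluster
sizes and the within-cluster correlation of .5 are illustrative values,
not figures from this thread.

```python
def standardized_alpha(k, mean_r):
    """Standardized Cronbach's alpha for k items with mean inter-item correlation mean_r."""
    return k * mean_r / (1 + (k - 1) * mean_r)


def mean_r_two_clusters(items_per_cluster, within_r, between_r):
    """Mean inter-item correlation for two equal-sized item clusters."""
    n = items_per_cluster
    within_pairs = 2 * (n * (n - 1) // 2)  # item pairs inside either cluster
    between_pairs = n * n                  # item pairs spanning the two clusters
    total = within_r * within_pairs + between_r * between_pairs
    return total / (within_pairs + between_pairs)


# Hypothetical values: two unrelated constructs, 10 items each.
r_bar = mean_r_two_clusters(10, within_r=0.5, between_r=0.0)
alpha = standardized_alpha(20, r_bar)
print(round(r_bar, 3), round(alpha, 3))
```

Under these assumed values the mean inter-item correlation is about .24
and alpha comes out near .86, even though the two clusters are
uncorrelated with each other.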

Pat McCoy
Ottawa, Canada

>>> "Dennis Doverspike " <dd1 at uakron.edu> 2010/02/03 12:16 pm >>>

Abstract. You have here the classic conflict in any type of test of
multidimensional domains or a test based on a criterion oriented
approach. However, at the same time you want to calculate a total score.
Bottom line, it is impossible to say that a .3, .6 or .9 would be
preferable without knowing your exact purpose and the likely
correlations between those domains. A .3 could be very good in some
situations. Although then one might argue: why calculate a total score?
Why not calculate a score for each dimension separately?

More detail

First, internal consistency reliability is based on one particular
theory of reliability (or several theories of reliability really) coming
from a particular view of tests, which sees the best tests as measuring
a unidimensional latent trait.

The general theory of situational judgment tests (see especially Mike
McDaniel's work) is not compatible with classic notions of internal
consistency. In other words, internal consistency is not an appropriate
index for situational judgment tests.

Having said that, playing devil's advocate of my own position, it could
be argued that situational judgment tests still arrive at a single total
score, which suggests that there is a single something being measured.

On your point 2, you could have high test retest reliability with low
internal consistency. It is possible, but again would depend on your
and your theory. Technically, under some reliability theories, test
retest is not an acceptable type of reliability at all -- since by
definition there is no random sampling of items (we can retain the
fiction as long as we do not blatantly violate it).
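
As a deliberately idealized illustration of high test retest reliability
alongside low internal consistency, the Python sketch below uses made-up
scores for four candidates on two uncorrelated items, and assumes every
candidate gives identical answers on the retest. The data and function
names are hypothetical.

```python
import math
from statistics import mean, pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha from a list of per-candidate item-score lists."""
    k = len(scores[0])
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Made-up data: two items that are uncorrelated across candidates.
time1 = [[0, 0], [0, 1], [1, 0], [1, 1]]
time2 = [row[:] for row in time1]  # assumption: perfectly stable answers

alpha = cronbach_alpha(time1)
retest_r = pearson_r([sum(r) for r in time1], [sum(r) for r in time2])
print(round(alpha, 3), round(retest_r, 3))
```

In this extreme case alpha is 0 while the test retest correlation of the
total scores is 1. Real data would fall somewhere between the extremes.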

Bottom line then - a .3 internal consistency might be very appropriate
and in fact good for a situational judgment test, especially depending
upon the theory of construction. Remember, in developing a test based on
a pure criterion oriented approach, such as might be seen with classical
BIBs or even situational judgment tests, we would want a correlation of
0 between each additional item and the items already in the test, since
that results in the greatest Multiple R. So in that case, lower internal
consistencies are better and a high internal consistency would be bad.
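
A rough Python sketch of this trade-off, under simplifying assumptions
that are not from the thread: all 20 items have the same correlation
with the criterion (a hypothetical validity of .1) and the same
correlation with each other. In that special case the squared multiple
correlation reduces to k * v^2 / (1 + (k - 1) * rho).

```python
import math


def multiple_r(k, validity, inter_item_r):
    """Multiple R for k predictors with equal validity and equal intercorrelation."""
    r_squared = k * validity ** 2 / (1 + (k - 1) * inter_item_r)
    return math.sqrt(r_squared)


def standardized_alpha(k, inter_item_r):
    """Standardized alpha for k items sharing one inter-item correlation."""
    return k * inter_item_r / (1 + (k - 1) * inter_item_r)


# Hypothetical: 20 items, each correlating .1 with the criterion.
for rho in (0.0, 0.1, 0.3):
    print(rho,
          round(multiple_r(20, 0.1, rho), 3),
          round(standardized_alpha(20, rho), 3))
```

With these assumed numbers, Multiple R falls from about .45 to about .17
as the inter-item correlation rises from 0 to .3, while alpha rises from
0 to about .90.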

So you have competing ideas here. You are calculating I would guess a
total score, but you are calculating a total score based on adding
together measures of independent constructs. So a .3 might be good, a .6
might be good, a .9 might be good, depends on what you are trying to
achieve and what the correlation really is between those independent
constructs.

One could go into a lot more detail and argument on these concepts, but
that is for a different forum.

Dennis Doverspike, Ph.D., ABPP
Professor of Psychology
Director, Center for Organizational Research
Senior Fellow of the Institute for Life-Span Development and
Psychology Department
University of Akron
Akron, Ohio 44325-4301
330-972-8372 (Office)
330-972-5174 (Office Fax)
ddoverspike at uakron.edu

-----Original Message-----
From: ipac-list-bounces at ipacweb.org
[mailto:ipac-list-bounces at ipacweb.org]
On Behalf Of Shekerjian, Rene
Sent: Wednesday, February 03, 2010 11:36 AM
To: ipac-list at ipacweb.org
Subject: [IPAC-List] Interpretation of internal consistency

I have been reading up on internal consistency reliability
(e.g., KR-20 and Cronbach's Alpha) in order to clarify my thinking about
it, but I am having trouble finding much of practical use beyond two
basic points:

(1) .6 is tolerable, and .9 is the gold standard

(2) you can have high test-retest reliability with low internal
consistency
My question is this:

Suppose you have around 20 items that make up a situational judgment
subtest. The domain is defined by job analysis and is supposed to
address a competency that entails behaviors such as interacting with
customers, solving problems, giving advice, assessing situations, and
determining the best action to take, all within a circumscribed context.

How would you interpret an internal consistency reliability coefficient
of .3 for such a test? How about .6? And what about .9?
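
One way to put these three values on a common footing is to ask what
mean inter-item correlation each would imply for a 20-item test. The
Python sketch below assumes standardized alpha and simply inverts its
formula; the function name is hypothetical.

```python
def implied_mean_r(alpha, k):
    """Mean inter-item correlation implied by a standardized alpha for k items."""
    return alpha / (k - alpha * (k - 1))


for alpha in (0.3, 0.6, 0.9):
    print(alpha, round(implied_mean_r(alpha, 20), 3))
```

Under that assumption, .3, .6 and .9 correspond to mean inter-item
correlations of roughly .02, .07 and .31 respectively.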

My stab at this is that .3 suggests several unwanted possibilities:
among them are (1) the items were "good" but too difficult for the
candidates, (2) the items have flaws such as not fully defining the
circumstances and constraints that need to be taken into account to
arrive at the correct answer.

Personally, I would expect a well crafted set of such items that are
given to an appropriate candidate group to hold together and have an
internal consistency reliability coefficient around .6.

As far as getting a .9 for this sort of test, I think that that would
indicate too narrow a focus for a domain that would be expected to
cover a pretty wide territory.

While it makes sense that people who are more able and motivated to
develop expertise in a "broad" competency will tend to be good at much
of it and those who are less able and/or motivated will tend to perform
poorly in much of it, I would expect some randomization of people's
strengths and weaknesses, which would lead some people to perform well
in many areas but still fall down in a few (but with no discernible
pattern) and others to perform poorly in many but still be strong in a
few (again with no discernible pattern).

Your thoughts on this would be much appreciated.

Thanks in advance,


René Shekerjian | Testing Services Division | NYS Department of
Service | 518-473-9937

IPAC-List at ipacweb.org
