[IPAC-List] Interpretation of internal consistency reliability coefficients

GWEN SCHINDLER Gschindler at comp.state.md.us
Thu Feb 4 08:46:42 EST 2010

IPMAAC Folks,

I have struggled with the interpretation of internal consistency
coefficients, in particular KR-20, for several years now. In my
experience, the reliability index is influenced largely by the number of
items in the subtest. So when I run reliability coefficients on my
separate subtests, which are theoretically more homogeneous than my whole
test, my KR-20s are always much lower than the whole test KR-20 and
often lower than the recommended indices. This has propelled some
people I have worked with to use the whole test index as the measure of
reliability, which gets me reeling because I believe it's an incorrect
use of the statistic. However, when one is thinking about what looks
good on paper and about defending one's test to a judge and jury of lay
persons, part of me understands the argument. It's exasperating for one
who is really trying to be accurate and stay true to good test
development principles.
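
The dependence on item count described above can be sketched with the Spearman-Brown formula. This is only an illustration under the assumption that the added items are parallel to the existing ones; the function name and the numbers (a 10-item subtest with KR-20 = .60, a 50-item whole test) are hypothetical, not taken from any actual test.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Projected reliability when a test is made length_factor times as long,
    assuming the added items are parallel to the existing ones."""
    return (length_factor * reliability) / (1.0 + (length_factor - 1.0) * reliability)


# Hypothetical numbers: a 10-item subtest with KR-20 = 0.60,
# projected to a 50-item test made of comparable items.
subtest_kr20 = 0.60
projected = spearman_brown(subtest_kr20, length_factor=50 / 10)
print(f"Projected reliability at 50 items: {projected:.3f}")
```

Under these assumptions the projected coefficient is about .88, so a whole-test KR-20 well above the subtest values is what the arithmetic predicts from length alone.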

Gwen Schindler, Personnel Analyst
Comptroller of Maryland
Office of Personnel Services
Louis L Goldstein Treasury Building
80 Calvert St., Room 209
Annapolis, MD 21404-0466
Phone: 410-260-6622
Fax: 410-974-5249

>>> "Patrick McCoy" <Patrick.McCoy at psc-cfp.gc.ca> 2/3/2010 1:50 PM >>>

If I am not mistaken, a test can have high internal consistency, under
some circumstances, even if it is tapping more than one construct.

My guess is that for this to take place, there need to be enough items
related to each construct and the answers need to be sound.
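
A rough sketch of why this can happen, using the standardized alpha formula (alpha computed from the number of items and the mean inter-item correlation). The setup is an assumption made for illustration: two constructs with equal numbers of items, items correlating .30 within a construct and 0 across constructs. The function names are hypothetical.

```python
def standardized_alpha(n_items: int, mean_r: float) -> float:
    """Standardized alpha from item count and mean inter-item correlation."""
    return (n_items * mean_r) / (1.0 + (n_items - 1) * mean_r)


def mean_r_two_constructs(items_per_construct: int, r_within: float, r_between: float) -> float:
    """Mean inter-item correlation for a test made of two equal-sized item sets."""
    m = items_per_construct
    within_pairs = m * (m - 1)   # item pairs inside either construct (2 * m*(m-1)/2)
    between_pairs = m * m        # item pairs that span the two constructs
    weighted_sum = within_pairs * r_within + between_pairs * r_between
    return weighted_sum / (within_pairs + between_pairs)


# Assumed values: r = .30 within a construct, r = 0 between constructs.
for m in (5, 20):
    rbar = mean_r_two_constructs(m, r_within=0.30, r_between=0.0)
    print(f"{2 * m} items: mean r = {rbar:.3f}, alpha = {standardized_alpha(2 * m, rbar):.2f}")
```

With these assumed values, alpha is roughly .61 for 10 items and roughly .87 for 40 items, even though the two constructs are uncorrelated.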

Pat McCoy
Ottawa, Canada

>>> "Dennis Doverspike" <dd1 at uakron.edu> 2010/02/03 12:16 pm >>>

Abstract. You have here the classic conflict in any type of test of
multidimensional domains or a test based on a criterion oriented
strategy. However, at the same time you want to calculate a total
score. Bottom line, it is impossible to say that a .3, .6 or .9 would
be preferable without knowing your exact purpose and the likely
correlations between those domains. A .3 could be very good in some
situations. Although then a critic might argue - why calculate a total
score? Why not calculate a score for each dimension separately?

More detail

First, internal consistency reliability is based on one particular
theory of reliability (or several theories of reliability really)
coming from a particular view of tests, which sees the best tests as
measuring a unidimensional latent trait.

The general theory of situational judgment tests (see especially Mike
McDaniel's work) is not compatible with classic notions of internal
consistency. In other words, internal consistency is not an
appropriate index for situational judgment tests.

Having said that, playing devil's advocate of my own position, it
could be argued that situational judgment tests still arrive at a
single total score, which suggests that there is a single something
being measured.

On your point 2, you could have high test-retest reliability with low
internal consistency. It is possible, but again would depend on your
measure and your theory. Technically, under some reliability theories,
test-retest is not an acceptable type of reliability at all -- since by
definition there is no random sampling of items (we can retain the
fiction as long as we do not blatantly violate it).

Bottom line then - a .3 internal consistency might be very appropriate
and in fact good for a situational judgment test, especially depending
upon its theory of construction. Remember, in developing a test based
on a pure criterion oriented approach, such as might be seen with
classical BIBs or even situational judgment tests, we would want each
additional item to correlate 0 with the items already included, since
that results in the greatest Multiple R. Thus, in that case, lower
internal consistencies are better and a high internal consistency
would be bad.
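
The Multiple R point can be sketched for the two-predictor case with the standard formula for R from the two validities and the predictor intercorrelation. The numbers are assumptions for illustration only: two items that each correlate .30 with the criterion, with non-negative intercorrelations of 0, .30, and .60. The function name is hypothetical.

```python
import math


def multiple_r(r1y: float, r2y: float, r12: float) -> float:
    """Multiple R for a criterion y regressed on two predictors,
    given each predictor's validity and their intercorrelation r12."""
    r_squared = (r1y**2 + r2y**2 - 2.0 * r1y * r2y * r12) / (1.0 - r12**2)
    return math.sqrt(r_squared)


# Assumed validities of .30 for both items; only the intercorrelation varies.
for r12 in (0.0, 0.3, 0.6):
    print(f"item intercorrelation {r12:.1f} -> Multiple R {multiple_r(0.30, 0.30, r12):.3f}")
```

Under these assumptions Multiple R falls from about .42 to about .34 as the two items become more correlated, which is the sense in which redundant items add less to prediction.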

So you have competing ideas here. You are calculating I would guess a
total score, but you are calculating a total score based on adding
together measures of independent constructs. So a .3 might be good, a
.6 might be good, a .9 might be good, depends on what you are trying
to achieve and what the correlation really is between those
independent constructs.

One could go into a lot more detail and argument on these concepts,
but that is for a different forum.

Dennis Doverspike, Ph.D., ABPP
Professor of Psychology
Director, Center for Organizational Research
Senior Fellow of the Institute for Life-Span Development and Gerontology
Psychology Department
University of Akron
Akron, Ohio 44325-4301
330-972-8372 (Office)
330-972-5174 (Office Fax)
ddoverspike at uakron.edu

-----Original Message-----
From: ipac-list-bounces at ipacweb.org [mailto:ipac-list-bounces at ipacweb.org] On Behalf Of Shekerjian, Rene
Sent: Wednesday, February 03, 2010 11:36 AM
To: ipac-list at ipacweb.org
Subject: [IPAC-List] Interpretation of internal consistency reliability coefficients

I have been reading up on internal consistency reliability coefficients
(e.g., KR-20 and Cronbach's Alpha) in order to clarify my thinking about
them, but I am having trouble finding much of practical use beyond two
basic points:

(1) .6 is tolerable, and .9 is the gold standard

(2) you can have high test-retest reliability with low internal
consistency

My question is this:

Suppose you have around 20 items that make up a situational judgment
subtest. The domain is defined by job analysis and is supposed to
address a competency that entails behaviors such as interacting with
customers, solving problems, giving advice, assessing situations, and
determining the best action to take, all within a circumscribed
context.

How would you interpret an internal consistency reliability
coefficient of .3 for such a test? How about .6? And what about .9?
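
One way to make these three values concrete is to ask what mean inter-item correlation each would imply for a 20-item test. The sketch below inverts the standardized alpha formula; it assumes standardized items and is only an approximation for KR-20 computed on raw item scores. The function name is hypothetical.

```python
def implied_mean_r(alpha: float, n_items: int) -> float:
    """Mean inter-item correlation implied by a standardized alpha
    for a test with n_items items."""
    return alpha / (n_items - (n_items - 1) * alpha)


# The three coefficients asked about, for a 20-item subtest.
for alpha in (0.3, 0.6, 0.9):
    print(f"alpha = {alpha:.1f} -> implied mean inter-item r = {implied_mean_r(alpha, 20):.3f}")
```

For 20 items, these work out to mean inter-item correlations of roughly .02, .07, and .31 respectively.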

My stab at this is that .3 suggests several unwanted possibilities:
among them are (1) the items were "good" but too difficult for the
candidates and (2) the items have flaws such as not fully defining the
circumstances and/or constraints that need to be taken into account to
arrive at the "correct" answer.

Personally, I would expect a well crafted set of such items that are
given to an appropriate candidate group to hold together and have an
internal consistency reliability coefficient around .6.

As far as getting a .9 for this sort of test, I think that that would
indicate too narrow a focus for a domain that would be expected to
cover a pretty wide territory.

That is because, while it makes sense that people who are more able and
motivated to develop expertise in a "broad" competency will tend to be
good at much of it and those who are less able and/or motivated will
tend to perform poorly in much of it, I would expect some randomization
of people's strengths and weaknesses. That would lead some people to
perform well in many areas but still fall down in a few (but with no
discernible pattern) and others to perform poorly in many but still be
strong in a few (again with no discernible pattern).

Your thoughts on this would be much appreciated.

Thanks in advance,

René

René Shekerjian | Testing Services Division | NYS Department of Civil Service | 518-473-9937

_______________________________________________________
IPAC-List
IPAC-List at ipacweb.org
http://www.ipacweb.org/mailman/listinfo/ipac-list

