[IPAC-List] Interpretation of internal consistency reliability coefficients

Pluta, Paul ppluta at hr.lacounty.gov
Wed Feb 3 12:27:22 EST 2010

Internal consistency is informative only when you are dealing with a homogeneous scale. When reviewing tests that measure multiple constructs, I look at it only at the subscale level. I would not be surprised to find an overall internal consistency of .3 for a test that taps multiple constructs because, by its very nature, such a test can be expected to have low internal consistency. If such a test had high internal consistency, I would question its dimensionality. If you believe you are dealing with a homogeneous construct but find low internal consistency, you may not be dealing with a homogeneous construct after all. Situational judgment tests (SJTs) are multidimensional assessments, and internal consistency is not an appropriate indicator of their reliability.
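To illustrate why the subscale level is the one worth reading, here is a minimal sketch that computes alpha from an inter-item correlation matrix (i.e., standardized alpha). It assumes a hypothetical 20-item test made of four unrelated five-item subscales, with items correlating .30 inside a subscale and 0 across subscales; the helper names `block_corr` and `standardized_alpha` are illustrative, not from any particular package.

```python
def standardized_alpha(corr):
    """Alpha computed from a k x k inter-item correlation matrix (items on a unit scale)."""
    k = len(corr)
    total_var = sum(sum(row) for row in corr)     # variance of the sum of standardized items
    item_var = sum(corr[i][i] for i in range(k))  # equals k when the diagonal is all 1s
    return (k / (k - 1)) * (1 - item_var / total_var)


def block_corr(n_subscales, items_per_subscale, r_within, r_between=0.0):
    """Hypothetical correlation matrix: r_within inside a subscale, r_between across subscales."""
    k = n_subscales * items_per_subscale
    corr = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(k):
            if i == j:
                corr[i][j] = 1.0
            elif i // items_per_subscale == j // items_per_subscale:
                corr[i][j] = r_within
            else:
                corr[i][j] = r_between
    return corr


# Assumed structure: four unrelated subscales of five items each, r = .30 within a subscale.
corr = block_corr(n_subscales=4, items_per_subscale=5, r_within=0.30)
sub_alpha = standardized_alpha([row[:5] for row in corr[:5]])  # first subscale only
overall_alpha = standardized_alpha(corr)                       # all 20 items pooled
print(f"subscale alpha {sub_alpha:.2f}, overall alpha {overall_alpha:.2f}")
```

With these assumed numbers each subscale gives about .68 while the pooled 20 items give about .57, even though every subscale hangs together. With weaker within-subscale correlations, or with the same 20 items split across more unrelated subscales, the pooled value drops further.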

Paul E. Pluta, MA, SPHR
Human Resources Analyst III
Los Angeles County Department of Human Resources
Workforce Planning, Test Research, & Appeals Division

-----Original Message-----
From: ipac-list-bounces at ipacweb.org [mailto:ipac-list-bounces at ipacweb.org] On Behalf Of Shekerjian, Rene
Sent: Wednesday, February 03, 2010 8:36 AM
To: ipac-list at ipacweb.org
Subject: [IPAC-List] Interpretation of internal consistency reliability coefficients

I have been reading up on internal consistency reliability coefficients (e.g., KR-20 and Cronbach's Alpha) in order to clarify my thinking about them, but I am having trouble finding much of practical use beyond two basic points:

(1) .6 is tolerable, and .9 is the gold standard

(2) you can have high test-retest reliability with low internal consistency
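For reference, both coefficients named above, and point (2), can be sketched in a few lines. This is a minimal illustration assuming small hypothetical 0/1 response matrices; KR-20 is the special case of alpha for dichotomously scored items, so the two agree when they use the same variance convention. The point-(2) example assumes, purely for illustration, three mutually uncorrelated items and examinees who answer identically on retest.

```python
from itertools import product


def pvariance(xs):
    """Population variance (divide by n), matching the p*q term in KR-20."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def cronbach_alpha(scores):
    """Coefficient alpha; scores has one row per examinee, one column per item."""
    k = len(scores[0])
    totals = [sum(row) for row in scores]
    item_var_sum = sum(pvariance(col) for col in zip(*scores))
    return (k / (k - 1)) * (1 - item_var_sum / pvariance(totals))


def kr20(scores):
    """KR-20 for 0/1 items: alpha with each item variance written as p*q."""
    n, k = len(scores), len(scores[0])
    pq_sum = 0.0
    for col in zip(*scores):
        p = sum(col) / n  # proportion of examinees answering the item correctly
        pq_sum += p * (1 - p)
    totals = [sum(row) for row in scores]
    return (k / (k - 1)) * (1 - pq_sum / pvariance(totals))


def pearson_r(x, y):
    """Pearson correlation between two lists of scores."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical 0/1 responses: 4 examinees x 3 items. For dichotomous items, KR-20 and alpha agree.
responses = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(kr20(responses), cronbach_alpha(responses))  # 0.75 0.75

# Point (2): 8 examinees x 3 items, each response pattern once, so the items are uncorrelated.
time1 = [list(pattern) for pattern in product([0, 1], repeat=3)]
time2 = [row[:] for row in time1]  # assume everyone answers identically on retest
alpha_t1 = cronbach_alpha(time1)
retest_r = pearson_r([sum(r) for r in time1], [sum(r) for r in time2])
print(f"alpha = {alpha_t1:.2f}, test-retest r = {retest_r:.2f}")  # alpha = 0.00, test-retest r = 1.00
```

The second example is deliberately extreme: internal consistency is zero because the items share no variance, yet total scores are perfectly stable over time.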

My question is this:

Suppose you have around 20 items that make up a situational judgment subtest. The domain is defined by job analysis and is supposed to address a competency that entails behaviors such as interacting with customers, solving problems, giving advice, assessing situations, and determining the best action to take, all within a circumscribed context.

How would you interpret an internal consistency reliability coefficient of .3 for such a test? How about .6? And what about .9?
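One way to put .3, .6, and .9 in concrete terms for a 20-item subtest is to convert each into the average inter-item correlation it implies. The sketch below assumes the standardized-alpha (Spearman-Brown) relation, alpha = k*r / (1 + (k - 1)*r), which strictly holds for items with equal variances; the function names are illustrative.

```python
def implied_mean_r(alpha, k):
    """Mean inter-item correlation implied by standardized alpha for k items.
    Solves alpha = k*r / (1 + (k - 1)*r) for r."""
    return alpha / (k - (k - 1) * alpha)


def standardized_alpha(mean_r, k):
    """Standardized alpha (Spearman-Brown form) from the mean inter-item correlation."""
    return k * mean_r / (1 + (k - 1) * mean_r)


K = 20  # "around 20 items" in the subtest described above
for a in (0.3, 0.6, 0.9):
    r_bar = implied_mean_r(a, K)
    print(f"alpha {a:.1f}: mean inter-item r about {r_bar:.3f}")
```

Under these assumptions the three values correspond to average inter-item correlations of roughly .02, .07, and .31. Because alpha rises with the number of items at a fixed average correlation, a 20-item subtest can reach .6 with items that correlate only about .07 on average, whereas .9 would require a fairly strong common thread running through every item.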

My stab at this is that .3 suggests several unwanted possibilities, among them (1) the items were "good" but too difficult for the candidates and (2) the items have flaws, such as not fully defining the circumstances and/or constraints that need to be taken into account to arrive at the "correct" answer.

Personally, I would expect a well-crafted set of such items, given to an appropriate candidate group, to hold together and have an internal consistency reliability coefficient of around .6.

As for getting a .9 on this sort of test, I think that would indicate too narrow a focus for a domain that should cover a pretty wide territory.

That is because, while it makes sense that people who are more able and motivated to develop expertise in a "broad" competency will tend to be good at much of it, and those who are less able and/or motivated will tend to perform poorly in much of it, I would also expect some randomness in people's strengths and weaknesses. That would lead some people to perform well in many areas but still fall down in a few (with no discernible pattern) and others to perform poorly in many areas but still be strong in a few (again with no discernible pattern).
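The reasoning above can be checked with a small simulation under an assumed one-factor model: each examinee has a general competence level plus independent person-by-item strengths and weaknesses. The values rho = .07, 20 items, 5,000 examinees, and the seed are arbitrary choices for illustration, and `simulate` is a hypothetical helper. Note that within a single administration this model cannot tell stable idiosyncratic strengths apart from random error; both sit in the item-specific term, and alpha treats both as error.

```python
import random


def sample_var(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def cronbach_alpha(scores):
    """Coefficient alpha; scores has one row per examinee, one column per item."""
    k = len(scores[0])
    totals = [sum(row) for row in scores]
    item_var_sum = sum(sample_var(col) for col in zip(*scores))
    return (k / (k - 1)) * (1 - item_var_sum / sample_var(totals))


def simulate(n_people, k_items, rho, seed=0):
    """Assumed model: one general factor plus person-by-item 'strengths and weaknesses'.
    rho is the share of each item's variance due to the general factor, which is
    also the population correlation between any two items."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_people):
        g = rng.gauss(0.0, 1.0)  # overall competence
        row = [rho ** 0.5 * g + (1 - rho) ** 0.5 * rng.gauss(0.0, 1.0)
               for _ in range(k_items)]
        scores.append(row)
    return scores


K, RHO = 20, 0.07  # assumed values for illustration
expected = K * RHO / (1 + (K - 1) * RHO)  # population alpha under this model, about 0.60
observed = cronbach_alpha(simulate(5000, K, RHO))
print(f"expected {expected:.2f}, simulated {observed:.2f}")
```

The larger the idiosyncratic share (1 - rho), the lower alpha falls; reaching .9 with 20 items under this model would require the general factor to account for roughly a third of each item's variance.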

Your thoughts on this would be much appreciated.

Thanks in advance,


René Shekerjian | Testing Services Division | NYS Department of Civil Service | 518-473-9937
