[IPAC-List] Interpretation of internal consistency reliability coefficients

Geoff Burcaw gburcaw at cps.ca.gov
Wed Feb 3 12:26:32 EST 2010


Something like a .3 would illustrate the problem with creating a competency based on the context in which the behaviors are performed rather than on the similarity of the underlying KSAs. This is common with situational judgment tests. The behaviors you mentioned may all be necessary in a particular "situation" or performance sub-domain, such as providing customer service, but within that competency you have KSAs that may not hang together naturally. It looks like your competency includes customer service orientation and communication, and also critical thinking and decision making. Of course those who score high on some items will tend to score high on others, but if you have good items, I bet you would see a discernible pattern.


-----Original Message-----
From: ipac-list-bounces at ipacweb.org [mailto:ipac-list-bounces at ipacweb.org] On Behalf Of Shekerjian, Rene
Sent: Wednesday, February 03, 2010 8:36 AM
To: ipac-list at ipacweb.org
Subject: [IPAC-List] Interpretation of internal consistency reliability coefficients

I have been reading up on internal consistency reliability coefficients (e.g., KR-20 and Cronbach's alpha) in order to clarify my thinking about them, but I am having trouble finding much of practical use beyond two basic points:

(1) .6 is tolerable, and .9 is the gold standard

(2) you can have high test-retest reliability with low internal consistency
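
For reference, here is a minimal sketch of how such a coefficient is computed from a candidates-by-items score table. The function name `cronbach_alpha` and the response data are hypothetical and purely illustrative. With 0/1 item scores, and the same variance convention used for the items and the total, the result is the same quantity as KR-20.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a candidates-by-items score table.

    scores: list of rows, one row per candidate, one column per item.
    With 0/1 item scores this equals KR-20, since p*q is the
    population variance of a dichotomous item.
    """
    k = len(scores[0])                  # number of items
    item_columns = list(zip(*scores))   # transpose: one tuple per item
    sum_item_var = sum(pvariance(col) for col in item_columns)
    total_var = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum_item_var / total_var)


# Hypothetical responses: 5 candidates x 4 dichotomous items (1 = correct).
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 3))  # 0.8
```

The coefficient rises as the variance of the total scores grows relative to the sum of the individual item variances, which happens when items covary positively.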

My question is this:

Suppose you have around 20 items that make up a situational judgment subtest. The domain is defined by job analysis and is supposed to address a competency that entails behaviors such as interacting with customers, solving problems, giving advice, assessing situations, and determining the best action to take, all within a circumscribed context.

How would you interpret an internal consistency reliability coefficient of .3 for such a test? How about .6? And what about .9?
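
One way to put those three values in perspective is the standardized form of alpha, which depends only on the number of items k and the average inter-item correlation. The sketch below assumes items with roughly equal variances, so it is an approximation rather than a property of any particular test; the function names are hypothetical. It inverts the formula to show what average inter-item correlation a 20-item subtest would need to reach each coefficient.

```python
def standardized_alpha(k, mean_r):
    """Standardized alpha from item count k and mean inter-item correlation."""
    return k * mean_r / (1 + (k - 1) * mean_r)


def implied_mean_r(k, alpha):
    """Mean inter-item correlation implied by a given standardized alpha."""
    return alpha / (k - alpha * (k - 1))


# 20 items, as in the situational judgment subtest described above.
for alpha in (0.3, 0.6, 0.9):
    r = implied_mean_r(20, alpha)
    print(f"alpha = {alpha:.1f} -> mean inter-item r of about {r:.3f}")
# alpha = 0.3 -> mean inter-item r of about 0.021
# alpha = 0.6 -> mean inter-item r of about 0.070
# alpha = 0.9 -> mean inter-item r of about 0.310
```

Under that assumption, a .3 on 20 items corresponds to items that are nearly uncorrelated on average, while a .9 requires only a moderate average correlation because alpha also grows with test length.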

My stab at this is that .3 suggests several unwanted possibilities: among them are (1) the items were "good" but too difficult for the candidates and (2) the items have flaws such as not fully defining the circumstances and/or constraints that need to be taken into account to arrive at the "correct" answer.

Personally, I would expect a well-crafted set of such items, given to an appropriate candidate group, to hold together and have an internal consistency reliability coefficient around .6.

As for getting a .9 on this sort of test, I think that would indicate too narrow a focus for a domain that would be expected to cover a pretty wide territory.

It makes sense that people who are more able and motivated to develop expertise in a "broad" competency will tend to be good at much of it, and that those who are less able and/or motivated will tend to perform poorly in much of it. Even so, I would expect some random variation in people's strengths and weaknesses, which would lead some people to perform well in many areas but still fall down in a few (with no discernible pattern) and others to perform poorly in many but still be strong in a few (again with no discernible pattern).

Your thoughts on this would be much appreciated.

Thanks in advance,


René Shekerjian | Testing Services Division | NYS Department of Civil Service | 518-473-9937
