[IPAC-List] Interpretation of internal consistency reliability coefficients

Shekerjian, Rene Rene.Shekerjian at cs.state.ny.us
Wed Feb 3 11:36:13 EST 2010

I have been reading up on internal consistency reliability coefficients (e.g., KR-20 and Cronbach's Alpha) in order to clarify my thinking about them, but I am having trouble finding much of practical use beyond two basic points:

(1) .6 is tolerable, and .9 is the gold standard

(2) you can have high test-retest reliability with low internal consistency
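
For concreteness, here is a minimal sketch of how such a coefficient is usually computed from a candidates-by-items score matrix. The function name and the toy data are made up for illustration. It uses the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), which for items scored 0/1 gives the same value as KR-20.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of rows (one row per candidate,
    one column per item). With 0/1 item scores this equals KR-20."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("need at least two items")
    # Variance of each item across candidates.
    item_vars = [pvariance([row[i] for row in scores]) for i in range(k)]
    # Variance of the candidates' total scores.
    total_var = pvariance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical toy data: every candidate answers all items the same way,
# so the items "hold together" perfectly and alpha comes out close to 1.
consistent = [[1, 1, 1], [0, 0, 0], [1, 1, 1], [0, 0, 0]]

# Hypothetical toy data: the two items are unrelated to each other,
# so alpha comes out close to 0.
unrelated = [[1, 0], [0, 1], [1, 1], [0, 0]]

print(cronbach_alpha(consistent))
print(cronbach_alpha(unrelated))
```

The same variance convention (population variance here) has to be used for both the items and the totals; the ratio, and therefore alpha, is the same if sample variance is used for both.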

My question is this:

Suppose you have around 20 items that make up a situational judgment subtest. The domain is defined by job analysis and is supposed to address a competency that entails behaviors such as interacting with customers, solving problems, giving advice, assessing situations, and determining the best action to take, all within a circumscribed context.

How would you interpret an internal consistency reliability coefficient of .3 for such a test? How about .6? And what about .9?

My stab at this is that .3 suggests several unwanted possibilities, among them that (1) the items were "good" but too difficult for the candidates, and (2) the items have flaws, such as not fully defining the circumstances and/or constraints that must be taken into account to arrive at the "correct" answer.

Personally, I would expect a well-crafted set of such items, given to an appropriate candidate group, to hold together and have an internal consistency reliability coefficient of around .6.

As for getting a .9 on this sort of test, I think that would indicate too narrow a focus for a domain that would be expected to cover a pretty wide territory.

My reasoning is this. It makes sense that people who are more able and motivated to develop expertise in a "broad" competency will tend to be good at much of it, and that those who are less able and/or motivated will tend to perform poorly in much of it. Even so, I would expect people's strengths and weaknesses to be distributed somewhat randomly: some people would perform well in many areas but still fall down in a few, and others would perform poorly in many but still be strong in a few, in both cases with no discernible pattern.
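
One way to put rough numbers on this is the standardized form of alpha, which depends only on the number of items k and the average inter-item correlation r: alpha = k*r / (1 + (k-1)*r). The sketch below inverts that formula for a 20-item test. It assumes items with roughly equal variances (the standardized case), and the function names are made up for illustration.

```python
def alpha_for_mean_r(r, k):
    """Standardized alpha for k items with average inter-item correlation r."""
    return k * r / (1 + (k - 1) * r)


def mean_r_for_alpha(alpha, k):
    """Average inter-item correlation implied by a given standardized alpha."""
    return alpha / (k - (k - 1) * alpha)


# The three coefficients asked about, for a 20-item subtest.
for a in (0.3, 0.6, 0.9):
    print(a, round(mean_r_for_alpha(a, 20), 3))
```

Under that assumption, with 20 items an alpha of .3 corresponds to an average inter-item correlation of about .02, .6 to about .07, and .9 to about .31. In other words, the items need to correlate only weakly with one another to reach .6, while .9 requires them to overlap a good deal more.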

Your thoughts on this would be much appreciated.

René Shekerjian | Testing Services Division | NYS Department of Civil Service | 518-473-9937

