[IPAC-List] Interpretation of internal consistency reliability coefficients
LMueller at air.org
Wed Feb 3 12:52:26 EST 2010
I agree that coefficient alpha is best suited to homogeneous scales, but disagree that it doesn't tell us something outside of those circumstances.
There is a big risk in using a test with low alpha in selection contexts. What the low alpha tells us is that changing a few items in the test could substantially change the inferences made on the basis of the total scores (i.e., pass/fail decisions or candidate ordering in top-down selection). If you can establish test-retest or parallel-forms reliability (in addition to demonstrating representative content), you may be on solid ground. Otherwise, I have suggested to René that she investigate a stratified alpha approach to attempt to identify meaningful subscales.
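As a rough illustration of the stratified alpha idea, the sketch below computes coefficient alpha and a stratified alpha for a small made-up score matrix. The data, the function names, and the two-subscale split are assumptions for illustration only and have nothing to do with the actual test under discussion. Stratified alpha is taken here as 1 minus the sum over subscales of (subscale score variance times (1 minus subscale alpha)), divided by the total score variance.

```python
from statistics import pvariance


def cronbach_alpha(rows):
    """Coefficient alpha for a list of per-candidate item-score lists."""
    k = len(rows[0])
    item_vars = [pvariance([r[j] for r in rows]) for j in range(k)]
    total_var = pvariance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def stratified_alpha(rows, strata):
    """Stratified alpha; strata is a list of item-index lists, one per subscale."""
    total_var = pvariance([sum(r) for r in rows])
    error_var = 0.0
    for cols in strata:
        sub = [[r[j] for j in cols] for r in rows]
        sub_var = pvariance([sum(s) for s in sub])
        # Each subscale contributes its own estimated error variance.
        error_var += sub_var * (1 - cronbach_alpha(sub))
    return 1 - error_var / total_var


# Hypothetical data: 4 candidates x 4 items scored 0/1.
# Items 0-1 form one subscale, items 2-3 another (an assumed split).
toy = [
    [1, 1, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
]

print(round(cronbach_alpha(toy), 3))
print(round(stratified_alpha(toy, [[0, 1], [2, 3]]), 3))
```

On this toy matrix the overall alpha comes out well below the stratified alpha, because each assumed subscale is perfectly consistent internally while the two subscales are uncorrelated with each other. Real data will be far less clean, and the choice of strata has to come from the job analysis or a dimensionality analysis, not from the code.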
Lorin Mueller, PhD, SPHR
Principal Research Scientist
American Institutes for Research
1000 Thomas Jefferson St., NW
Washington, DC, 20007
From: ipac-list-bounces at ipacweb.org [mailto:ipac-list-bounces at ipacweb.org] On Behalf Of Pluta, Paul
Sent: Wednesday, February 03, 2010 12:27 PM
To: Shekerjian, Rene; ipac-list at ipacweb.org
Subject: Re: [IPAC-List] Interpretation of internal consistency reliability coefficients
Internal consistency only tells you something when you are dealing with a homogeneous scale. I only look at it at the subscale level when reviewing tests measuring multiple constructs. I would not be surprised to find internal consistency of .3 overall for a test that taps multiple constructs because, by its very nature, such a test can be expected to have low internal consistency. If such a test had high internal consistency, I would question the dimensionality of the test. If you believe you are dealing with a homogeneous construct, but have low internal consistency, you may not be dealing with a homogeneous construct after all. Situational judgment tests (SJTs) are multi-dimensional assessments, and internal consistency is not an appropriate indicator of such a test's reliability.
Paul E. Pluta, MA, SPHR
Human Resources Analyst III
Los Angeles County Department of Human Resources
Workforce Planning, Test Research, & Appeals Division
From: ipac-list-bounces at ipacweb.org [mailto:ipac-list-bounces at ipacweb.org] On Behalf Of Shekerjian, Rene
Sent: Wednesday, February 03, 2010 8:36 AM
To: ipac-list at ipacweb.org
Subject: [IPAC-List] Interpretation of internal consistency reliability coefficients
I have been reading up on internal consistency reliability coefficients (e.g., KR-20 and Cronbach's Alpha) in order to clarify my thinking about them, but I am having trouble finding much of practical use beyond two basic points:
(1) .6 is tolerable, and .9 is the gold standard
(2) you can have high test-retest reliability with low internal consistency
My question is this:
Suppose you have around 20 items that make up a situational judgment subtest. The domain is defined by job analysis and is supposed to address a competency that entails behaviors such as interacting with customers, solving problems, giving advice, assessing situations, and determining the best action to take, all within a circumscribed context.
How would you interpret an internal consistency reliability coefficient of .3 for such a test? How about .6? And what about .9?
My stab at this is that .3 suggests several unwanted possibilities: among them are (1) the items were "good" but too difficult for the candidates and (2) the items have flaws such as not fully defining the circumstances and/or constraints that need to be taken into account to arrive at the "correct" answer.
Personally, I would expect a well-crafted set of such items that are given to an appropriate candidate group to hold together and have an internal consistency reliability coefficient around .6.
As far as getting a .9 for this sort of test, I think that would indicate too narrow a focus for a domain that would be expected to cover a pretty wide territory.
This is because, while it makes sense that people who are more able and motivated to develop expertise in a "broad" competency will tend to be good at much of it, and those who are less able and/or motivated will tend to perform poorly in much of it, I would also expect some randomness in people's strengths and weaknesses. That would lead some people to perform well in many areas but still fall down in a few (with no discernible pattern) and others to perform poorly in many but still be strong in a few (again with no discernible pattern).
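For a sense of scale on the .3, .6, and .9 values discussed above, the sketch below converts an alpha into the average inter-item correlation it would imply for a 20-item test. It uses the standardized alpha (Spearman-Brown) relationship, alpha = k*r / (1 + (k-1)*r), which assumes roughly equal item variances; that assumption and the function names are mine, so treat the numbers as approximate.

```python
def standardized_alpha(k, mean_r):
    """Standardized alpha for k items with average inter-item correlation mean_r."""
    return k * mean_r / (1 + (k - 1) * mean_r)


def implied_mean_r(k, alpha):
    """Invert the formula above: average inter-item correlation implied by alpha."""
    return alpha / (k - alpha * (k - 1))


# 20 items, as in the situational judgment subtest described in the question.
for alpha in (0.3, 0.6, 0.9):
    print(alpha, round(implied_mean_r(20, alpha), 3))
# prints:
# 0.3 0.021
# 0.6 0.07
# 0.9 0.31
```

Under that assumption, an alpha of .3 on 20 items corresponds to items that are nearly uncorrelated on average, .6 to an average correlation of about .07, and .9 to about .31.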
Your thoughts on this would be much appreciated.
Thanks in advance,
René Shekerjian | Testing Services Division | NYS Department of Civil Service | 518-473-9937
IPAC-List at ipacweb.org