[IPAC-List] Interpretation of internal consistency reliability coefficients

Winfred Arthur, Jr. w-arthur at neo.tamu.edu
Thu Feb 4 16:27:23 EST 2010

i suppose whether it is inappropriate "to use the whole test index as
the measure of reliability" or not depends on _*what*_ you are
interpreting as "the test score," because that determines what the
appropriate level of analysis should be. thus, for example, if i am
interpreting the GRE quant score, then i would be interested in the rxx
of that subtest. on the other hand, if i am interpreting the total GRE
score, the rxx metric of interest would be at the level of the total score.

in addition, i am sure you are already aware of this, but it is worth
noting that alpha (or KR-20) is influenced not only by the number of
items but also by the average inter-item correlations. that is why it is
possible to have fairly high alphas for tests that measure clearly
multi-dimensional domains (e.g., quant & verbal) -- if the dimensions
are correlated. this also serves as a caution against interpreting alpha
as a metric or indicator of unidimensionality [which is distinct from
homogeneity].


- winfred

On 2/4/2010 7:46 AM, GWEN SCHINDLER wrote:

> IPMAAC Folks,
>
> I have struggled with the interpretation of internal consistency
> coefficients, in particular KR-20, for several years now. In my
> experience, the reliability index is influenced largely by the number
> of items in the subtest. So when I run reliability coefficients on my
> separate subtests, which are theoretically more homogeneous than my
> whole test, my KR-20s are always much lower than the whole-test KR-20
> and often lower than the recommended indices. This has prompted some
> people I have worked with to use the whole-test index as the measure
> of reliability, which gets me reeling because I believe it's an
> incorrect use of the statistic. However, when one is thinking about
> what looks good on paper and about defending one's test to a judge and
> jury of lay persons, part of me understands the argument. It's
> exasperating for one who is really trying to be accurate and stay true
> to good test development principles.
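The length effect Gwen describes is the Spearman-Brown relationship: predicted reliability rises mechanically as items are added, even when the average inter-item correlation stays fixed. A quick sketch, using a hypothetical 10-item subtest with a KR-20 of .60 as the starting point:

```python
def spearman_brown(rxx, k):
    """Predicted reliability when a test is lengthened by a factor of k,
    assuming the added items are parallel to the originals."""
    return k * rxx / (1 + (k - 1) * rxx)

# Lengthening a 10-item subtest (rxx = .60) by factors of 1, 2, and 4
for k in (1, 2, 4):
    print(f"{10 * k:3d} items -> predicted rxx = {spearman_brown(0.60, k):.2f}")
# -> 10 items: .60, 20 items: .75, 40 items: .86
```

so a whole test built from several such subtests can clear the conventional cutoffs purely by virtue of its length, which is exactly why the whole-test KR-20 runs higher than the subtest values.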




> Gwen Schindler, Personnel Analyst
> Comptroller of Maryland
> Office of Personnel Services
> Louis L Goldstein Treasury Building
> 80 Calvert St., Room 209
> Annapolis, MD 21404-0466
> Phone: 410-260-6622
> Fax: 410-974-5249




>>>> "Patrick McCoy" <Patrick.McCoy at psc-cfp.gc.ca> 2/3/2010 1:50 PM >>>
>
> If I am not mistaken, a test can have high internal consistency, under
> some circumstances, even if it is tapping more than one construct.
>
> My guess is that for this to take place, there need to be enough items
> related to each construct and the answers need to be sound.
>
> Pat McCoy
> Ottawa, Canada



>>>> "Dennis Doverspike" <dd1 at uakron.edu> 2010/02/03 12:16 pm >>>
>
> Abstract: You have here the classic conflict in any type of test of
> multidimensional domains, or a test based on a criterion-oriented
> strategy, where at the same time you want to calculate a total score.
> Bottom line, it is impossible to say that a .3, .6, or .9 would be
> preferable without knowing your exact purpose and the likely
> correlations between those domains. A .3 could be very good in some
> situations. Although then a critic might argue - why calculate a total
> score? Why not calculate a score for each dimension separately?


> More detail
>
> First, internal consistency reliability is based on one particular
> theory of reliability (or several theories of reliability, really)
> coming from a particular view of tests, which sees the best tests as
> measuring a unidimensional latent trait.


> The general theory of situational judgment tests (see especially Mike
> McDaniel's work) is not compatible with classic notions of internal
> consistency. In other words, internal consistency is not an appropriate
> index for situational judgment tests.
>
> Having said that, playing devil's advocate against my own position, it
> could be argued that situational judgment tests still arrive at a
> single total score, which suggests that there is a single something
> being measured.


> On your point 2, you could have high test-retest reliability with low
> internal consistency. It is possible, but again would depend on your
> measure and your theory. Technically, under some reliability theories,
> test-retest is not an acceptable type of reliability at all -- since by
> definition there is no random sampling of items (we can retain the
> fiction as long as we do not blatantly violate it).


> Bottom line then - a .3 internal consistency might be very appropriate
> and in fact good for a situational judgment test, especially depending
> upon its theory of construction. Remember, in developing a test based
> on a pure criterion-oriented approach, such as might be seen with
> classical BIBs or even situational judgment tests, we would want a
> correlation between each additional item of 0, since that results in
> the greatest Multiple R. Thus, in that case, lower internal
> consistencies are better and a high internal consistency would be bad.
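Dennis's point about Multiple R follows from the standard formula for k predictors with equal criterion validity r and uniform inter-predictor correlation rho: R^2 = k * r^2 / (1 + (k - 1) * rho), which is maximized when rho = 0. A sketch with hypothetical values (ten items, each with validity .30):

```python
import math

def multiple_r(validity, rho, k):
    """Multiple R for k predictors, each correlating `validity` with the
    criterion, with uniform inter-predictor correlation `rho`:
    R^2 = k * r^2 / (1 + (k - 1) * rho)."""
    r_squared = k * validity ** 2 / (1 + (k - 1) * rho)
    return math.sqrt(r_squared)

# Higher inter-item correlation (i.e., higher internal consistency)
# means more redundancy and a lower Multiple R against the criterion.
for rho in (0.0, 0.2, 0.5):
    print(f"inter-item r = {rho:.1f} -> Multiple R = {multiple_r(0.30, rho, 10):.2f}")
```

under these idealized assumptions, the uncorrelated-item battery yields the largest Multiple R, which is the sense in which a criterion-oriented test can prefer a low internal consistency.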


> So you have competing ideas here. You are calculating, I would guess, a
> total score, but you are calculating a total score based on adding
> together measures of independent constructs. So a .3 might be good, a
> .6 might be good, a .9 might be good; it depends on what you are trying
> to achieve and what the correlation really is between those independent
> constructs.
>
> One could go into a lot more detail and argument on these concepts, but
> that is for a different forum.



> Dennis Doverspike, Ph.D., ABPP
> Professor of Psychology
> Director, Center for Organizational Research
> Senior Fellow of the Institute for Life-Span Development and Gerontology
> Psychology Department
> University of Akron
> Akron, Ohio 44325-4301
> 330-972-8372 (Office)
> 330-972-5174 (Office Fax)
> ddoverspike at uakron.edu





> -----Original Message-----
> From: ipac-list-bounces at ipacweb.org
> [mailto:ipac-list-bounces at ipacweb.org] On Behalf Of Shekerjian, Rene
> Sent: Wednesday, February 03, 2010 11:36 AM
> To: ipac-list at ipacweb.org
> Subject: [IPAC-List] Interpretation of internal consistency
> reliability coefficients


> I have been reading up on internal consistency reliability coefficients
> (e.g., KR-20 and Cronbach's alpha) in order to clarify my thinking
> about them, but I am having trouble finding much of practical use
> beyond two basic points:


> (1) .6 is tolerable, and .9 is the gold standard
>
> (2) you can have high test-retest reliability with low internal
> consistency


> My question is this:
>
> Suppose you have around 20 items that make up a situational judgment
> subtest. The domain is defined by job analysis and is supposed to
> address a competency that entails behaviors such as interacting with
> customers, solving problems, giving advice, assessing situations, and
> determining the best action to take, all within a circumscribed
> context.
>
> How would you interpret an internal consistency reliability coefficient
> of .3 for such a test? How about .6? And what about .9?


> My stab at this is that .3 suggests several unwanted possibilities:
> among them are (1) the items were "good" but too difficult for the
> candidates and (2) the items have flaws, such as not fully defining the
> circumstances and/or constraints that need to be taken into account to
> arrive at the "correct" answer.


> Personally, I would expect a well-crafted set of such items that are
> given to an appropriate candidate group to hold together and have an
> internal consistency reliability coefficient around .6.
>
> As far as getting a .9 for this sort of test, I think that would
> indicate too narrow a focus for a domain that would be expected to
> cover a pretty wide territory.


> While it makes sense that people who are more able and motivated to
> develop expertise in a "broad" competency will tend to be good at much
> of it, and those who are less able and/or motivated will tend to
> perform poorly in much of it, I would expect some randomization of
> people's strengths and weaknesses, which would lead some people to
> perform well in many areas but still fall down in a few (with no
> discernible pattern) and others to perform poorly in many but still be
> strong in a few (again with no discernible pattern).


> Your thoughts on this would be much appreciated.
>
> Thanks in advance,
>
> René
>
> René Shekerjian | Testing Services Division | NYS Department of Civil
> Service | 518-473-9937





> _______________________________________________________
> IPAC-List
> IPAC-List at ipacweb.org
> http://www.ipacweb.org/mailman/listinfo/ipac-list




More information about the IPAC-List mailing list