[IPAC-List] We Are All Hog Rasslers, So Get Ready to Get Muddy

Pluta, Paul ppluta at hr.lacounty.gov
Thu Feb 4 09:46:34 EST 2010


Good point Dennis. However, if our practice is to make high-stakes decisions based on responses to a 10-item scale with a reliability of .53, shouldn't we be looking more to psychometric theory to inform our practice? What is an acceptable amount of error variance when we are making decisions about someone's livelihood? At some point, does our practice become malpractice? We also have to worry about the government inspecting our hogs. In such cases, our practice is likely to be highly scrutinized based on expert testimony related to psychometric theory. You are right; we can't help but get a little muddy.
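
To put a rough number on that error, here is a back-of-the-envelope
sketch. The raw-score SD of 8 points is an assumed value for
illustration only, not a figure from any actual exam:

    import math

    reliability = 0.53   # the 10-item scale mentioned above
    sd = 8.0             # assumed raw-score SD, purely illustrative

    # Classical standard error of measurement: SD * sqrt(1 - rxx)
    sem = sd * math.sqrt(1 - reliability)   # about 5.5 points
    band = 1.96 * sem                       # about +/- 10.7 points

    print("SEM = %.1f, 95%% band = +/- %.1f points" % (sem, band))

On a scale like that, an observed score of 70 really means "somewhere
between about 59 and 81," which is a sobering band for a high-stakes
decision.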

Paul E. Pluta, ABD, SPHR
Human Resources Analyst III
Los Angeles County Department of Human Resources
Workforce Planning, Test Research, & Appeals Division

-----Original Message-----
From: ipac-list-bounces at ipacweb.org [mailto:ipac-list-bounces at ipacweb.org] On Behalf Of Dennis Doverspike
Sent: Wednesday, February 03, 2010 1:57 PM
To: 'Shekerjian, Rene'; ipac-list at ipacweb.org
Subject: [IPAC-List] We Are All Hog Rasslers, So Get Ready to Get Muddy


"... however pleasant it may be to shuffle through the internal statistics
of a compound test in search of a formula which gives the closest estimate
of a test's reliability under conditions of uncorrelated errors, this is for
practical applications like putting on a clean shirt to rassle a
hog."-William W. Rozeboom


"The classical theory of mental tests ... suffers from some imprecision of
statement so that, from time to time, controversies arise that appear to
raise embarrassing questions concerning its foundations."-Melvin R. Novick



I guess the above says it all. Psychometric theory is about large
numbers of items, large numbers of people, and unidimensional latent
traits. Public sector testing is about small numbers of items, small
numbers of people, and multidimensional constructs. We try to fit our
practice to psychometric theories. I have long lamented that no one
has come up with theories that fit our practice.


Dennis Doverspike, Ph.D., ABPP
Professor of Psychology
Director, Center for Organizational Research
Senior Fellow of the Institute for Life-Span Development and Gerontology
Psychology Department
University of Akron
Akron, Ohio 44325-4301
330-972-8372 (Office)
330-972-5174 (Office Fax)
ddoverspike at uakron.edu


-----Original Message-----
From: Shekerjian, Rene [mailto:Rene.Shekerjian at cs.state.ny.us]
Sent: Wednesday, February 03, 2010 12:37 PM
To: Doverspike,Dennis; ipac-list at ipacweb.org
Subject: RE: [IPAC-List] Interpretation of internal consistency
reliability coefficients

For me, the crux is identified in what Dennis said... [and I just saw
Geoff's message, which to me echoes this idea]

"Having said that, playing devil's advocate of my own position, it could be
argued that situational judgment tests still arrive at a single total score,
which suggests that there is a single something being measured"

If we are measuring a something, for example situational judgment, is there
not a "something" that some will have more of and others less of?

Training ability is one I am familiar with. While it is clearly
multidimensional, I expect that good trainers will have more "training
ability" and poor and/or inexperienced trainers will have less. I
wouldn't expect high correlations between training items, but I would
expect enough of a trend for the items to hang together and give me a
KR-20 of .5 or .6.

You could say that "training ability" is made up of many dimensions -
knowledge of training methods, knowledge of learning theory, ability to plan
lessons, ability to deliver training, effective use of visual aids,
effective use of PowerPoint, etc.

But wouldn't a good trainer be good in many of those areas, and
wouldn't a poor trainer be good in very few? Depending on a person's
aptitude, cognitive ability, experience, education, etc., I would
expect him or her to master more of this broad domain or less of it.
So while training is multidimensional, in the real world its
components are "presented" together to people over and over. A person
in the world of training will have numerous opportunities to acquire
these "sub-components" of training, maybe not all at once, but over
time. In which case these abilities should hang together "inside" the
person.

Thank you all for your responses so far. This is rich material for thought.

René

René Shekerjian | Testing Services Division | NYS Department of Civil
Service | 518-473-9937
======================================================================





-----Original Message-----
From: Dennis Doverspike [mailto:dd1 at uakron.edu]
Sent: Wednesday, February 03, 2010 12:17 PM
To: Shekerjian, Rene; ipac-list at ipacweb.org
Subject: RE: [IPAC-List] Interpretation of internal consistency
reliability coefficients


Abstract: You have here the classic conflict in any test of a
multidimensional domain, or any test built on a criterion-oriented
strategy, when at the same time you want to calculate a total score.
Bottom line, it is impossible to say whether a .3, .6, or .9 would be
preferable without knowing your exact purpose and the likely
correlations between those domains. A .3 could be very good in some
situations. Although then a critic might argue: why calculate a total
score? Why not calculate a score for each dimension separately?

More detail:

First, internal consistency reliability is based on one particular theory of
reliability (or several theories of reliability really) coming from a
particular view of tests, which sees the best tests as measuring a
unidimensional latent trait.
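
For concreteness, here is a minimal sketch of how coefficient alpha
falls out of that view, computed from nothing more than an
n-persons-by-k-items score matrix (with dichotomous 0/1 items this
same formula reduces to KR-20):

    import numpy as np

    def cronbach_alpha(items):
        # items: (n_persons, k_items) array of item scores
        items = np.asarray(items, dtype=float)
        k = items.shape[1]
        item_var_sum = items.var(axis=0, ddof=1).sum()
        total_var = items.sum(axis=1).var(ddof=1)
        return (k / (k - 1)) * (1 - item_var_sum / total_var)

Alpha is high only when the items covary, which is exactly the
unidimensional assumption at work.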

The general theory of situational judgment tests (see especially Mike
McDaniel's work) is not compatible with classic notions of internal
consistency. In other words, internal consistency is not an appropriate
index for situational judgment tests.

Having said that, playing devil's advocate of my own position, it could be
argued that situational judgment tests still arrive at a single total score,
which suggests that there is a single something being measured.

On your point 2, you could have high test-retest reliability with low
internal consistency. It is possible, but again it would depend on
your measure and your theory. Technically, under some reliability
theories, test-retest is not an acceptable type of reliability at
all -- since by definition there is no random sampling of items (we
can retain the fiction as long as we do not blatantly violate it).
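
A quick simulation of that point, using a deliberately extreme and
entirely hypothetical case: ten items that each tap a different,
perfectly stable trait, so the items share nothing but each person is
consistent over time:

    import numpy as np

    rng = np.random.default_rng(0)
    n, k = 500, 10
    traits = rng.normal(size=(n, k))   # k independent, stable traits

    def administer(t):
        # stable trait plus a little occasion-specific noise
        return t + 0.3 * rng.normal(size=t.shape)

    time1, time2 = administer(traits), administer(traits)

    # internal consistency (alpha) is near zero: no shared variance
    item_var = time1.var(axis=0, ddof=1).sum()
    total_var = time1.sum(axis=1).var(ddof=1)
    alpha = (k / (k - 1)) * (1 - item_var / total_var)

    # test-retest correlation of total scores comes out around .9
    retest = np.corrcoef(time1.sum(axis=1), time2.sum(axis=1))[0, 1]
    print("alpha = %.2f, retest r = %.2f" % (alpha, retest))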

Bottom line then: a .3 internal consistency might be very appropriate,
and in fact good, for a situational judgment test, especially
depending upon its theory of construction. Remember, in developing a
test based on a pure criterion-oriented approach, such as might be
seen with classical BIBs or even situational judgment tests, we would
want each additional item to correlate 0 with the other items (while
still correlating with the criterion), since that results in the
greatest multiple R. Thus, in that case, lower internal consistencies
are better and a high internal consistency would be bad.
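
To see why, consider a hypothetical: five items that each correlate
.30 with the criterion and .00 with one another. With mutually
uncorrelated predictors, R-squared is simply the sum of the squared
validities:

    import math

    r_item, k = 0.30, 5
    R = math.sqrt(k * r_item ** 2)   # sqrt(5 * .09), about .67
    print("Multiple R = %.2f" % R)

Five such items outpredict any single one (.67 versus .30) precisely
because they do not overlap -- and a test built that way will show a
dismal internal consistency by design.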

So you have competing ideas here. You are, I would guess, calculating
a total score, but a total score based on adding together measures of
independent constructs. So a .3 might be good, a .6 might be good, a
.9 might be good; it depends on what you are trying to achieve and
what the correlation really is between those independent constructs.

One could go into a lot more detail and argument on these concepts, but that
is for a different forum.


Dennis Doverspike, Ph.D., ABPP
Professor of Psychology
Director, Center for Organizational Research
Senior Fellow of the Institute for Life-Span Development and Gerontology
Psychology Department
University of Akron
Akron, Ohio 44325-4301
330-972-8372 (Office)
330-972-5174 (Office Fax)
ddoverspike at uakron.edu



-----Original Message-----
From: ipac-list-bounces at ipacweb.org [mailto:ipac-list-bounces at ipacweb.org]
On Behalf Of Shekerjian, Rene
Sent: Wednesday, February 03, 2010 11:36 AM
To: ipac-list at ipacweb.org
Subject: [IPAC-List] Interpretation of internal consistency
reliability coefficients

I have been reading up on internal consistency reliability
coefficients (e.g., KR-20 and Cronbach's alpha) in order to clarify my
thinking about them, but I am having trouble finding much of practical
use beyond two basic points:

(1) .6 is tolerable, and .9 is the gold standard

(2) you can have high test-retest reliability with low internal consistency

My question is this:

Suppose you have around 20 items that make up a situational judgment
subtest. The domain is defined by job analysis and is supposed to address a
competency that entails behaviors such as interacting with customers,
solving problems, giving advice, assessing situations, and determining the
best action to take, all within a circumscribed context.

How would you interpret an internal consistency reliability coefficient of
.3 for such a test? How about .6? And what about .9?
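
For rough calibration before I speculate: the standardized-alpha
identity, alpha = k*r / (1 + (k-1)*r), can be inverted to show what
each value implies about the average inter-item correlation r of a
20-item test (a sketch that assumes roughly equal item variances):

    # mean inter-item correlation implied by a given alpha and k items
    def mean_inter_item_r(alpha, k=20):
        return alpha / (k - alpha * (k - 1))

    for a in (0.3, 0.6, 0.9):
        print("alpha %.1f -> mean r of about %.2f"
              % (a, mean_inter_item_r(a)))
    # prints roughly: 0.3 -> 0.02, 0.6 -> 0.07, 0.9 -> 0.31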

My stab at this is that .3 suggests several unwanted possibilities: among
them are (1) the items were "good" but too difficult for the candidates and
(2) the items have flaws such as not fully defining the circumstances and/or
constraints that need to be taken into account to arrive at the "correct"
answer.

Personally, I would expect a well-crafted set of such items, given to
an appropriate candidate group, to hold together and have an internal
consistency reliability coefficient around .6.

As far as getting a .9 for this sort of test, I think that would
indicate too narrow a focus for a domain that is expected to cover
pretty wide territory.

While it makes sense that people who are more able and motivated to
develop expertise in a "broad" competency will tend to be good at much
of it, and those who are less able and/or motivated will tend to
perform poorly in much of it, I would still expect some randomization
of people's strengths and weaknesses. That would lead some people to
perform well in many areas but fall down in a few (with no discernible
pattern), and others to perform poorly in many but still be strong in
a few (again with no discernible pattern).

Your thoughts on this would be much appreciated.

Thanks in advance,

René

René Shekerjian | Testing Services Division | NYS Department of Civil
Service | 518-473-9937
======================================================================



