getting the contrast for each category vs. the other categories

jeremyfreese

Join Date: Sep 2014

Posts: 13
#1


11 Sep 2018, 11:07

For a nominal explanatory variable, tables typically present the difference between k-1 categories and a reference category, and significance tests for that. The -contrast- command makes it easy to instead get the contrast between each of the categories and the grand mean (unbalanced/asobserved), and the significance test associated with that. In many contexts, though, I think the contrast that folks would find most informative would actually be the contrast between a category and the mean of the other categories (again, unbalanced/asobserved).

As far as I can figure out, -contrast- allows you to compute this for either the first or last category for an ordinal variable, but is there a way to get this for all categories?
Richard Williams

Join Date: Apr 2014

Posts: 5043
#2

11 Sep 2018, 14:28

I may be missing something, but I don't think it matters whether you are contrasting a category with the grand mean or contrasting it with the mean of the other categories. Consider

Code:

webuse nhanes2f, clear
gen xhealth = health
logit diabetes i.xhealth
margins g.xhealth
recode xhealth (3 = 0)
logit diabetes i.xhealth
margins h.xhealth

After the first margins I get

Code:

. margins g.xhealth

Contrasts of adjusted predictions
Model VCE    : OIM

Expression   : Pr(diabetes), predict()

------------------------------------------------
             |         df        chi2     P>chi2
-------------+----------------------------------
     xhealth |
 (1 vs mean) |          1       96.23     0.0000
 (2 vs mean) |          1       15.83     0.0001
 (3 vs mean) |          1       32.64     0.0000
 (4 vs mean) |          1      187.16     0.0000
 (5 vs mean) |          1      268.49     0.0000
       Joint |          4      311.20     0.0000
------------------------------------------------

So the contrast of category 3 vs. the grand mean yields chi2 = 32.64 with 1 df.

After that, I recode health so category 3 becomes the first category. By then using the h. operator on margins, I get the original category 3 contrasted with the mean of all the other categories. This yields

Code:

. margins h.xhealth

Contrasts of adjusted predictions
Model VCE    : OIM

Expression   : Pr(diabetes), predict()

------------------------------------------------
             |         df        chi2     P>chi2
-------------+----------------------------------
     xhealth |
   (0 vs >0) |          1       32.64     0.0000
   (1 vs >1) |          1       96.54     0.0000
   (2 vs >2) |          1      125.99     0.0000
    (4 vs 5) |          1        5.12     0.0237
       Joint |          4      311.20     0.0000
------------------------------------------------

Again I get Chi2 = 32.64 with 1 df.

So in short, I don't think it matters for the test whether the contrast is between a category and the grand mean or between a category and the mean of the other categories.

It may be that I misunderstand the question or that my one example doesn't cover all cases. If so, what you want might be a nice addition to Long and Freese's spost13 commands (hint, hint).

I did find it puzzling at first that there wasn't an option to do what you wanted. If I am right it might be nice to include a short note in the documentation as to why such an option is not needed.

-------------------------------------------
Richard Williams
Professor Emeritus of Sociology
University of Notre Dame
StataNow Version: 19.5 MP (2 processor)
EMAIL: [email protected]
WWW: https://academicweb.nd.edu/~rwilliam/
Richard Williams

Join Date: Apr 2014

Posts: 5043
#3

11 Sep 2018, 14:36

I can't find it now, but I think there was a thread a while back showing that comparing a subgroup mean with the overall mean yielded the same statistical test as comparing a subgroup mean with the mean of all the other groups. Either that, or my memory is consistently wrong. If I am right, I find these results reassuring because, if you want to know whether one group is different from the rest, it seems like it shouldn't depend on whether the group itself was or wasn't used to compute the overall mean.

Steve Samuels , my vague memory is that you may have been involved in that thread, but maybe not. It may have included equations which proved the point.

Richard Williams

Join Date: Apr 2014

Posts: 5043
#4

11 Sep 2018, 15:03

I think this is the thread I remembered:

https://www.statalist.org/forums/for...tandard-errors

Steve Samuels was talking about a special case of subsample vs population comparisons but I bet it can be generalized. (And if I am totally wrong, hopefully someone will step in and stop me before I can do more harm.)

jeremyfreese

Join Date: Sep 2014

Posts: 13
#5

11 Sep 2018, 18:06

Oh, yeah, I should have been clearer that what I am looking for here is partly the test (on which you are right, and good point), but more importantly the effects themselves (like what you get when you add the -effects- option to the -contrast- command).
Comment
Richard Williams

Join Date: Apr 2014

Posts: 5043
#6

11 Sep 2018, 20:10

Maybe you could give a simple replicable example of what is currently possible and then describe what you would rather have instead. If the test results are the same I'm not sure why/how the effects would differ but I'm not clearly visualizing this in my head right now.
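One way to see how the effects could differ while the tests agree is the following sketch. The notation is introduced here for illustration and is not from the thread: K categories, n_k observations in category k, N observations in total, and \mu_k the marginal mean for category k.

```latex
\begin{align*}
w_k &= n_k / N, \qquad
\bar{\mu} = \sum_{k=1}^{K} w_k \mu_k, \qquad
\bar{\mu}_{(-j)} = \frac{\sum_{k \neq j} w_k \mu_k}{1 - w_j} \\
\bar{\mu} &= w_j \mu_j + (1 - w_j)\,\bar{\mu}_{(-j)} \\
\mu_j - \bar{\mu} &= (1 - w_j)\,\bigl(\mu_j - \bar{\mu}_{(-j)}\bigr)
\end{align*}
```

If the weights are treated as fixed constants, the contrast with the grand mean is just the contrast with the mean of the other categories multiplied by 1 - w_j. The estimate and its standard error shrink by the same factor, so the test statistic is unchanged, but the reported effect is smaller in absolute value. In the as-balanced case w_k = 1/K for every k, and the factor is (K - 1)/K.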


Joseph Coveney

Join Date: Apr 2014
Posts: 4541

11 Sep 2018, 23:23

I believe that you have to use a set of custom contrasts to get what you want. Something like the following.

Code:

version 15.1

clear *

set seed `=strreverse("1461598")'

quietly sysuse auto
quietly replace rep78 = runiformint(1, 4) if mi(rep78)

*
* Begin here
*
quietly regress price i.rep78

// Get the first contrast "(1 vs !1)"
tabulate rep78
contrast {rep78 `=2/2' `=-9/72' `=-32/72' `=-20/72' `=-11/72'}

// (As-balanced would be: contrast {rep78 1 -0.25 -0.25 -0.25 -0.25})

// Get the second contrast "(2 vs !2)"
contrast {rep78 `=-2/65' `=9/9' `=-32/65' `=-20/65' `=-11/65'}

// And so on
exit

You could automate it, using the -matcell()- option to -tabulate- and cycling through the vector of counts to set up the contrasts.
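A minimal sketch of that automation might look like the following. It is an illustration under assumptions, not tested against the example above: it drops the missing values of rep78 instead of imputing them, and it assumes the levels are coded 1, 2, ..., K so that the loop index matches the level. The names counts, N, K and w are arbitrary.

```stata
// Sketch only: as-observed "level j vs. mean of the other levels" contrasts
sysuse auto, clear
// Simplification (assumption): drop missing rep78 rather than impute
drop if missing(rep78)
quietly regress price i.rep78

// counts becomes a K x 1 matrix of cell frequencies, in level order;
// -tabulate- leaves the total in r(N) and the number of levels in r(r)
quietly tabulate rep78 if e(sample), matcell(counts)
local N = r(N)
local K = r(r)

forvalues j = 1/`K' {
    // start each pass with an empty coefficient list
    local w
    forvalues k = 1/`K' {
        if `k' == `j' {
            // the focal level gets coefficient 1
            local w `w' 1
        }
        else {
            // every other level gets minus its share of the remaining observations
            local w `w' `=-counts[`k',1]/(`N' - counts[`j',1])'
        }
    }
    display as text _newline "Level `j' of rep78 vs. observation-weighted mean of the other levels"
    contrast {rep78 `w'}, effects nowald
}
```

Each pass is still a separate call to -contrast-, so collecting the results into one table would need an extra step.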


Richard Williams

Join Date: Apr 2014

Posts: 5043
#8

11 Sep 2018, 23:41

I was thinking about something like Joseph suggests, i.e. some sort of brute force approach where you kept on running commands over and over. But if you are going to do something like that, wouldn't it be easier just to keep on recoding rep78, letting each category take a turn as #1?
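A rough sketch of that recoding loop, under the assumption that 0 is not already a level of the variable (so the category taking its turn can be moved to 0 and sort first). The variable name tmp is arbitrary, and hw. is used as the observation-weighted counterpart of h.; swapping in h. would give the as-balanced version.

```stata
// Sketch only: let each level take a turn as the lowest code
sysuse auto, clear
// Simplification (assumption): drop missing rep78 rather than impute
drop if missing(rep78)
quietly levelsof rep78, local(levels)

foreach j of local levels {
    // tmp equals rep78 except that level `j' is moved to 0
    quietly generate byte tmp = cond(rep78 == `j', 0, rep78)
    quietly regress price i.tmp
    display as text _newline "Original level `j' is coded 0 below"
    // only the first row, labeled (0 vs >0), is the contrast of interest
    contrast hw.tmp, effects nowald
    drop tmp
}
```

Working on a temporary copy leaves rep78 itself untouched, which sidesteps some of the bookkeeping, though the output rows are still labeled with the recoded values.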

Joseph Coveney

Join Date: Apr 2014

Posts: 4541
#9

12 Sep 2018, 00:19

Pretty funny: I was originally thinking of some sort of brute-force -recode- cycling through rep78 and using h.rep78 just as you suggest, but then got worried that I would get lost as to who's on first and what's on second with all of the recoding going on. My gut feeling was that, although it's more tedious to do the arithmetic and typing, it's easier to see where I am in the cycle with it laid out on a line.

In Jeremy's circle these kinds of contrast seem to be fairly common, but I don't recall running into a situation where I wanted to do them. I guess that in my circumstances, categories are either ordered, or else it's nominal category A versus control and nominal category B versus A, and things tend to be as-balanced.
Richard Williams

Join Date: Apr 2014

Posts: 5043
#10

12 Sep 2018, 05:21

daniel klein has this neat utility called labrecode. It recodes variables and changes the value labels accordingly. It seems potentially dangerous if you don't use it right! But it seems like if you keep on recoding a variable but use the correct value labels, you could keep track of who's on first and what's on second.

I'm still not convinced about the need though. But if it is useful, perhaps Stata could build it in as a contrast option.


jeremyfreese

Join Date: Sep 2014
Posts: 13

#11

12 Sep 2018, 10:32

Yeah, I think y'all get what I'm asking, but just to answer Rich's question asking for a case example:

Code:

. quietly regress realrinc i.race_eth

. contrast gw.race_eth, effects nowald

Contrasts of marginal linear predictions

Margins      : asbalanced

-----------------------------------------------------------------------------------
                  |   Contrast   Std. Err.      t    P>|t|     [95% Conf. Interval]
------------------+----------------------------------------------------------------
         race_eth |
 (Asian vs mean)  |   12258.27   1830.106     6.70   0.000     8671.012    15845.53
 (Black vs mean)  |   -7746.15    810.122    -9.56   0.000      -9334.1   -6158.201
  (Hisp vs mean)  |  -8642.205   861.9319   -10.03   0.000    -10331.71   -6952.701
(Nat Am vs mean)  |  -9780.413    3504.39    -2.79   0.005    -16649.49   -2911.331
 (White vs mean)  |   2738.971   219.4558    12.48   0.000     2308.808    3169.134
-----------------------------------------------------------------------------------

And so what I want is the contrast and std error for Asian vs Not Asian, Black vs Not Black, etc., instead of each category vs. the mean. My end goal is to report these in a table instead of defining a reference category. But that's why I was hoping for a way around brute-forcing it, because if there was a way of doing it with a single call to contrast it would be way easier to move into a table. But Joseph's approach aligns with my intuition that any way of getting contrast to do it would involve multiple calls to contrast (his way is better than what I thought of, which was his who's-on-first idea.)

Thanks!


Dimitriy V. Masterov

Join Date: Mar 2014

Posts: 609
#12

11 Aug 2020, 21:25

I answered a question on CV (Cross Validated) about global versus leave-self-out contrasts, and another poster linked to this thread. I think my answer clarifies the reasoning for why the tests would be the same. Please take a look if you are still interested.