  • getting the contrast for each category vs. the other categories

    For a nominal explanatory variable, tables typically present the difference between k-1 categories and a reference category, and significance tests for that. The -contrast- command makes it easy to instead get the contrast between each of the categories and the grand mean (unbalanced/asobserved), and the significance test associated with that. In many contexts, though, I think the contrast that folks would find most informative would actually be the contrast between a category and the mean of the other categories (again, unbalanced/asobserved).

    As far as I can figure out, -contrast- allows you to compute this for either the first or last category for an ordinal variable, but is there a way to get this for all categories?

  • #2
    I may be missing something, but I don't think it matters whether you are contrasting a category with the grand mean or contrasting it with the mean of the other categories. Consider

    Code:
    webuse nhanes2f, clear
    gen xhealth = health
    logit diabetes i.xhealth
    margins g.xhealth
    recode xhealth ( 3 = 0)
    logit diabetes i.xhealth
    margins h.xhealth
    After the first margins I get

    Code:
    . margins g.xhealth
    
    Contrasts of adjusted predictions
    Model VCE    : OIM
    
    Expression   : Pr(diabetes), predict()
    
    ------------------------------------------------
                 |         df        chi2     P>chi2
    -------------+----------------------------------
         xhealth |
    (1 vs mean)  |          1       96.23     0.0000
    (2 vs mean)  |          1       15.83     0.0001
    (3 vs mean)  |          1       32.64     0.0000
    (4 vs mean)  |          1      187.16     0.0000
    (5 vs mean)  |          1      268.49     0.0000
          Joint  |          4      311.20     0.0000
    ------------------------------------------------
    So, the contrast of category 3 with the grand mean yields chi2 = 32.64 with 1 df.

    After that, I recode xhealth so that category 3 becomes the first category (coded 0). By then using the h. operator on margins, I get the original category 3 contrasted with the mean of all the other categories. This yields

    Code:
    . margins h.xhealth
    
    Contrasts of adjusted predictions
    Model VCE    : OIM
    
    Expression   : Pr(diabetes), predict()
    
    ------------------------------------------------
                 |         df        chi2     P>chi2
    -------------+----------------------------------
         xhealth |
      (0 vs >0)  |          1       32.64     0.0000
      (1 vs >1)  |          1       96.54     0.0000
      (2 vs >2)  |          1      125.99     0.0000
      (4 vs  5)  |          1        5.12     0.0237
          Joint  |          4      311.20     0.0000
    ------------------------------------------------
    Again I get Chi2 = 32.64 with 1 df.

    So in short, I don't think it matters for the test whether the contrast is between a category and the grand mean or between a category and the mean of the other categories.
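
    A short derivation may make the reason clearer. This is a sketch under stated assumptions, not something taken from the Stata documentation: m_j denotes the margin for category j, and the weights w_j are assumed to be fixed constants that sum to one (w_j = 1/K for as-balanced contrasts, w_j = n_j/N for as-observed ones). The symbols are introduced here only for illustration.

    ```latex
    \begin{align*}
    \bar m &= \sum_{j=1}^{K} w_j m_j, \qquad \sum_{j=1}^{K} w_j = 1,\\
    \bar m_{-k} &= \frac{1}{1-w_k}\sum_{j\neq k} w_j m_j,\\
    \bar m &= w_k m_k + (1-w_k)\,\bar m_{-k},\\
    m_k - \bar m &= (1-w_k)\,\bigl(m_k - \bar m_{-k}\bigr).
    \end{align*}
    ```

    Under these assumptions the "vs. grand mean" contrast is just the "vs. mean of the others" contrast multiplied by the constant (1 - w_k). The standard error is rescaled by the same constant, so the test statistic is unchanged, which is consistent with chi2 = 32.64 appearing in both tables. The estimated effect itself does change: with five categories and equal weights, for example, the factor is 1 - 1/5 = 0.8.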

    It may be that I misunderstand the question or that my one example doesn't cover all cases. If so, what you want might be a nice addition to Long and Freese's spost13 commands (hint, hint).

    I did find it puzzling at first that there wasn't an option to do what you wanted. If I am right it might be nice to include a short note in the documentation as to why such an option is not needed.
    -------------------------------------------
    Richard Williams, Notre Dame Dept of Sociology
    Stata Version: 17.0 MP (2 processor)

    EMAIL: rwilliam@ND.Edu
    WWW: https://www3.nd.edu/~rwilliam



    • #3
      I can't find it now, but I think there was a thread a while back showing that comparing a subgroup mean with the overall mean yielded the same statistical test as comparing a subgroup mean with the mean of all the other groups. Either that, or my memory is consistently wrong. If I am right, I find these results reassuring because, if you want to know whether one group is different from the rest, it seems like it shouldn't depend on whether or not the group itself was used to compute the overall mean.

      Steve Samuels , my vague memory is that you may have been involved in that thread, but maybe not. It may have included equations which proved the point.
      -------------------------------------------
      Richard Williams, Notre Dame Dept of Sociology


      • #4
        I think this is the thread I remembered:

        https://www.statalist.org/forums/for...tandard-errors

        Steve Samuels was talking about a special case of subsample vs population comparisons but I bet it can be generalized. (And if I am totally wrong, hopefully someone will step in and stop me before I can do more harm.)
        -------------------------------------------
        Richard Williams, Notre Dame Dept of Sociology


        • #5
          Oh, yeah, I should have been clearer that what I am looking for here is partly the test (about which you are right, and good point), but more importantly the effects themselves (like what you get when you add the -effects- option to the -contrast- command).



          • #6
            Maybe you could give a simple replicable example of what is currently possible and then describe what you would rather have instead. If the test results are the same I'm not sure why/how the effects would differ but I'm not clearly visualizing this in my head right now.
            -------------------------------------------
            Richard Williams, Notre Dame Dept of Sociology


            • #7
              I believe that you have to use a set of custom contrasts to get what you want. Something like the following.
              Code:
              version 15.1
              
              clear *
              
              set seed `=strreverse("1461598")'
              
              quietly sysuse auto
              quietly replace rep78 = runiformint(1, 4) if mi(rep78)
              
              *
              * Begin here
              *
              quietly regress price i.rep78
              
              // Get the first contrast "(1 vs !1)"
              tabulate rep78
              contrast {rep78 `=2/2' `=-9/72' `=-32/72' `=-20/72' `=-11/72'}
              
              // (As-balanced would be: contrast {rep78 1 -0.25 -0.25 -0.25 -0.25})
              
              // Get the second contrast "(2 vs !2)"
              contrast {rep78 `=-2/65' `=9/9' `=-32/65' `=-20/65' `=-11/65'}
              
              // And so on
              exit
              You could automate it, using the -matcell()- option to -tabulate- and cycling through the vector of counts to set up the contrasts.
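
              A minimal sketch of that automation follows. It assumes the unmodified auto data (so rep78 has five levels and its missing values simply drop out of the regression), and the names counts, coefs, K and N are chosen here for illustration only. The level counts are taken from the estimation sample, and each pass through the outer loop gives one level a coefficient of 1 and every other level minus its share of the remaining observations.

              ```stata
              version 15.1
              sysuse auto, clear
              quietly regress price i.rep78

              // Level counts in the estimation sample; rows follow the sorted levels of rep78
              quietly tabulate rep78 if e(sample), matcell(counts)
              local K = rowsof(counts)
              local N = e(N)

              forvalues k = 1/`K' {
                  // Build the coefficient list for "k-th level vs. as-observed mean of the others"
                  local coefs
                  forvalues j = 1/`K' {
                      if `j' == `k' {
                          local coefs `coefs' 1
                      }
                      else {
                          local coefs `coefs' `= -counts[`j', 1] / (`N' - counts[`k', 1])'
                      }
                  }
                  display as text _newline "Level `k' of `K' vs. the as-observed mean of the other levels"
                  contrast {rep78 `coefs'}, effects nowald
              }
              ```

              Note that k indexes the position of a level in the tabulation rather than its value; the two coincide here only because rep78 is coded 1 through 5. The loop still calls -contrast- once per level, so the results would need to be collected from each call to build a single table.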



              • #8
                I was thinking about something like Joseph suggests, i.e. some sort of brute force approach where you kept on running commands over and over. But if you are going to do something like that, wouldn't it be easier just to keep on recoding rep78, letting each category take a turn as #1?
                -------------------------------------------
                Richard Williams, Notre Dame Dept of Sociology


                • #9
                  Pretty funny: I was originally thinking of some sort of brute-force -recode- cycling through rep78 and using -h.rep78- just as you suggest, but then got worried that I would get lost as to who's on first and what's on second with all of the recoding going on. My gut feeling was that, although it's more tedious to do the arithmetic and typing, it's easier to see where I was in the cycle with it laid out on a line.

                  In Jeremy's circle these kinds of contrast seem to be fairly common, but I don't recall running into a situation where I wanted to do them. I guess that in my circumstances, categories are either ordered, or else it's nominal category A versus control and nominal category B versus A, and things tend to be as-balanced.



                  • #10
                    daniel klein has this neat utility called labrecode. It recodes variables and changes the value labels accordingly. It seems potentially dangerous if you don't use it right! But it seems like if you keep on recoding a variable but use the correct value labels, you could keep track of who's on first and what's on second.

                    I'm still not convinced about the need though. But if it is useful, perhaps Stata could build it in as a contrast option.
                    -------------------------------------------
                    Richard Williams, Notre Dame Dept of Sociology


                    • #11

                      Yeah, I think y'all get what I'm asking, but just to answer Rich's question asking for a case example:
                      Code:
                      . quietly regress realrinc i.race_eth
                      
                      . contrast gw.race_eth, effects nowald
                      
                      Contrasts of marginal linear predictions
                      
                      Margins      : asbalanced
                      
                      -----------------------------------------------------------------------------------
                                        |   Contrast   Std. Err.      t    P>|t|     [95% Conf. Interval]
                      ------------------+----------------------------------------------------------------
                               race_eth |
                       (Asian vs mean)  |   12258.27   1830.106     6.70   0.000     8671.012    15845.53
                       (Black vs mean)  |   -7746.15    810.122    -9.56   0.000      -9334.1   -6158.201
                        (Hisp vs mean)  |  -8642.205   861.9319   -10.03   0.000    -10331.71   -6952.701
                      (Nat Am vs mean)  |  -9780.413    3504.39    -2.79   0.005    -16649.49   -2911.331
                       (White vs mean)  |   2738.971   219.4558    12.48   0.000     2308.808    3169.134
                      -----------------------------------------------------------------------------------
                      And so what I want is the contrast and std error for Asian vs Not Asian, Black vs Not Black, etc., instead of each category vs. the mean. My end goal is to report these in a table instead of defining a reference category. But that's why I was hoping for a way around brute-forcing it, because if there was a way of doing it with a single call to contrast it would be way easier to move into a table. But Joseph's approach aligns with my intuition that any way of getting contrast to do it would involve multiple calls to contrast (his way is better than what I thought of, which was his who's-on-first idea.)

                      Thanks!



                      • #12
                        I answered a question on CV (Cross Validated) about global versus leave-self-out contrasts, and another poster linked to this thread. I think my answer clarifies the reasoning for why they would be the same. Please take a look if you are still interested.
