Testing whether difference between interacted coefficients is not larger than threshold

Nicole Kapelle

Join Date: Apr 2016

Posts: 31
#1

02 Apr 2019, 09:55

Dear all,

this is probably quite a simple issue for people who are familiar with the test command.

I am running a fixed effects regression to examine changes in wealth (IHS-transformed) over the marital dissolution process. The marital dissolution process is captured by four dummy variables (sepneg, sepy, divy1 and divy2), with 'at least 3 years prior to actual separation' as the reference category. As I expect this to differ between men and women, I interact the marital dissolution dummies with a gender dummy (female 1/0). I run this in Stata 15.

My code for the fixed effects regression looks as follows:

Code:

xtreg wealth_ihsa i.sepneg#i.female i.sepy#i.female i.divy1#i.female i.divy2#i.female $covfe, fe vce(cluster id)

I understand that I can test whether men and women during separation differ significantly in their wealth with the following command:

Code:

test 1.sepy#0.female=1.sepy#1.female

What I would like to test, though, is whether the difference between men and women during separation (1.sepy#0.female vs 1.sepy#1.female) is not larger than 0.096. This is equivalent to a 10% difference. On the advice of a colleague, I have tried the following:

Code:

test _b[1.sep#0.female:1.sep#1.female]-0.096=0

However, I only get the error message 'equation 1.sep#0.female not found'. So obviously, this is not the right way. Can anyone advise on how to properly test this? Is 'test' even the appropriate command to use?

I hope I have been clear enough and someone can provide some help.
Thanks,
Nicole
Clyde Schechter

Join Date: Apr 2014

Posts: 30174
#2

02 Apr 2019, 14:08

Before answering your question, let me point out that your regression model is mis-specified. You have a bunch of interaction terms, but you do not also include the constituents of those terms. While some of them will disappear anyway due to collinearity with the fixed effects, the rest of them are necessary. I'll also point out that you can save yourself a lot of typing by taking advantage of the fact that factor-variable notation is a lot like algebra, and by making use of the ## (not #) operator.

Code:

xtreg wealth_ihsa i.(sepneg sepy divy1 divy2)##i.female $covfe, fe vce(cluster id)

As for the test you want to do, that notation with the colon is not intended for what you are trying to do. Also, you don't have to write the _b[] here. You can just do:

Code:

test 1.sepy#0.female - 1.sepy#1.female = 0.096
Nicole Kapelle

Join Date: Apr 2016

Posts: 31
#3

03 Apr 2019, 02:00

Thank you very much Clyde! That was exactly what I was looking for!

Regarding the mis-specification. I was told that in a fixed-effects model it can be helpful to model interactions as "nested effects" (# instead of ##). This way one gets the effect for each subgroup straight away. I had also checked what happens if I specify the interactions using ##. It yields the same results, except that I need to add/subtract the coefficient for women from those of men to get the effect for each subgroup. Do you think there is any problem in doing it the way I do it at the moment?

Also, thank you for the shortcut option. This looks much neater than the way I did it!

Thanks,
Nicole
Nicole Kapelle

Join Date: Apr 2016

Posts: 31
#4

03 Apr 2019, 02:10

I just checked if results for

Code:

test 1.sepy#0.female - 1.sepy#1.female = 0.096

would be the same if I specify my regression correctly (##) compared to mis-specifying it (#). Results are different. Does that mean that I should specify the interactions rather like you suggested (##)?

Thanks,
Nicole
Clyde Schechter

Join Date: Apr 2014

Posts: 30174
#5

03 Apr 2019, 10:17

Regarding the mis-specification. I was told that in a fixed-effects model it can be helpful to model interactions as "nested effects" (# instead of ##). This way one gets the effect for each subgroup straight away.

Yes, but you implemented it incorrectly. You can omit the female indicator and get the separate sex-specific effects that way without adding. But you still need to specify the sepneg, sepy, divy1 and divy2 effects separately. Without those, some of the results will come out wrong because (at least) one combination gets absorbed into the constant term.

Personally, I think it is simplest and safest to always use ##, and then get the combination-specific results using the -margins- command. It's foolproof.

Code:

xtreg wealth_ihsa i.(sepneg sepy divy1 divy2)##i.female $covfe, fe vce(cluster id)
margins sepneg#female

The margins output will show you the expected values of wealth_ihsa in all combinations of female and sepneg. You don't have to add anything, and you don't even have to think about what, if anything, to add to what. -margins- handles it all for you. If you then want to test whether the difference between males and females who are 1.sepneg is at least 0.096, you can do this:

Code:

margins sepneg#female, post
test 1.sepneg#1.female - 1.sepneg#0.female = 0.096

The -post- option causes the -margins- output to replace the regression results in e(), so that you can then use -test- and -lincom-, etc. Actually, I would not use -test- for this, because the only output you will get is a test statistic and p-value. I would actually do this as:

Code:

lincom 1.sepneg#1.female - 1.sepneg#0.female - 0.096

The results of this will give you an estimate of how far the male-female difference among the 1.sepneg is from the threshold, along with a confidence interval. That's a lot more informative than just a naked hypothesis test.
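
If the question is really one-sided (is the difference no larger than the threshold?), the two-sided p-value reported by -test- or -lincom- is not quite what is wanted. The following is only a minimal sketch of one way to get one-sided p-values. It assumes that the ## regression has just been run and that its results, not those of -margins, post-, are still in e(), so that -lincom- reports a t statistic and leaves its degrees of freedom in r(df). The choice of sepy and the 0.096 threshold are taken from the original question.

Code:

xtreg wealth_ihsa i.(sepneg sepy divy1 divy2)##i.female $covfe, fe vce(cluster id)
lincom 1.sepy#1.female - 1.sepy#0.female - 0.096
* lincom leaves the estimate, its standard error and the degrees of freedom in r()
local t = r(estimate)/r(se)
local df = r(df)
* upper-tail p-value: evidence that the difference exceeds 0.096
display "Ha: difference > 0.096, p = " ttail(`df', `t')
* lower-tail p-value: evidence that the difference is below 0.096
display "Ha: difference < 0.096, p = " 1 - ttail(`df', `t')

Which tail is relevant depends on how the null hypothesis is framed, so that choice should be made before looking at the output.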

Results are different.

Yes, results are different, because those coefficient names refer to different parameters in the two models.

In the regression coded with #, 1.sep#1.female refers to being sep and female. In the model coded with ##, that same coefficient refers to the difference between being sep and female and being sep and male.
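
To make that concrete, here is a sketch of how the sex-specific effects can be recovered from the ## model with -lincom-. It uses sepy as the example and assumes the ## regression shown earlier in this post; the same pattern would apply to the other dummies.

Code:

xtreg wealth_ihsa i.(sepneg sepy divy1 divy2)##i.female $covfe, fe vce(cluster id)
* effect of sepy for men (female is at its base level, 0)
lincom 1.sepy
* effect of sepy for women: men's effect plus the interaction
lincom 1.sepy + 1.sepy#1.female
* female-minus-male difference in the sepy effect: the interaction alone
lincom 1.sepy#1.female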

Nicole Kapelle

Join Date: Apr 2016
Posts: 31
#6

05 Apr 2019, 03:55

Thanks Clyde. I wasn't aware of the lincom command. It certainly sounds more informative than the test command!

I followed your instructions and used lincom. However, I was wondering as 1.sep#1.female already refers to the difference between men and women, can't I just test whether this is different from my threshold of 0.096?

Code:

lincom 1.sepneg#1.female - 0.096

instead of

Code:

lincom 1.sepneg#1.female - 1.sepneg#0.female - 0.096

It yields the same results, so I assume that this is another possible way:

Code:

. lincom 1.sepneg#1.female - 1.sepneg#0.female - 0.096

 ( 1)  - 1o.sepneg#0b.female + 1.sepneg#1.female = .096

------------------------------------------------------------------------------
 wealth_ihsa |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |    .023074   .8112195     0.03   0.977    -1.567002     1.61315
------------------------------------------------------------------------------

. lincom 1.sepneg#1.female - 0.096  // 1-2 years before separation

 ( 1)  1.sepneg#1.female = .096

------------------------------------------------------------------------------
 wealth_ihsa |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |    .023074   .8112195     0.03   0.977    -1.567002     1.61315
------------------------------------------------------------------------------

Further, I would like to test whether there are substantial differences between my marital dissolution process stages (separately for men and women). As the coefficients for sepneg, sepy2, divy1 and divy2 provide the estimates for men, I guess I can just test them against each other as follows:

Code:

test 1.sepneg = 1.sepy2

 ( 1)  1.sepneg - 1.sepy2 = 0

       F(  1, 16756) =    3.65
            Prob > F =    0.0561

For women, I am however not 100% sure whether I have really understood it properly. As the effect for women is defined through men's effect plus the difference between men and women (e.g. 1.sepneg + 1.sepneg#1.female), I think I would have to consider this. Is that right? So should it be:

Code:

test 1.sepneg+1.sepneg#1.female = 1.sepy2+1.sepy2#1.female

 ( 1)  1.sepneg - 1.sepy2 + 1.sepneg#1.female - 1.sepy2#1.female = 0

       F(  1, 16756) =   10.45
            Prob > F =    0.0012

if I would like to see if women's wealth levels 1 to 2 years prior to separation differ from their wealth levels during separation?

Here the results for the regression, which might be helpful (also to others) to understand what I am doing:

Code:

. xtreg wealth_ihsa i.(sepneg sepy2 divy1 divy2)##i.female $covfe, fe vce(cluster id)
note: 1.female omitted because of collinearity

Fixed-effects (within) regression               Number of obs     =     26,666
Group variable: id                              Number of groups  =     16,757

R-sq:                                           Obs per group:
     within  = 0.0103                                         min =          1
     between = 0.0980                                         avg =        1.6
     overall = 0.0797                                         max =          3

                                                F(13,16756)       =       5.60
corr(u_i, Xb)  = 0.1958                         Prob > F          =     0.0000

                                 (Std. Err. adjusted for 16,757 clusters in id)
-------------------------------------------------------------------------------
              |               Robust
  wealth_ihsa |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
       sepneg |
         Yes  |  -.3623504   .6288233    -0.58   0.564     -1.59491    .8702095
      1.sepy2 |  -1.879689   .7939382    -2.37   0.018    -3.435892   -.3234864
      1.divy1 |   -2.23874   .8862293    -2.53   0.012    -3.975843    -.501637
      1.divy2 |  -2.937163   .9479033    -3.10   0.002    -4.795153   -1.079172
              |
       female |
         Yes  |          0  (omitted)
              |
sepneg#female |
     Yes#Yes  |    .119074   .8112195     0.15   0.883    -1.471002     1.70915
              |
 sepy2#female |
       1#Yes  |  -.5316259   .9718503    -0.55   0.584    -2.436555    1.373303
              |
 divy1#female |
       1#Yes  |  -.3001206   1.231384    -0.24   0.807    -2.713764    2.113522
              |
 divy2#female |
       1#Yes  |   .8826055   1.135855     0.78   0.437     -1.34379    3.109001
              |
          age |   .3646817   .0893879     4.08   0.000     .1894719    .5398915
              |
  c.age#c.age |  -.0036599   .0007713    -4.74   0.000    -.0051719    -.002148
              |
       c_marr |   .0073378   .0414308     0.18   0.859     -.073871    .0885466
              |
inheritance~r |
         Yes  |   .3614534   .1693846     2.13   0.033     .0294417    .6934651
              |
         year |
        2002  |  -.0211769   .1414659    -0.15   0.881     -.298465    .2561113
              |
        _cons |   .0631306     2.6971     0.02   0.981     -5.22347    5.349731
--------------+----------------------------------------------------------------
      sigma_u |    6.17518
      sigma_e |  4.5014293
          rho |  .65300766   (fraction of variance due to u_i)
-------------------------------------------------------------------------------

Regarding the margins command, Stata does not seem to be able to store the regression results appropriately (or I am doing something wrong). If I run your suggested margins command after the regression I receive the following:

Code:

. margins sepneg#female, post

Predictive margins                              Number of obs     =     26,702
Model VCE    : Robust

Expression   : Linear prediction, predict()

-------------------------------------------------------------------------------
              |            Delta-method
              |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
sepneg#female |
       No#No  |          .  (not estimable)
      No#Yes  |          .  (not estimable)
      Yes#No  |          .  (not estimable)
     Yes#Yes  |          .  (not estimable)
-------------------------------------------------------------------------------

. lincom 1.sepneg#1.female - 0.096

 ( 1)  1o.sepneg#1o.female = .096

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |      -.096          .        .       .            .           .
------------------------------------------------------------------------------

end of do-file

I read that sometimes this is the case when there are empty cells, but this does not seem to be the case:

Code:

. tab female sepneg if samplefe==1

           |   Dummy: 1-3 years
           |  prior to separation
    Female |        No        Yes |     Total
-----------+----------------------+----------
        No |       657        127 |       784
       Yes |       828        165 |       993
-----------+----------------------+----------
     Total |     1,485        292 |     1,777

So, not sure what the problem would be. Do you have any suggestion?

Thanks so much for your patience. I realise that the issues/questions in this last post go beyond the scope of my initial inquiry. I hope this is fine.

Thanks again!


Clyde Schechter

Join Date: Apr 2014

Posts: 30174
#7

05 Apr 2019, 12:43

It yields the same results, so I assume that this is another possible way:

Yes it is, and it will, as you have noticed, give you the same results. The reason I prefer the longer way here is that when you come back and look at this code and output after some time has passed, it will be immediately apparent that what you are doing here is comparing males to females. If you use the shorter version, you will need to go back over the earlier output and figure out what that coefficient meant in the context of the regression.

Further, I would like to test whether there are substantial differences between my marital dissolution process stages (separately for men and women). As the coefficients for sepneg, sepy2, divy1 and divy2 provide the estimates for men, I guess I can just test them against each other as follows:

Code:

test 1.sepneg = 1.sepy2

 ( 1)  1.sepneg - 1.sepy2 = 0

       F(  1, 16756) =    3.65
            Prob > F =    0.0561

For women, I am however not 100% sure whether I have really understood it properly. As the effect for women is defined through men's effect plus the difference between men and women (e.g. 1.sepneg + 1.sepneg#1.female), I think I would have to consider this. Is that right? So should it be:

Code:

test 1.sepneg+1.sepneg#1.female = 1.sepy2+1.sepy2#1.female

 ( 1)  1.sepneg - 1.sepy2 + 1.sepneg#1.female - 1.sepy2#1.female = 0

       F(  1, 16756) =   10.45
            Prob > F =    0.0012

Well, yes and no. If you can really justify comparing the coefficient of one predictor to that of another predictor, then, yes, this code is the correct way to do it. But that whole enterprise is usually tenuous at best. Differences in the coefficients of two variables do not necessarily represent differences in the strength of their association with the outcome. They can arise just from differences in the distributions of the two predictors themselves. And in most situations, different predictors in a model will have different distributions, so it is simply not possible to say which one is more "important" or more "strongly associated" with the outcome. You can run that code, but the answers will not mean what you want them to mean.

In any case, I don't know what the variables sepneg, sepy2, divy1 and divy2 are, so I can't really say anything specific about how you are using them here.

Regarding the margins command, Stata does not seem to be able to store the regression results appropriately (or I am doing something wrong). If I run your suggested margins command after the regression I receive the following:

Sorry, that was my error. You are correct in noting that this same kind of non-result happens when there are empty cells, but that is not what is going on here. Many of the parameters that -margins- would otherwise estimate are not identifiable in a fixed-effects model. Stata checks for this and produces the "(not estimable)" output you received. The -margins- outputs that are requested by that command are, in fact, unidentifiable. But the -lincom- that you are trying to compute from them is identifiable: the results will come out the same regardless of how the fixed-effects model is parameterized. So, you can force -margins- to give you output by adding the -noestimcheck- option to your -margins- command. Do remember that the particular -margins- output you get will not be correct, nor even meaningful. But the -lincom- that follows it will run properly, and that result is identifiable and properly estimated by this process, even though the intermediate results you see from -margins- are not.
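
For concreteness, the sequence described above might look like the following. This is only a sketch, assuming the regression from #6 (with sepy2) and the 0.096 threshold from the original question; the only change from the earlier suggestion is the -noestimcheck- option.

Code:

xtreg wealth_ihsa i.(sepneg sepy2 divy1 divy2)##i.female $covfe, fe vce(cluster id)
* skip the estimability check; the displayed margins themselves are not meaningful
margins sepneg#female, noestimcheck post
* female-minus-male contrast among the 1.sepneg, relative to the threshold
lincom 1.sepneg#1.female - 1.sepneg#0.female - 0.096

Because the intermediate margins are not meaningful, it is probably worth comparing this -lincom- result with the one computed directly from the regression coefficients in #6.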