Interpretation of the interaction between categorical variable and dummy variable

Marry Lee

Join Date: Nov 2020
Posts: 189


17 Sep 2024, 04:49

Dear all,

I have panel data with 5 waves.

I have a dummy variable called "policy" that = 1 if the policy is applied and = 0 if the policy is not applied
and a categorical variable called "wave" that = 1, 2, 3, 4 and 5 denoting the survey wave.

In wave 1: policy = 1 for all individuals
In wave 2: policy = 1 for some and 0 for some others
In wave 3: policy = 0 for all individuals
In wave 4: policy = 1 for some and 0 for some others
In wave 5: policy = 1 for some and 0 for some others

I want to study the effect of policy on outcome Y in each wave, as follows:

Code:

xtreg Y i.policy#i.wave, fe vce(robust)

The result I got:

Code:

xtreg Y i.wave#i.policy, fe vce(robust)
note: 1b.wave#0b.policy identifies no observations in the sample.
note: 3.wave#1.policy identifies no observations in the sample.
note: 5.wave#1.policy omitted because of collinearity.

Fixed-effects (within) regression               Number of obs     =      6,040
Group variable: n_id                            Number of groups  =      3,254

R-squared:                                      Obs per group:
     Within  = 0.0048                                         min =          1
     Between = 0.0037                                         avg =        1.9
     Overall = 0.0042                                         max =          5

                                                F(7, 3253)        =       2.11
corr(u_i, Xb) = 0.0296                          Prob > F          =     0.0393

                                 (Std. err. adjusted for 3,254 clusters in n_id)
--------------------------------------------------------------------------------
               |               Robust
             Y | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
---------------+----------------------------------------------------------------
   wave#policy |
          1 0  |          0  (empty)
          1 1  |  -.0076616   .0059744    -1.28   0.200    -.0193757    .0040524
          2 0  |   .0014931   .0056733     0.26   0.792    -.0096304    .0126166
          2 1  |   -.005056   .0085084    -0.59   0.552    -.0217383    .0116264
          3 0  |   .0044151   .0054968     0.80   0.422    -.0063625    .0151927
          3 1  |          0  (empty)
          4 0  |   .0161157   .0069952     2.30   0.021     .0024003    .0298311
          4 1  |   -.006329   .0056444    -1.12   0.262    -.0173959     .004738
          5 0  |  -.0020953   .0099434    -0.21   0.833    -.0215913    .0174007
          5 1  |          0  (omitted)
               |
         _cons |    .744132   .0035799   207.86   0.000     .7371129    .7511511
---------------+----------------------------------------------------------------
       sigma_u |  .15802072
       sigma_e |  .09463006
           rho |  .73604293   (fraction of variance due to u_i)
--------------------------------------------------------------------------------

Here is a data sample:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float(n_id wave policy Y)
13 1 1   .
13 2 0   .
13 3 0 .87
13 5 0 .87
14 2 1  .8
15 4 1   .
16 5 .   .
17 1 1 .72
17 2 0 .87
17 3 0   .
18 1 .   .
19 3 0 .72
19 5 1  .8
20 1 1 .87
21 2 0 .77
22 1 1 .84
24 1 1 .52
25 5 1 .24
26 5 .   .
27 5 1  .1
29 4 1 .95
29 5 0   .
30 3 .   .
31 2 0  .8
32 1 1 .95
34 5 1 .87
35 5 0 .41
37 2 .   .
39 1 1 .77
39 2 0 .84
39 3 0 .69
39 4 0 .69
39 5 1 .69
40 3 0   .
41 2 0 .77
43 1 1 .77
44 2 0 .77
44 4 1 .87
45 2 0 .72
46 1 1 .41
46 3 0  .5
47 3 0 .77
48 2 0   .
48 3 0 .41
48 4 1 .32
48 5 0   .
50 2 0 .58
50 3 0 .66
50 4 1   .
50 5 1   .
51 5 1 .72
52 5 1  .5
53 1 1 .87
53 2 0 .87
53 3 0   .
53 4 0 .87
53 5 1 .87
54 2 0  .9
54 3 0   .
54 4 1 .87
54 5 1 .87
55 1 1   .
55 4 0   .
56 1 1 .87
57 5 0 .69
58 2 0 .87
58 5 1 .95
59 3 0 .77
60 5 1 .87
61 3 .   .
63 5 0 .61
64 2 0   .
66 5 0 .55
68 4 1 .87
68 5 1 .95
69 1 1 .61
69 3 0   .
70 4 1 .84
70 5 0   .
71 1 1   .
71 3 0 .61
71 4 1 .61
72 2 .   .
73 1 1 .77
73 2 0 .61
73 3 0 .61
74 1 1  .8
74 2 0 .87
74 3 0 .55
75 4 1 .94
76 1 1 .61
77 2 0   .
78 3 0   .
end
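
A quick way to see which wave-by-policy cells actually contain data is a cross-tabulation restricted to the rows that can enter the regression. This is only a minimal sketch, assuming the variable names from the -dataex- excerpt above (n_id, wave, policy, Y) and that it is run on the full dataset rather than on this excerpt:

```stata
* Declare the panel structure (individual id and survey wave)
xtset n_id wave

* Count observations in each wave-by-policy cell, using only rows
* with non-missing outcome and policy status
tabulate wave policy if !missing(Y, policy)
```

Cells with a zero count (wave 1 with policy = 0 and wave 3 with policy = 1, according to the notes in the regression output above) are the ones Stata reports as "(empty)".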

My question is about the interpretation of these results:

If we change the omitted option, the results (coefficients and standard errors) would change. Is the omitted option my reference group?
Is this model correct?

Thank you.


Carlo Lazzaro

Join Date: Apr 2014

Posts: 17706
#2

17 Sep 2024, 06:43

Marry:
I would advise you to go:

Code:

xtreg Y i.policy##i.wave, fe vce(robust)

and post back your results.

Kind regards,
Carlo
(Stata 19.0)

Marry Lee

Join Date: Nov 2020
Posts: 189

17 Sep 2024, 06:49

Dear Carlo Lazzaro, here is the outcome of the new model:

Code:

 xtreg Y i.wave##i.policy, fe vce(robust) baselevels
note: 1b.wave#0b.policy identifies no observations in the sample.
note: 3.wave#1.policy identifies no observations in the sample.
note: 5.wave#1.policy omitted because of collinearity.

Fixed-effects (within) regression               Number of obs     =      6,040
Group variable: n_id                            Number of groups  =      3,254

R-squared:                                      Obs per group:
     Within  = 0.0048                                         min =          1
     Between = 0.0037                                         avg =        1.9
     Overall = 0.0042                                         max =          5

                                                F(7, 3253)        =       2.11
corr(u_i, Xb) = 0.0296                          Prob > F          =     0.0393

                                 (Std. err. adjusted for 3,254 clusters in n_id)
--------------------------------------------------------------------------------
               |               Robust
             Y | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
---------------+----------------------------------------------------------------
          wave |
            1  |          0  (base)
            2  |   .0112501   .0117288     0.96   0.338    -.0117465    .0342466
            3  |   .0141721   .0116202     1.22   0.223    -.0086116    .0369557
            4  |   .0258727   .0118801     2.18   0.029     .0025794    .0491659
            5  |   .0076616   .0059744     1.28   0.200    -.0040524    .0193757
               |
        policy |
            0  |          0  (base)
            1  |   .0020953   .0099434     0.21   0.833    -.0174007    .0215913
               |
   wave#policy |
          1 0  |          0  (empty)
          2 1  |  -.0086444    .012713    -0.68   0.497    -.0335706    .0162818
          3 1  |          0  (empty)
          4 1  |    -.02454   .0116818    -2.10   0.036    -.0474445   -.0016355
          5 1  |          0  (omitted)
               |
         _cons |   .7343751   .0106485    68.97   0.000     .7134967    .7552535
---------------+----------------------------------------------------------------
       sigma_u |  .15802072
       sigma_e |  .09463006
           rho |  .73604293   (fraction of variance due to u_i)
--------------------------------------------------------------------------------

.


Joseph Coveney

Join Date: Apr 2014

Posts: 4401
#4

17 Sep 2024, 07:04

Originally posted by Marry Lee

In wave 1: policy = 1 for all individuals
. . .
In wave 3: policy = 0 for all individuals
. . .

I want to study the effect of policy on outcome Y in each wave

You won't be able to do that, not quite, at least not in each wave, because you have confounding of wave and application of policy for Waves 1 and 3.

is the omitted option my reference group?

Yes. The "(omitted)" is your reference group and the "(empty)" are the conditions without any data in your dataset.

Is this model correct?

Well, what you have constructed with your interaction term is in essence a so-called cell-means model. Selected linear contrasts between them (i.e., contrasts involving wave and policy-application combinations that are present in your dataset) are the best that you're going to be able to do.
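
One way to obtain such contrasts, sketched here under the assumption that the single-# model from #1 is the one being fitted, is -lincom-. Within a wave, the difference between the policy = 1 cell and the policy = 0 cell is the within-wave policy contrast; it is only defined for waves 2, 4 and 5, where both cells contain data:

```stata
* Refit the cell-means specification from #1
xtreg Y i.wave#i.policy, fe vce(robust)

* Policy = 1 versus policy = 0 within wave 2
lincom 2.wave#1.policy - 2.wave#0.policy

* Policy = 1 versus policy = 0 within wave 4
lincom 4.wave#1.policy - 4.wave#0.policy
```

With the coefficients posted in #1, these contrasts work out to -.005056 - .0014931 = -.0065491 for wave 2 and -.006329 - .0161157 = -.0224447 for wave 4. For wave 5 the policy = 1 cell is the omitted one (coefficient 0), so the contrast is simply minus the "5 0" coefficient, that is .0020953.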

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17706

17 Sep 2024, 07:10

Marry:
1) yes, the omitted category is your reference group, which you can change via factor-variable operators (see -fvvarlist-);
2) your regression results are as informative as two predictors (and their interaction) can be. You can check for misspecification of the functional form of the regressand by replicating by hand the procedure reported in the -linktest- entry of the Stata .pdf manual, as in the following toy example:

Code:

. use "https://www.stata-press.com/data/r18/nlswork.dta"
(National Longitudinal Survey of Young Women, 14-24 years old in 1968)

. xtreg ln_wage c.age##c.age i.year, fe vce(cluster idcode)

Fixed-effects (within) regression               Number of obs     =     28,510
Group variable: idcode                          Number of groups  =      4,710

R-squared:                                      Obs per group:
     Within  = 0.1162                                         min =          1
     Between = 0.1078                                         avg =        6.1
     Overall = 0.0932                                         max =         15

                                                F(16, 4709)       =      79.11
corr(u_i, Xb) = 0.0613                          Prob > F          =     0.0000

                             (Std. err. adjusted for 4,710 clusters in idcode)
------------------------------------------------------------------------------
             |               Robust
     ln_wage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
         age |   .0728746    .013687     5.32   0.000     .0460416    .0997075
             |
 c.age#c.age |  -.0010113   .0001076    -9.40   0.000    -.0012224   -.0008003
             |
        year |
         69  |   .0647054   .0155249     4.17   0.000     .0342693    .0951415
         70  |   .0284423   .0264639     1.07   0.283    -.0234395     .080324
         71  |   .0579959   .0384111     1.51   0.131    -.0173078    .1332996
         72  |   .0510671   .0502675     1.02   0.310    -.0474808     .149615
         73  |   .0424104   .0624924     0.68   0.497    -.0801038    .1649247
         75  |   .0151376    .086228     0.18   0.861    -.1539096    .1841848
         77  |   .0340933   .1106841     0.31   0.758    -.1828994     .251086
         78  |   .0537334   .1232232     0.44   0.663    -.1878417    .2953084
         80  |   .0369475   .1473725     0.25   0.802    -.2519716    .3258667
         82  |   .0391687   .1715621     0.23   0.819    -.2971733    .3755108
         83  |    .058766   .1836086     0.32   0.749    -.3011928    .4187249
         85  |   .1042758   .2080199     0.50   0.616    -.3035406    .5120922
         87  |   .1242272   .2327328     0.53   0.594    -.3320379    .5804922
         88  |   .1904977   .2486083     0.77   0.444    -.2968909    .6778863
             |
       _cons |   .3937532   .2469015     1.59   0.111    -.0902893    .8777957
-------------+----------------------------------------------------------------
     sigma_u |  .40275174
     sigma_e |  .30127563
         rho |  .64120306   (fraction of variance due to u_i)
------------------------------------------------------------------------------

. predict fitted, xb
(24 missing values generated)

. g sq_fitted=fitted^2
(24 missing values generated)

. xtreg ln_wage fitted sq_fitted , fe vce(cluster idcode)

Fixed-effects (within) regression               Number of obs     =     28,510
Group variable: idcode                          Number of groups  =      4,710

R-squared:                                      Obs per group:
     Within  = 0.1164                                         min =          1
     Between = 0.1094                                         avg =        6.1
     Overall = 0.0941                                         max =         15

                                                F(2, 4709)        =     586.29
corr(u_i, Xb) = 0.0619                          Prob > F          =     0.0000

                             (Std. err. adjusted for 4,710 clusters in idcode)
------------------------------------------------------------------------------
             |               Robust
     ln_wage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
      fitted |   2.012332   .5365254     3.75   0.000     .9604909    3.064172
   sq_fitted |  -.3040363   .1616996    -1.88   0.060    -.6210431    .0129706
       _cons |  -.8379964    .443929    -1.89   0.059    -1.708305    .0323122
-------------+----------------------------------------------------------------
     sigma_u |  .40239556
     sigma_e |  .30114591
         rho |  .64099409   (fraction of variance due to u_i)
------------------------------------------------------------------------------

. test sq_fitted

 ( 1)  sq_fitted = 0

       F(  1,  4709) =    3.54
            Prob > F =    0.0601

.

As the -test- outcome does not reject the null at the 5% level (p = 0.0601), there's no evidence of model misspecification.

Kind regards,
Carlo
(Stata 19.0)


Marry Lee

Join Date: Nov 2020

Posts: 189
#6

17 Sep 2024, 07:21

Thank you so much Joseph Coveney for your answer. I am starting to understand the model a little bit, but I am still confused about what Stata is doing.

In particular, I cannot understand the difference between "omitted" and "base". If "omitted" is my reference group then what is "base"?

Is every coefficient computed as a difference with respect to the omitted or the base or both values?

For example, in #3, should I say that the effect of the policy in wave 4 is lower than the effect of the policy in wave 5? Or should I say that the outcome is lower under the policy in wave 4 compared to the outcome when there was no policy in wave 1?

Thank you.

Marry Lee

Join Date: Nov 2020
Posts: 189

17 Sep 2024, 07:36

Carlo Lazzaro Thank you so much for your answer.

How can I change the omitted option please?
I tried this to make policy=0 and wave=3 as the omitted option, but Stata still omitted the same option as before:

Code:

xtreg Y ib3.wave##ib0.policy, fe vce(robust) baselevels

Edit: sorry, it is not the same: the base changed to these values but the omitted option did not change. Here are the new results:

Code:

 xtreg Y ib3.wave##ib0.policy, fe vce(robust) baselevels
note: 1.wave#0b.policy identifies no observations in the sample.
note: 1.wave#1.policy omitted because of collinearity.
note: 3b.wave#1.policy identifies no observations in the sample.
note: 5.wave#1.policy omitted because of collinearity.

Fixed-effects (within) regression               Number of obs     =      6,040
Group variable: n_id                            Number of groups  =      3,254

R-squared:                                      Obs per group:
     Within  = 0.0048                                         min =          1
     Between = 0.0037                                         avg =        1.9
     Overall = 0.0042                                         max =          5

                                                F(7, 3253)        =       2.11
corr(u_i, Xb) = 0.0296                          Prob > F          =     0.0393

                                 (Std. err. adjusted for 3,254 clusters in n_id)
--------------------------------------------------------------------------------
               |               Robust
             Y | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
---------------+----------------------------------------------------------------
          wave |
            1  |  -.0141721   .0116202    -1.22   0.223    -.0369557    .0086116
            2  |   -.002922   .0056105    -0.52   0.603    -.0139225    .0080785
            3  |          0  (base)
            4  |   .0117006   .0071128     1.64   0.100    -.0022455    .0256467
            5  |  -.0065104   .0095288    -0.68   0.495    -.0251936    .0121727
               |
        policy |
            0  |          0  (base)
            1  |   .0020953   .0099434     0.21   0.833    -.0174007    .0215913
               |
   wave#policy |
          1 0  |          0  (empty)
          1 1  |          0  (omitted)
          2 1  |  -.0086444    .012713    -0.68   0.497    -.0335706    .0162818
          3 1  |          0  (empty)
          4 1  |    -.02454   .0116818    -2.10   0.036    -.0474445   -.0016355
          5 1  |          0  (omitted)
               |
         _cons |   .7485472   .0035391   211.51   0.000     .7416081    .7554862
---------------+----------------------------------------------------------------
       sigma_u |  .15802072
       sigma_e |  .09463006
           rho |  .73604293   (fraction of variance due to u_i)
--------------------------------------------------------------------------------

Edit 2: I think I am a bit lost now. What is the difference between a model with one # and a model with ## ? In the model with two ##, I cannot see the effect of the policy in wave 1 while in the model with one #, I can see that. So I would prefer the model with one # since it gives more information. Am I wrong about something?

Last edited by Marry Lee; 17 Sep 2024, 07:44.


Carlo Lazzaro

Join Date: Apr 2014

Posts: 17706
#8

17 Sep 2024, 09:14

Marry:
1) the omitted option is collinear with your fixed effect. It's a matter of linear algebra and there's nothing you can do about that.
2) see https://www.statalist.org/forums/for...actorial-anova
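
On the # versus ## question, ## is shorthand for the main effects plus the interaction, so the two specifications in this thread fit the same model under different parameterizations (the within R-squared, F statistic and sigma values in #1 and #3 are identical). A minimal sketch, assuming the model from #3:

```stata
* These two commands are equivalent: ## expands to main effects plus #
xtreg Y i.wave##i.policy, fe vce(robust)
xtreg Y i.wave i.policy i.wave#i.policy, fe vce(robust)

* Policy contrast within wave 4: main effect of policy plus the
* wave 4 interaction term
lincom 1.policy + 4.wave#1.policy
```

With the coefficients posted in #3 this gives .0020953 + (-.02454) = -.0224447, the same wave 4 contrast implied by the single-# output in #1 (-.006329 - .0161157).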

Kind regards,
Carlo
(Stata 19.0)

Marry Lee

Join Date: Nov 2020

Posts: 189
#9

17 Sep 2024, 10:23

Carlo Lazzaro Thank you for your answer. I understand how things should work now. But there is one last issue that I couldn't get, if you can help me, please.

If I have omitted options and a base, how should I interpret the results? Is the reference the omitted or the base value? Or both?

Thank you.

Carlo Lazzaro

Join Date: Apr 2014

Posts: 17706
#10

17 Sep 2024, 10:51

Marry:
it's the base.

Kind regards,
Carlo
(Stata 19.0)
