  • Interpretation of the interaction between categorical variable and dummy variable

    Dear all,

    I have panel data with 5 waves.

    I have a dummy variable called "policy" that equals 1 if the policy is applied and 0 if it is not,
    and a categorical variable called "wave" that takes the values 1, 2, 3, 4 and 5, denoting the survey wave.

    In wave 1: policy = 1 for all individuals
    In wave 2: policy = 1 for some and 0 for some others
    In wave 3: policy = 0 for all individuals
    In wave 4: policy = 1 for some and 0 for some others
    In wave 5: policy = 1 for some and 0 for some others
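
    A cross-tabulation is a quick way to confirm which wave-by-policy cells actually contain data. The sketch below runs on simulated stand-in data with the same design; the seed, sample size and random assignment are assumptions, not the survey itself:

    ```stata
    * Simulated stand-in for the panel (hypothetical values, not the survey data)
    clear
    set seed 2024
    set obs 200
    generate n_id = _n
    expand 5
    bysort n_id: generate wave = _n
    generate byte policy = runiform() < 0.5
    * Everyone is under the policy in wave 1, nobody in wave 3
    replace policy = 1 if wave == 1
    replace policy = 0 if wave == 3
    * Cells (wave 1, policy 0) and (wave 3, policy 1) should show zero counts
    tabulate wave policy
    ```

    On the real data, adding "if !missing(Y)" to -tabulate- would restrict the counts to observations that can enter the regression.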

    I want to study the effect of policy on outcome Y in each wave, as follows:
    Code:
    Y i.policy#i.wave
    The result I got:
    Code:
    xtreg Y i.wave#i.policy, fe vce(robust)
    note: 1b.wave#0b.policy identifies no observations in the sample.
    note: 3.wave#1.policy identifies no observations in the sample.
    note: 5.wave#1.policy omitted because of collinearity.
    
    Fixed-effects (within) regression               Number of obs     =      6,040
    Group variable: n_id                            Number of groups  =      3,254
    
    R-squared:                                      Obs per group:
         Within  = 0.0048                                         min =          1
         Between = 0.0037                                         avg =        1.9
         Overall = 0.0042                                         max =          5
    
                                                    F(7, 3253)        =       2.11
    corr(u_i, Xb) = 0.0296                          Prob > F          =     0.0393
    
                                     (Std. err. adjusted for 3,254 clusters in n_id)
    --------------------------------------------------------------------------------
                   |               Robust
                 Y | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
    ---------------+----------------------------------------------------------------
       wave#policy |
              1 0  |          0  (empty)
              1 1  |  -.0076616   .0059744    -1.28   0.200    -.0193757    .0040524
              2 0  |   .0014931   .0056733     0.26   0.792    -.0096304    .0126166
              2 1  |   -.005056   .0085084    -0.59   0.552    -.0217383    .0116264
              3 0  |   .0044151   .0054968     0.80   0.422    -.0063625    .0151927
              3 1  |          0  (empty)
              4 0  |   .0161157   .0069952     2.30   0.021     .0024003    .0298311
              4 1  |   -.006329   .0056444    -1.12   0.262    -.0173959     .004738
              5 0  |  -.0020953   .0099434    -0.21   0.833    -.0215913    .0174007
              5 1  |          0  (omitted)
                   |
             _cons |    .744132   .0035799   207.86   0.000     .7371129    .7511511
    ---------------+----------------------------------------------------------------
           sigma_u |  .15802072
           sigma_e |  .09463006
               rho |  .73604293   (fraction of variance due to u_i)
    --------------------------------------------------------------------------------
    Here is a data sample:
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(n_id wave policy Y)
    13 1 1   .
    13 2 0   .
    13 3 0 .87
    13 5 0 .87
    14 2 1  .8
    15 4 1   .
    16 5 .   .
    17 1 1 .72
    17 2 0 .87
    17 3 0   .
    18 1 .   .
    19 3 0 .72
    19 5 1  .8
    20 1 1 .87
    21 2 0 .77
    22 1 1 .84
    24 1 1 .52
    25 5 1 .24
    26 5 .   .
    27 5 1  .1
    29 4 1 .95
    29 5 0   .
    30 3 .   .
    31 2 0  .8
    32 1 1 .95
    34 5 1 .87
    35 5 0 .41
    37 2 .   .
    39 1 1 .77
    39 2 0 .84
    39 3 0 .69
    39 4 0 .69
    39 5 1 .69
    40 3 0   .
    41 2 0 .77
    43 1 1 .77
    44 2 0 .77
    44 4 1 .87
    45 2 0 .72
    46 1 1 .41
    46 3 0  .5
    47 3 0 .77
    48 2 0   .
    48 3 0 .41
    48 4 1 .32
    48 5 0   .
    50 2 0 .58
    50 3 0 .66
    50 4 1   .
    50 5 1   .
    51 5 1 .72
    52 5 1  .5
    53 1 1 .87
    53 2 0 .87
    53 3 0   .
    53 4 0 .87
    53 5 1 .87
    54 2 0  .9
    54 3 0   .
    54 4 1 .87
    54 5 1 .87
    55 1 1   .
    55 4 0   .
    56 1 1 .87
    57 5 0 .69
    58 2 0 .87
    58 5 1 .95
    59 3 0 .77
    60 5 1 .87
    61 3 .   .
    63 5 0 .61
    64 2 0   .
    66 5 0 .55
    68 4 1 .87
    68 5 1 .95
    69 1 1 .61
    69 3 0   .
    70 4 1 .84
    70 5 0   .
    71 1 1   .
    71 3 0 .61
    71 4 1 .61
    72 2 .   .
    73 1 1 .77
    73 2 0 .61
    73 3 0 .61
    74 1 1  .8
    74 2 0 .87
    74 3 0 .55
    75 4 1 .94
    76 1 1 .61
    77 2 0   .
    78 3 0   .
    end


    My question is about the interpretation of these results:
    • If I change which category is omitted, the coefficients and standard errors change. Is the omitted category my reference group?
    • Is this model correct?

    Thank you.

  • #2
    Marry:
    I would advise you to try:
    Code:
    Y i.policy##i.wave
    and post back your results.
    Kind regards,
    Carlo
    (Stata 19.0)


    • #3
      Dear Carlo Lazzaro, here is the outcome of the new model:


      Code:
       xtreg Y i.wave##i.policy, fe vce(robust) baselevels
      note: 1b.wave#0b.policy identifies no observations in the sample.
      note: 3.wave#1.policy identifies no observations in the sample.
      note: 5.wave#1.policy omitted because of collinearity.
      
      Fixed-effects (within) regression               Number of obs     =      6,040
      Group variable: n_id                            Number of groups  =      3,254
      
      R-squared:                                      Obs per group:
           Within  = 0.0048                                         min =          1
           Between = 0.0037                                         avg =        1.9
           Overall = 0.0042                                         max =          5
      
                                                      F(7, 3253)        =       2.11
      corr(u_i, Xb) = 0.0296                          Prob > F          =     0.0393
      
                                       (Std. err. adjusted for 3,254 clusters in n_id)
      --------------------------------------------------------------------------------
                     |               Robust
                   Y | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
      ---------------+----------------------------------------------------------------
                wave |
                  1  |          0  (base)
                  2  |   .0112501   .0117288     0.96   0.338    -.0117465    .0342466
                  3  |   .0141721   .0116202     1.22   0.223    -.0086116    .0369557
                  4  |   .0258727   .0118801     2.18   0.029     .0025794    .0491659
                  5  |   .0076616   .0059744     1.28   0.200    -.0040524    .0193757
                     |
              policy |
                  0  |          0  (base)
                  1  |   .0020953   .0099434     0.21   0.833    -.0174007    .0215913
                     |
         wave#policy |
                1 0  |          0  (empty)
                2 1  |  -.0086444    .012713    -0.68   0.497    -.0335706    .0162818
                3 1  |          0  (empty)
                4 1  |    -.02454   .0116818    -2.10   0.036    -.0474445   -.0016355
                5 1  |          0  (omitted)
                     |
               _cons |   .7343751   .0106485    68.97   0.000     .7134967    .7552535
      ---------------+----------------------------------------------------------------
             sigma_u |  .15802072
             sigma_e |  .09463006
                 rho |  .73604293   (fraction of variance due to u_i)
      --------------------------------------------------------------------------------
      
      .


      • #4
        Originally posted by Marry Lee
        In wave 1: policy = 1 for all individuals
        . . .
        In wave 3: policy = 0 for all individuals
        . . .

        I want to study the effect of policy on outcome Y in each wave
        You won't be able to do that, not quite, at least not in each wave, because you have confounding of wave and application of policy for Waves 1 and 3.

        is the omitted option my reference group?
        Yes. The "(omitted)" is your reference group and the "(empty)" are the conditions without any data in your dataset.

        Is this model correct?
        Well, what you have constructed with your interaction term is in essence a so-called cell-means model. Selected linear contrasts between them (i.e., contrasts involving wave and policy-application combinations that are present in your dataset) are the best that you're going to be able to do.
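
        As a rough sketch of such contrasts, -lincom- after the single-# model can difference the two policy cells within a wave where both cells exist (waves 2, 4 and 5 in the original data). The data below are simulated stand-ins with the same design; the seed, sample size and outcome values are assumptions:

        ```stata
        * Simulated stand-in panel (hypothetical values, not the survey data)
        clear
        set seed 2024
        set obs 500
        generate n_id = _n
        expand 5
        bysort n_id: generate wave = _n
        generate byte policy = runiform() < 0.5
        replace policy = 1 if wave == 1
        replace policy = 0 if wave == 3
        generate double Y = 0.75 + 0.1*rnormal()
        xtset n_id wave
        * Cell-means style model, as in #1
        xtreg Y i.wave#i.policy, fe vce(robust)
        * Within-wave policy contrasts: (policy = 1 cell) minus (policy = 0 cell)
        lincom 2.wave#1.policy - 2.wave#0.policy
        lincom 4.wave#1.policy - 4.wave#0.policy
        ```

        No such contrast can be formed for waves 1 and 3, where one of the two cells is empty.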


        • #5
          Marry:
          1) yes, the omitted category is your reference group, that you can change via -fvvarlist-;
          2) your regression results are as informative as two predictors (and their interaction) can be. You can check for misspecification of the functional form of the regressand by replicating by hand the procedure reported in the -linktest- entry of the Stata .pdf manual, as in the following toy example:
          Code:
          . use "https://www.stata-press.com/data/r18/nlswork.dta"
          (National Longitudinal Survey of Young Women, 14-24 years old in 1968)
          
          . xtreg ln_wage c.age##c.age i.year, fe vce(cluster idcode)
          
          Fixed-effects (within) regression               Number of obs     =     28,510
          Group variable: idcode                          Number of groups  =      4,710
          
          R-squared:                                      Obs per group:
               Within  = 0.1162                                         min =          1
               Between = 0.1078                                         avg =        6.1
               Overall = 0.0932                                         max =         15
          
                                                          F(16, 4709)       =      79.11
          corr(u_i, Xb) = 0.0613                          Prob > F          =     0.0000
          
                                       (Std. err. adjusted for 4,710 clusters in idcode)
          ------------------------------------------------------------------------------
                       |               Robust
               ln_wage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
          -------------+----------------------------------------------------------------
                   age |   .0728746    .013687     5.32   0.000     .0460416    .0997075
                       |
           c.age#c.age |  -.0010113   .0001076    -9.40   0.000    -.0012224   -.0008003
                       |
                  year |
                   69  |   .0647054   .0155249     4.17   0.000     .0342693    .0951415
                   70  |   .0284423   .0264639     1.07   0.283    -.0234395     .080324
                   71  |   .0579959   .0384111     1.51   0.131    -.0173078    .1332996
                   72  |   .0510671   .0502675     1.02   0.310    -.0474808     .149615
                   73  |   .0424104   .0624924     0.68   0.497    -.0801038    .1649247
                   75  |   .0151376    .086228     0.18   0.861    -.1539096    .1841848
                   77  |   .0340933   .1106841     0.31   0.758    -.1828994     .251086
                   78  |   .0537334   .1232232     0.44   0.663    -.1878417    .2953084
                   80  |   .0369475   .1473725     0.25   0.802    -.2519716    .3258667
                   82  |   .0391687   .1715621     0.23   0.819    -.2971733    .3755108
                   83  |    .058766   .1836086     0.32   0.749    -.3011928    .4187249
                   85  |   .1042758   .2080199     0.50   0.616    -.3035406    .5120922
                   87  |   .1242272   .2327328     0.53   0.594    -.3320379    .5804922
                   88  |   .1904977   .2486083     0.77   0.444    -.2968909    .6778863
                       |
                 _cons |   .3937532   .2469015     1.59   0.111    -.0902893    .8777957
          -------------+----------------------------------------------------------------
               sigma_u |  .40275174
               sigma_e |  .30127563
                   rho |  .64120306   (fraction of variance due to u_i)
          ------------------------------------------------------------------------------
          
          . predict fitted, xb
          (24 missing values generated)
          
          . g sq_fitted=fitted^2
          (24 missing values generated)
          
          . xtreg ln_wage fitted sq_fitted , fe vce(cluster idcode)
          
          Fixed-effects (within) regression               Number of obs     =     28,510
          Group variable: idcode                          Number of groups  =      4,710
          
          R-squared:                                      Obs per group:
               Within  = 0.1164                                         min =          1
               Between = 0.1094                                         avg =        6.1
               Overall = 0.0941                                         max =         15
          
                                                          F(2, 4709)        =     586.29
          corr(u_i, Xb) = 0.0619                          Prob > F          =     0.0000
          
                                       (Std. err. adjusted for 4,710 clusters in idcode)
          ------------------------------------------------------------------------------
                       |               Robust
               ln_wage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
          -------------+----------------------------------------------------------------
                fitted |   2.012332   .5365254     3.75   0.000     .9604909    3.064172
             sq_fitted |  -.3040363   .1616996    -1.88   0.060    -.6210431    .0129706
                 _cons |  -.8379964    .443929    -1.89   0.059    -1.708305    .0323122
          -------------+----------------------------------------------------------------
               sigma_u |  .40239556
               sigma_e |  .30114591
                   rho |  .64099409   (fraction of variance due to u_i)
          ------------------------------------------------------------------------------
          
          . test sq_fitted
          
           ( 1)  sq_fitted = 0
          
                 F(  1,  4709) =    3.54
                      Prob > F =    0.0601
          
          .
          As the -test- outcome does not reject the null at the 5% level (p = 0.0601), there's no evidence of model misspecification.
          Kind regards,
          Carlo
          (Stata 19.0)


          • #6
            Thank you so much Joseph Coveney for your answer. I am starting to understand the model a little bit, but I am still confused about what Stata is doing.

            In particular, I cannot understand the difference between "omitted" and "base". If "omitted" is my reference group then what is "base"?

            Is every coefficient computed as a difference with respect to the omitted or the base or both values?

            For example, in #3, should I say that the effect of the policy in wave 4 is lower than the effect of the policy in wave 5? Or should I say that the outcome is lower under the policy in wave 4 compared with the outcome when there was no policy in wave 1?

            Thank you.


            • #7
              Carlo Lazzaro Thank you so much for your answer.

              How can I change the omitted option, please?
              I tried the following to make policy = 0 and wave = 3 the omitted option, but Stata still omitted the same option as before:

              Code:
              xtreg Y ib3.wave##ib0.policy, fe vce(robust) baselevels

              Edit: sorry, it is not the same: the base changed to these values but the omitted option did not change. Here are the new results:
              Code:
               xtreg Y ib3.wave##ib0.policy, fe vce(robust) baselevels
              note: 1.wave#0b.policy identifies no observations in the sample.
              note: 1.wave#1.policy omitted because of collinearity.
              note: 3b.wave#1.policy identifies no observations in the sample.
              note: 5.wave#1.policy omitted because of collinearity.
              
              Fixed-effects (within) regression               Number of obs     =      6,040
              Group variable: n_id                            Number of groups  =      3,254
              
              R-squared:                                      Obs per group:
                   Within  = 0.0048                                         min =          1
                   Between = 0.0037                                         avg =        1.9
                   Overall = 0.0042                                         max =          5
              
                                                              F(7, 3253)        =       2.11
              corr(u_i, Xb) = 0.0296                          Prob > F          =     0.0393
              
                                               (Std. err. adjusted for 3,254 clusters in n_id)
              --------------------------------------------------------------------------------
                             |               Robust
                           Y | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
              ---------------+----------------------------------------------------------------
                        wave |
                          1  |  -.0141721   .0116202    -1.22   0.223    -.0369557    .0086116
                          2  |   -.002922   .0056105    -0.52   0.603    -.0139225    .0080785
                          3  |          0  (base)
                          4  |   .0117006   .0071128     1.64   0.100    -.0022455    .0256467
                          5  |  -.0065104   .0095288    -0.68   0.495    -.0251936    .0121727
                             |
                      policy |
                          0  |          0  (base)
                          1  |   .0020953   .0099434     0.21   0.833    -.0174007    .0215913
                             |
                 wave#policy |
                        1 0  |          0  (empty)
                        1 1  |          0  (omitted)
                        2 1  |  -.0086444    .012713    -0.68   0.497    -.0335706    .0162818
                        3 1  |          0  (empty)
                        4 1  |    -.02454   .0116818    -2.10   0.036    -.0474445   -.0016355
                        5 1  |          0  (omitted)
                             |
                       _cons |   .7485472   .0035391   211.51   0.000     .7416081    .7554862
              ---------------+----------------------------------------------------------------
                     sigma_u |  .15802072
                     sigma_e |  .09463006
                         rho |  .73604293   (fraction of variance due to u_i)
              --------------------------------------------------------------------------------


              Edit 2: I think I am a bit lost now. What is the difference between a model with one # and a model with ##? In the model with ##, I cannot see the effect of the policy in wave 1, while in the model with one # I can. So I would prefer the model with one #, since it gives more information. Am I wrong about something?
              Last edited by Marry Lee; 17 Sep 2024, 07:44.


              • #8
                Marry:
                1) the omitted option is collinear with your fixed effect. It's a matter of linear algebra and there's nothing you can do about that.
                2) see https://www.statalist.org/forums/for...actorial-anova
                Kind regards,
                Carlo
                (Stata 19.0)
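
                On the # versus ## question in #7: the two specifications appear to be reparameterizations of the same fit (the within R-squared, sigma_u and sigma_e are identical in #1 and #3), so neither should carry more information than the other. The 1 1 row in #1 is the wave 1, policy = 1 cell measured against the omitted 5 1 cell, so it mixes wave and policy rather than isolating a policy effect. A small check that uses only the coefficients already posted, with -display- as a calculator:

                ```stata
                * Policy contrast in wave 2
                * # model in #1: cell (2,1) minus cell (2,0)
                display %10.7f (-.005056) - (.0014931)
                * ## model in #3: 1.policy plus the 2 1 interaction
                display %10.7f (.0020953) + (-.0086444)
                * Policy contrast in wave 4
                * # model in #1: cell (4,1) minus cell (4,0)
                display %10.7f (-.006329) - (.0161157)
                * ## model in #3: 1.policy plus the 4 1 interaction
                display %10.7f (.0020953) + (-.02454)
                ```

                Each pair agrees: -.0065491 for wave 2 and -.0224447 for wave 4.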


                • #9
                  Carlo Lazzaro Thank you for your answer. I understand how things should work now. But there is one last issue that I couldn't figure out, if you can help me please.

                  If I have omitted options and a base, how should I interpret the results? Is the reference the omitted or the base value, or both?

                  Thank you.

                  Comment


                  • #10
                    Marry:
                    it's the base.
                    Kind regards,
                    Carlo
                    (Stata 19.0)
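
                    To make the labels concrete against the ## output in #3: "(base)" marks the level chosen as the reference for each factor, "(empty)" a cell with no observations, and "(omitted)" a term dropped for collinearity. Read that way, 1.policy (.0020953) matches the wave 5 policy contrast implied by #1, that is 0 - (-.0020953), and each reported wave#policy term is the difference between that wave's policy contrast and wave 5's. A sketch with -lincom- on simulated stand-in data, assuming Stata again drops 5.wave#1.policy (the seed, sample size and outcome values are assumptions):

                    ```stata
                    * Simulated stand-in panel (hypothetical values, not the survey data)
                    clear
                    set seed 2024
                    set obs 500
                    generate n_id = _n
                    expand 5
                    bysort n_id: generate wave = _n
                    generate byte policy = runiform() < 0.5
                    replace policy = 1 if wave == 1
                    replace policy = 0 if wave == 3
                    generate double Y = 0.75 + 0.1*rnormal()
                    xtset n_id wave
                    * Full factorial specification, as in #3
                    xtreg Y i.wave##i.policy, fe vce(robust) baselevels
                    * Policy contrast in wave 4 = policy main effect + wave 4 interaction
                    lincom 1.policy + 4.wave#1.policy
                    * The interaction alone is the wave 4 contrast minus the reference contrast
                    lincom 4.wave#1.policy
                    ```

                    Applied to #3, the 4 1 coefficient of -.02454 would then say that the policy contrast in wave 4 is lower than the policy contrast in wave 5 by that amount.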
