  • Time variable issue when trying a difference in difference approach

    Good day

    I want to run a difference-in-differences regression, specified as

    Code:
    mi estimate, esampvaryok: reg job_hours treatment survey survey#treatment, robust
    treatment is a self-created binary variable that seems to be working correctly.

    However, for some reason my time variable is causing issues, as can be seen in the following output:


    Code:
    . mi estimate, esampvaryok: reg job_hours treatment survey survey#treatment, robust
    
    Multiple-imputation estimates                     Imputations     =          5
    Linear regression                                 Number of obs   =         47
                                                      Average RVI     =     0.1641
                                                      Largest FMI     =     0.3495
                                                      Complete DF     =         43
    DF adjustment:   Small sample                     DF:     min     =      16.96
                                                              avg     =      30.94
                                                              max     =      39.37
    Model F test:       Equal FMI                     F(   3,   32.3) =       1.60
    Within VCE type:       Robust                     Prob > F        =     0.2078
    
    ----------------------------------------------------------------------------------
           job_hours |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -----------------+----------------------------------------------------------------
           treatment |  -7.204348   10.75615    -0.67   0.512    -29.90156    15.49287
              survey |  -6.755652   3.470882    -1.95   0.059     -13.7762    .2648965
                     |
    survey#treatment |
                1 1  |  -2.855652   17.76597    -0.16   0.873    -39.22317    33.51187
                2 0  |          0  (omitted)
                2 1  |          0  (omitted)
                     |
               _cons |   50.41565   5.211848     9.67   0.000     39.87688    60.95442
    ----------------------------------------------------------------------------------
    Survey is coded 1 or 2 for the two waves. For some reason Stata will not estimate the interaction for the second wave, and I do not understand why. There is variation in all of the variables used, and I have used survey elsewhere in my code without problems, so why are the wave 2 terms omitted instead of being used in the regression? Is it a problem with the operator? I ask because if I use "##" instead of "#" (and shorten the command accordingly) and run

    Code:
    mi estimate, esampvaryok: reg job_hours survey##treatment, robust
    I get the following output:

    Code:
    . mi estimate, esampvaryok: reg job_hours survey##treatment, robust
    
    Multiple-imputation estimates                     Imputations     =          5
    Linear regression                                 Number of obs   =         47
                                                      Average RVI     =     0.1641
                                                      Largest FMI     =     0.1763
                                                      Complete DF     =         43
    DF adjustment:   Small sample                     DF:     min     =      28.42
                                                              avg     =      34.41
                                                              max     =      39.55
    Model F test:       Equal FMI                     F(   3,   32.3) =       1.60
    Within VCE type:       Robust                     Prob > F        =     0.2078
    
    ----------------------------------------------------------------------------------
           job_hours |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -----------------+----------------------------------------------------------------
            2.survey |  -6.755652   3.470882    -1.95   0.059     -13.7762    .2648965
         1.treatment |     -10.06   14.64265    -0.69   0.497    -39.93717    19.81717
                     |
    survey#treatment |
                2 1  |   2.855652   17.76597     0.16   0.873    -33.51187    39.22317
                     |
               _cons |      43.66   2.246678    19.43   0.000     39.11769    48.20231
    ----------------------------------------------------------------------------------


    Here I do get the interaction term, so is this now correct? The output is essentially the same as the first one, except that the interaction term is now positive instead of negative; almost all other values are identical or very similar. Is this now a properly specified difference-in-differences interaction?


    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte job_hours float treatment byte survey
     . . 2
     . . 2
     . . 2
    13 . 2
     . . 2
     . . 2
    46 0 1
    42 0 2
    46 0 1
    42 0 2
    46 0 1
    42 0 2
    46 0 1
    42 0 2
    46 0 1
    42 0 2
    46 0 1
    42 0 2
     . 1 1
     . 1 2
    51 1 1
    35 1 2
    37 1 1
    35 1 2
    55 1 1
    58 1 2
    65 1 1
    36 1 2
    48 1 1
    38 1 2
    38 0 1
    40 0 2
    38 0 1
    40 0 2
    38 0 1
    40 0 2
    38 0 1
    40 0 2
    38 0 1
    40 0 2
    38 0 1
    40 0 2
     . 0 1
     . 0 2
     . 0 1
     . 0 2
     . 0 1
     . 0 2
     . 0 1
     . 0 2
     . 0 1
     . 0 2
     . 0 1
     . 0 2
    41 0 1
    40 0 2
    41 0 1
    40 0 2
    41 0 1
    40 0 2
    41 0 1
    40 0 2
    41 0 1
    40 0 2
    41 0 1
    40 0 2
     0 . 1
     0 . 1
     0 . 1
     0 . 1
     0 . 1
     0 . 1
    31 . 1
    31 . 1
    31 . 1
    31 . 1
    31 . 1
    31 . 1
    52 . 1
    52 . 1
    52 . 1
    52 . 1
    52 . 1
    52 . 1
    38 . 1
    38 . 1
    38 . 1
    38 . 1
    38 . 1
    38 . 1
    38 0 1
    45 0 2
    38 0 1
    45 0 2
    38 0 1
    45 0 2
    38 0 1
    45 0 2
    38 0 1
    45 0 2
    end
    Last edited by Oscar Weinzettl; 15 Aug 2019, 05:37.

  • #2
    It's because you're forgetting a few subtle points in factor-variable notation.

    When a variable is given without a c. or i. prefix, it is, by default, continuous, unless it is mentioned in an interaction, in which case it is, by default, discrete. Your regression equation lists treatment and survey by themselves with no prefix--hence they are continuous. But it also lists treatment#survey, which Stata takes to be an interaction between two discrete variables. Stata does not recognize that the continuous variables treatment and survey that appear by themselves are the same variables as the discrete variables appearing in treatment#survey. So it thinks there is a collinearity that has to be resolved.
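
    As a rough check that the two outputs in #1 describe the same fit, the sketch below plugs the displayed coefficients into each parameterization and compares the four survey-by-treatment cell means. The scalar names are made up for illustration, and the numbers are simply copied from the two tables, so agreement is only up to the rounding shown there.

```stata
* Coefficients copied from the first output in #1
* (treatment and survey entered without a prefix, plus survey#treatment)
scalar b0  = 50.41565
scalar bt  = -7.204348
scalar bs  = -6.755652
* the one interaction cell that was kept: survey==1 & treatment==1
scalar b11 = -2.855652

* Coefficients copied from the second output in #1 (survey##treatment)
scalar c0  = 43.66
scalar cs  = -6.755652
scalar ct  = -10.06
scalar cst = 2.855652

* Cell means implied by the first output; survey enters as the number 1 or 2
scalar m1_s1t0 = b0 + bs*1
scalar m1_s2t0 = b0 + bs*2
scalar m1_s1t1 = b0 + bt + bs*1 + b11
scalar m1_s2t1 = b0 + bt + bs*2

* Cell means implied by the second output (indicator coding)
scalar m2_s1t0 = c0
scalar m2_s2t0 = c0 + cs
scalar m2_s1t1 = c0 + ct
scalar m2_s2t1 = c0 + cs + ct + cst

display "survey 1, control: " %9.4f scalar(m1_s1t0) "  " %9.4f scalar(m2_s1t0)
display "survey 2, control: " %9.4f scalar(m1_s2t0) "  " %9.4f scalar(m2_s2t0)
display "survey 1, treated: " %9.4f scalar(m1_s1t1) "  " %9.4f scalar(m2_s1t1)
display "survey 2, treated: " %9.4f scalar(m1_s2t1) "  " %9.4f scalar(m2_s2t1)

* Difference in differences built from the cell means
scalar did = (m2_s2t1 - m2_s1t1) - (m2_s2t0 - m2_s1t0)
display "DiD from cell means: " %9.4f scalar(did)
```

    Each pair of cell means should agree to about four decimals, and the difference in differences computed from them equals the 2.survey#1.treatment coefficient. In the first output the interaction term that survives belongs to the cell survey = 1, treatment = 1 rather than survey = 2, treatment = 1, which is why its sign is flipped.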

    When you use the ## operator, there is no confusion because there are no appearances of continuous versions of treatment and survey. I recommend always using ## for interactions. It avoids confusions like this, and it also prevents you from accidentally omitting one of the constituents from the model. I would reserve the # operator for use in other commands such as -lincom- or -test- where you might be referring to specific individual levels of the interaction term.

    Alternatively, if you have some reason you want to explicitly mention treatment and survey separately, prefix them with i. and everything will go fine.

    Code:
    regress job_hours i.treatment i.survey treatment#survey
    gives the same results as -regress job_hours treatment##survey-.
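
    A minimal sketch of the points above, assuming plain -regress- rather than -mi estimate- and a toy data set of eight rows picked from the -dataex- listing in #1 (so the numbers will not match the outputs there). It fits the ## form, picks out the interaction level with -lincom- and -test-, and then refits with the main effects written out with explicit i. prefixes. The scalar names are hypothetical.

```stata
clear
input job_hours treatment survey
46 0 1
42 0 2
38 0 1
40 0 2
51 1 1
35 1 2
55 1 1
58 1 2
end

* Full factorial specification: both main effects plus the interaction
regress job_hours i.survey##i.treatment, vce(robust)
scalar did_full = _b[2.survey#1.treatment]

* The # operator names one specific level of the interaction
lincom 2.survey#1.treatment
test 2.survey#1.treatment

* Same model with the main effects listed separately and prefixed with i.
regress job_hours i.survey i.treatment survey#treatment, vce(robust)
scalar did_explicit = _b[2.survey#1.treatment]

display "DiD (##): " scalar(did_full) "   DiD (i. and #): " scalar(did_explicit)
```

    Because the model is saturated, the interaction coefficient is just the difference in differences of the four cell means, so both specifications should report the same value for it. Carrying this over to the imputed data would mean wrapping the -regress- calls in -mi estimate- as in #1, which is not attempted here.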

    • #3
      Thank you very much Clyde for clearing that up for me!