Time variable issue when trying a difference in difference approach

Oscar Weinzettl

Join Date: Nov 2018
Posts: 70

15 Aug 2019, 05:07

Good day

I want to perform a difference in difference regression defined as

Code:

mi estimate, esampvaryok: reg job_hours treatment survey survey#treatment, robust

treatment is a self-created binary variable that seems to behave correctly.

However, for some reason my time variable is causing problems, as the following output shows:

Code:

. mi estimate, esampvaryok: reg job_hours treatment survey survey#treatment, robust

Multiple-imputation estimates                     Imputations     =          5
Linear regression                                 Number of obs   =         47
                                                  Average RVI     =     0.1641
                                                  Largest FMI     =     0.3495
                                                  Complete DF     =         43
DF adjustment:   Small sample                     DF:     min     =      16.96
                                                          avg     =      30.94
                                                          max     =      39.37
Model F test:       Equal FMI                     F(   3,   32.3) =       1.60
Within VCE type:       Robust                     Prob > F        =     0.2078

----------------------------------------------------------------------------------
       job_hours |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-----------------+----------------------------------------------------------------
       treatment |  -7.204348   10.75615    -0.67   0.512    -29.90156    15.49287
          survey |  -6.755652   3.470882    -1.95   0.059     -13.7762    .2648965
                 |
survey#treatment |
            1 1  |  -2.855652   17.76597    -0.16   0.873    -39.22317    33.51187
            2 0  |          0  (omitted)
            2 1  |          0  (omitted)
                 |
           _cons |   50.41565   5.211848     9.67   0.000     39.87688    60.95442
----------------------------------------------------------------------------------

Survey is coded as 1 or 2 for the two respective waves. However, for some reason Stata will not estimate the interaction, and I do not understand why. There is variation in all the variables used, and I have used survey for other things in my code. So why does it omit the second wave instead of using it in the regression? Is it an operator error? I ask because if I use "##" instead of "#" (and shorten my equation accordingly) to run the following

Code:

mi estimate, esampvaryok: reg job_hours survey##treatment, robust

I get the following output

Code:

. mi estimate, esampvaryok: reg job_hours survey##treatment, robust

Multiple-imputation estimates                     Imputations     =          5
Linear regression                                 Number of obs   =         47
                                                  Average RVI     =     0.1641
                                                  Largest FMI     =     0.1763
                                                  Complete DF     =         43
DF adjustment:   Small sample                     DF:     min     =      28.42
                                                          avg     =      34.41
                                                          max     =      39.55
Model F test:       Equal FMI                     F(   3,   32.3) =       1.60
Within VCE type:       Robust                     Prob > F        =     0.2078

----------------------------------------------------------------------------------
       job_hours |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-----------------+----------------------------------------------------------------
        2.survey |  -6.755652   3.470882    -1.95   0.059     -13.7762    .2648965
     1.treatment |     -10.06   14.64265    -0.69   0.497    -39.93717    19.81717
                 |
survey#treatment |
            2 1  |   2.855652   17.76597     0.16   0.873    -33.51187    39.22317
                 |
           _cons |      43.66   2.246678    19.43   0.000     39.11769    48.20231
----------------------------------------------------------------------------------

Here I get the interaction term, so is this now correct? It is basically the same as the first output, except that the interaction term is now positive instead of negative; the survey coefficient is identical, while the treatment coefficient and the constant differ. So is this now a properly specified difference-in-differences interaction?

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte job_hours float treatment byte survey
 . . 2
 . . 2
 . . 2
13 . 2
 . . 2
 . . 2
46 0 1
42 0 2
46 0 1
42 0 2
46 0 1
42 0 2
46 0 1
42 0 2
46 0 1
42 0 2
46 0 1
42 0 2
 . 1 1
 . 1 2
51 1 1
35 1 2
37 1 1
35 1 2
55 1 1
58 1 2
65 1 1
36 1 2
48 1 1
38 1 2
38 0 1
40 0 2
38 0 1
40 0 2
38 0 1
40 0 2
38 0 1
40 0 2
38 0 1
40 0 2
38 0 1
40 0 2
 . 0 1
 . 0 2
 . 0 1
 . 0 2
 . 0 1
 . 0 2
 . 0 1
 . 0 2
 . 0 1
 . 0 2
 . 0 1
 . 0 2
41 0 1
40 0 2
41 0 1
40 0 2
41 0 1
40 0 2
41 0 1
40 0 2
41 0 1
40 0 2
41 0 1
40 0 2
 0 . 1
 0 . 1
 0 . 1
 0 . 1
 0 . 1
 0 . 1
31 . 1
31 . 1
31 . 1
31 . 1
31 . 1
31 . 1
52 . 1
52 . 1
52 . 1
52 . 1
52 . 1
52 . 1
38 . 1
38 . 1
38 . 1
38 . 1
38 . 1
38 . 1
38 0 1
45 0 2
38 0 1
45 0 2
38 0 1
45 0 2
38 0 1
45 0 2
38 0 1
45 0 2
end

Last edited by Oscar Weinzettl; 15 Aug 2019, 05:37.

Clyde Schechter

Join Date: Apr 2014

Posts: 30116
#2

15 Aug 2019, 05:45

It's because you're forgetting a few subtle points in factor-variable notation.

When a variable is given without a c. or i. prefix, it is, by default, continuous, unless it is mentioned in an interaction, in which case it is, by default, discrete. Your regression equation lists treatment and survey by themselves with no prefix--hence they are continuous. But it also lists survey#treatment, which Stata takes to be an interaction between two discrete variables. Stata does not recognize that the continuous variables treatment and survey that appear by themselves are the same variables as the discrete variables appearing in survey#treatment. So it thinks there is a collinearity that has to be resolved, and it resolves it by omitting some levels of the interaction.

When you use the ## operator, there is no confusion because there are no appearances of continuous versions of treatment and survey. I recommend always using ## for interactions. It avoids confusions like this, and it also prevents you from accidentally omitting one of the constituents from the model. I would reserve the # operator for use in other commands such as -lincom- or -test- where you might be referring to specific individual levels of the interaction term.
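As a minimal sketch of that division of labour, the block below fits the model with ## and then uses # only to pick out one level of the interaction in -lincom- and -test-. It assumes a plain -regress- rather than -mi estimate-, and it uses ten complete rows copied from the -dataex- listing in #1 purely as a toy data set, so the numbers will not match the multiply imputed output above.

```stata
* Toy data: ten complete rows copied from the -dataex- example in #1
clear
input byte job_hours float treatment byte survey
46 0 1
42 0 2
38 0 1
40 0 2
51 1 1
35 1 2
37 1 1
35 1 2
55 1 1
58 1 2
end

* ## expands to both main effects plus the interaction, all as factor variables
regress job_hours i.survey##i.treatment, vce(robust)

* # is used here only to name one specific cell of the interaction:
* the difference-in-differences coefficient (wave 2, treated)
lincom 2.survey#1.treatment
test 2.survey#1.treatment
```

Because the model is fit with ##, both constituent terms are guaranteed to be in the model, and the # reference in -lincom- and -test- simply points at a coefficient that already exists.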

Alternatively, if you have some reason you want to explicitly mention treatment and survey separately, prefix them with i. and everything will go fine.

Code:

regress job_hours i.treatment i.survey treatment#survey

gives the same results as -regress job_hours treatment##survey-.
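
To see that the two outputs in #1 describe the same fitted model, one can rebuild the four cell means from each set of coefficients. This is only an arithmetic sketch: the numbers are copied by hand from the posted tables, the scalar names are made up for illustration, and it assumes that in the first model survey and treatment enter linearly (coded 1/2 and 0/1) while the omitted interaction levels contribute zero.

```stata
* Coefficients copied from the first output (survey and treatment continuous)
scalar b_cons = 50.41565
scalar b_s    = -6.755652
scalar b_t    = -7.204348
scalar b_11   = -2.855652      // survey#treatment level "1 1"

* Coefficients copied from the second output (survey##treatment)
scalar g_cons = 43.66
scalar g_s    = -6.755652      // 2.survey
scalar g_t    = -10.06         // 1.treatment
scalar g_21   = 2.855652       // survey#treatment level "2 1"

* Cell means implied by the first output (suffix = survey, treatment)
scalar m1_10 = b_cons + 1*b_s
scalar m1_20 = b_cons + 2*b_s
scalar m1_11 = b_cons + 1*b_s + b_t + b_11
scalar m1_21 = b_cons + 2*b_s + b_t

* Cell means implied by the second output
scalar m2_10 = g_cons
scalar m2_20 = g_cons + g_s
scalar m2_11 = g_cons + g_t
scalar m2_21 = g_cons + g_s + g_t + g_21

* Side-by-side comparison: first output, then second output
display "survey=1 treatment=0: " %9.4f scalar(m1_10) "  " %9.4f scalar(m2_10)
display "survey=2 treatment=0: " %9.4f scalar(m1_20) "  " %9.4f scalar(m2_20)
display "survey=1 treatment=1: " %9.4f scalar(m1_11) "  " %9.4f scalar(m2_11)
display "survey=2 treatment=1: " %9.4f scalar(m1_21) "  " %9.4f scalar(m2_21)

* Difference in differences recovered from the first output
scalar did1 = (m1_21 - m1_11) - (m1_20 - m1_10)
display "DiD from first output: " %9.6f scalar(did1)
```

The two sets of cell means agree up to rounding in the displayed coefficients, and the difference in differences computed from the first output is the 2.855652 that the ## model reports directly. The -2.855652 in the first table is the same contrast with the sign flipped, because there the surviving interaction dummy marks the wave 1 treated cell rather than the wave 2 treated cell.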
Oscar Weinzettl

Join Date: Nov 2018

Posts: 70
#3

15 Aug 2019, 05:53

Thank you very much Clyde for clearing that up for me!