Fixed effect difference-in-differences model

Katherine Adams

Join Date: Jan 2019

Posts: 52
#46

13 Jan 2019, 08:29

Clyde, thank you again…

As you recommended, I drew scatter plots of ‘lconsum’ (vertical axis) against ‘heatscore’ (horizontal axis). This is what I obtained:

twoway scatter lconsum heatscore

twoway scatter lconsum heatscore if calday < td(02feb2018)

It is hard to tell what kind of relationship there is between these variables.

Last edited by Katherine Adams; 13 Jan 2019, 08:32.
Katherine Adams

Join Date: Jan 2019

Posts: 52
#47

13 Jan 2019, 08:33

Carlo, yes, I totally agree (#42).
Clyde Schechter

Join Date: Apr 2014

Posts: 30150
#48

13 Jan 2019, 11:56

I think Carlo's response in #45 may well account for much of the disciplinary difference. I would add another possible source: randomized studies are uncommon in economics, but common in health care. In a randomized study, the independence of the error terms from the fixed effects is a very reasonable assumption, and under those circumstances the random effects model coefficient estimates are consistent.
Clyde Schechter

Join Date: Apr 2014

Posts: 30150
#49

13 Jan 2019, 11:59

Re #46: because your graph has a large number of points superimposed on one another, I agree it is difficult to see what is going on. One suggestion that might help is to try

Code:

lowess lconsum heatscore if calday < td(02feb2018)

The lowess curve will probably be more informative.
Katherine Adams

Join Date: Jan 2019

Posts: 52
#50

21 Jan 2019, 15:22

Clyde, thank you for your help! Unfortunately, I was not able to see the result of the 'lowess' command because it took too long to run.
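Since -lowess- can be slow on a very large dataset, one faster way to get a similar picture (not something suggested in the thread) is to plot the mean of lconsum at each value of heatscore. The sketch below assumes heatscore takes a small number of integer values, as its later use as i.heatscore in #53 suggests, and it uses simulated data with hypothetical values; substitute your own dataset.

```stata
* Simulated data for illustration only (hypothetical values); use your own dataset instead
clear
set seed 12345
set obs 200000
gen calday    = td(01jan2017) + floor(730*runiform())   // random day in 2017-2018
format calday %td
gen heatscore = floor(11*runiform())                    // hypothetical integer score 0-10
gen lconsum   = 4 + 0.03*heatscore + rnormal(0, 0.3)

* Mean lconsum at each value of heatscore, pre-treatment period only
* (if heatscore were continuous, bin it first, e.g.: xtile hbin = heatscore, nquantiles(20))
preserve
keep if calday < td(02feb2018)
collapse (mean) mean_lconsum=lconsum (count) n=lconsum, by(heatscore)
list, noobs
twoway connected mean_lconsum heatscore, ytitle("Mean lconsum") xtitle("heatscore")
restore
```

Because -collapse- reduces the data to one row per heatscore value, the plot is cheap to draw regardless of the original sample size; -preserve-/-restore- bring the full dataset back afterwards.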

However, now I have another quick question. Suppose I do not use the full factorial in the following code (the code is the same as that in your reply #39; I just use 'areg' instead of 'xtreg', and I also use different names for some of the vars):

areg lconsum i.randomgr##i.tp i.month c.calmonth, absorb(location) vce(cluster location)

So, my new code will be:

areg lconsum i.randomgr#i.tp i.month c.calmonth, absorb(location) vce(cluster location)

What effect will this change have on my estimates?

In both cases, I will have some omitted variables because of collinearity, which is OK (in the first case with ##, I will have i.randomgrp omitted; in the second case with #, I will have i.randomgrp#i.tp omitted). So, what is the difference?

Last edited by Katherine Adams; 21 Jan 2019, 15:25.
Clyde Schechter

Join Date: Apr 2014

Posts: 30150
#51

21 Jan 2019, 17:11

In your data, randomgrp is constant within location, and tp (I assume this is your new name for post_intervention) is constant within time (c.calmonth). So both of these variables will be omitted automatically by Stata. If you change the code from ## to #, as you propose, you are simply doing that work for Stata. There is no real harm in that: everything will come out the same. But I don't recommend it for two reasons.

1. It is better to get in the habit of always specifying interactions with ## and not #, because when you use # it is all too easy to forget to also include the constituent effects. Any model that includes an interaction without also including the constituent effects is mis-specified, unless those constituent effects are omitted due to collinearity (as here). While it is perfectly OK to use # when the collinearity is present, it is much too easy to omit one of the constituent effects by accident in circumstances where collinearity is not present. So the use of ## is foolproof; # is not.

2. Even though using # instead of ## does no damage here, ## also serves as a check on your data. When you know that the data design makes constituents of the interaction collinear with other model variables, and you see in the output that the expected omissions did not occur, you know there is an error in your data. It is far better to find this out now than after you have gone farther along in your analysis plan and invested time in producing spurious results. The sooner you discover problems with the data, the better. So ## gives you the added benefit of a validity check on one aspect of your data.
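To see the point of #51 concretely, here is a small sketch on simulated monthly data (hypothetical values; randomgrp is constant within household, as in this thread). Both specifications describe the same model, so the R-squared should match, although the two outputs may label and omit the interaction cells differently.

```stata
* Simulated panel for illustration only (hypothetical values)
clear
set seed 2019
set obs 200                                // 200 hypothetical households
gen location  = _n
gen randomgrp = mod(_n, 4)                 // 0 = control, 1-3 = treatment arms; constant within household
expand 24                                  // 24 monthly observations per household
bysort location: gen calmonth = tm(2017m1) + _n - 1
format calmonth %tm
gen tp      = calmonth >= tm(2018m2)       // post-treatment indicator
gen lconsum = 4 + 0.1*tp - 0.05*(randomgrp > 0)*tp + rnormal(0, 0.2)

* Full factorial: Stata itself omits i.randomgrp (collinear with the absorbed household effects)
areg lconsum i.randomgrp##i.tp c.calmonth, absorb(location) vce(cluster location)
scalar r2_full = e(r2)
estimates store full

* Interaction only: same column space, possibly a different parameterization of the cells
areg lconsum i.randomgrp#i.tp c.calmonth, absorb(location) vce(cluster location)
scalar r2_inter = e(r2)
estimates store inter

estimates table full inter, b se
display r2_full "  " r2_inter
```

With ##, the expected "omitted because of collinearity" note for i.randomgrp is itself the validity check described in point 2: if it did not appear, randomgrp would not be constant within household.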

You mentioned that i.randomgrp#i.tp gets omitted. You really don't want that to happen, as that is the key variable for interpreting your results to answer your research question. (The results are equivalent either way, and, in principle, it would still be possible to recover the i.randomgrp#i.tp effect, but that calculation involves a bunch of matrix algebra that is easy to get wrong.) It means that i.randomgrp#i.tp is collinear with something else in the model. What you should do is eliminate that something else instead. The simplest way to do that is to rearrange the order in which the variables are listed in the regression command. Stata decides which variable to drop to break a collinearity based on the order in which the variables appear, so if you don't like what it chose, rearranging will solve that.
Harold Looney

Join Date: Jan 2019

Posts: 4
#52

22 Jan 2019, 01:58

Dear all,

I am joining this list to ask for some advice as well. I recently asked a question on Statalist that is related to this thread, but unfortunately I have not received a response yet. I am new to Statalist, so I am still learning how to post properly; I apologize if this post is not appropriate in this conversation. I would be very glad if any of you could have a look at my post as well. Thanks.

Harold
Katherine Adams

Join Date: Jan 2019

Posts: 52
#53

29 Jan 2019, 15:54

I have another question (I hope it will be among the last ones) about my model (see post #38).

So, I have the following regression model:
xtreg lconsum i.heatscore##i.randomgr##i.tp i.month c.daymntemp##c.daymntemp c.calmonth, fe vce(cluster location)

Now, I need to modify it; in particular, my ‘new’ model should have month and household fixed effects, as well as the calendar month time trend. So, my code is now as follows:
reghdfe lconsum i.heatscore##i.randomgr##i.tp c.calmonth, absorb(location month) vce(cluster location)

Also, I was asked to add month-of-sample, week-of-sample, and month by household fixed effects. Could you please tell me how I can do this? And, if I add these fixed effects, should I drop the month fixed effect I had before?
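One way to add these fixed effects, sketched here on simulated daily data, is to build the time identifiers from calday and list them in reghdfe's absorb() option, using location#month for household-by-month-of-year effects. Two assumptions: "month by household" is read as household × calendar month of the year (the month variable), and week of sample is defined with wofd(). Under that setup, c.calmonth and i.month would be redundant (a linear trend in calmonth is collinear with the month-of-sample effects, and month is nested in both calmonth and location#month), so they are left out. reghdfe and ftools come from SSC (ssc install ftools, ssc install reghdfe).

```stata
* Simulated daily panel for illustration only (hypothetical values)
clear
set seed 42
set obs 100                                    // 100 hypothetical households
gen long location = 500000 + _n
gen byte randomgrp = mod(_n, 4)                // 0 = control, 1-3 = treatment arms
expand 730                                     // one row per day, 2017-2018
bysort location: gen calday = td(01jan2017) + _n - 1
format calday %td
gen month    = month(calday)                   // month of year, 1-12
gen calmonth = mofd(calday)                    // month of sample
format calmonth %tm
gen calweek  = wofd(calday)                    // week of sample (assumed definition)
format calweek %tw
gen byte tp  = calday >= td(02feb2018)
gen lconsum  = 4 + 0.2*cos(2*_pi*month/12) - 0.05*(randomgrp > 0)*tp + rnormal(0, 0.3)

* Household, month-of-sample, week-of-sample, and household-by-month-of-year fixed effects
reghdfe lconsum i.randomgrp##i.tp, absorb(location calmonth calweek location#month) vce(cluster location)
```

Expect i.randomgrp to be reported as omitted (it is constant within household) and tp to be largely absorbed by the time effects; the k.randomgrp#1.tp terms are the ones that carry the treatment effects.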

Thank you.

P.S. I am also struggling with an event study for the model (https://www.statalist.org/forums/for...study-analysis); I would appreciate any help.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17727
#54

31 Jan 2019, 00:28

Katherine:
- you have too many interactions in your model(s); I suspect that collinearity will be an issue with all those time variables, and that presenting your results will be difficult.
Can't you simplify things a bit?

Kind regards,
Carlo
(Stata 19.0)
Katherine Adams

Join Date: Jan 2019

Posts: 52
#55

31 Jan 2019, 10:17

Carlo,

Yes, I should have done this earlier…

I have panel data for 2017-2018 from an RCT. The treatment (started on February 2, 2018) is a specific type of bill sent to a household that compares the household’s energy use with that of its neighbors. The treatment is expected to reduce the energy use of treated households.

Suppose my simplified diff-in-diff model is:
xtreg lconsum i.randomgr##i.tp, fe vce(cluster location)

Now, I need to modify it; in particular, my ‘new’ model should have household fixed effects, as well as the calendar month time trend. So, my code is now as follows:
reghdfe lconsum i.randomgr##i.tp c.calmonth, absorb(location) vce(cluster location)

Then, I was asked to add month-of-sample, week-of-sample, and month by household fixed effects. How can I do this?

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input long location str9 date float(year month day calday calmonth lconsum) byte randomgrp float tp
500001 "01-JAN-17" 2017 1  1 20820 684  4.331219 0 0
500001 "02-JAN-17" 2017 1  2 20821 684  4.395176 0 0
500001 "03-JAN-17" 2017 1  3 20822 684 4.4484995 0 0
500001 "04-JAN-17" 2017 1  4 20823 684 4.4349075 0 0
500001 "05-JAN-17" 2017 1  5 20824 684 4.3300653 0 0
500001 "06-JAN-17" 2017 1  6 20825 684  3.984616 0 0
500001 "07-JAN-17" 2017 1  7 20826 684 4.2140527 0 0
500001 "08-JAN-17" 2017 1  8 20827 684 4.4064745 0 0
500001 "09-JAN-17" 2017 1  9 20828 684 4.2368575 0 0
500001 "10-JAN-17" 2017 1 10 20829 684 4.3243986 0 0
end
format %td calday
format %tm calmonth

Variables:
location; household’s location id
date
year
month
day
calday; daily date (%td format), e.g., 01jan2017
calmonth; monthly date (%tm format), e.g., 2017m1
lconsum; log of energy consumption
randomgr; treatment-group indicator: 0 = control; 1, 2, or 3 = one of the three treatment groups
tp; post-treatment variable; gen tp = (calday >= td(02feb2018))
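Because randomgrp has four levels (control plus three treatment groups), i.randomgrp##i.tp yields three difference-in-differences coefficients, one per treatment arm relative to control. A sketch on simulated monthly data (hypothetical effect sizes) of how one might test them jointly and summarize them:

```stata
* Simulated panel for illustration only (hypothetical effect sizes)
clear
set seed 101
set obs 200                                 // 200 hypothetical households
gen location  = _n
gen randomgrp = mod(_n, 4)                  // 0 = control, 1-3 = three treatment groups
expand 24                                   // monthly observations, 2017m1-2018m12
bysort location: gen calmonth = tm(2017m1) + _n - 1
format calmonth %tm
gen tp      = calmonth >= tm(2018m2)
gen lconsum = 4 + 0.1*tp - 0.02*randomgrp*tp + rnormal(0, 0.2)   // arm k effect = -0.02*k

xtset location calmonth
xtreg lconsum i.randomgrp##i.tp, fe vce(cluster location)

* k.randomgrp#1.tp (k = 1, 2, 3): DiD effect of arm k relative to the control group
testparm i.randomgrp#i.tp                                          // joint test that all three are zero
lincom (1.randomgrp#1.tp + 2.randomgrp#1.tp + 3.randomgrp#1.tp)/3  // simple average across arms
```

The simple average weights the arms equally; if the arms differ in size, a pooled indicator (any treatment vs control) is an alternative summary.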

P.S. I am also having difficulty with an event study for this model:
https://www.statalist.org/forums/for...study-analysis

Last edited by Katherine Adams; 31 Jan 2019, 10:19.

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17727

#56

31 Jan 2019, 10:27

Katherine:
I think that a different specification of your model is mandatory:

Code:

. reghdfe lconsum i.randomgr##i.tp c.calmonth, absorb(location) vce(cluster location)
(converged in 1 iterations)
note: 0.randomgrp omitted because of collinearity
note: 0.tp omitted because of collinearity
note: 0.randomgrp#0.tp omitted because of collinearity
note: calmonth omitted because of collinearity

HDFE Linear regression                            Number of obs   =         10
Absorbing 1 HDFE group                            F(   0,      0) =       0.00
Statistics robust to heteroskedasticity           Prob > F        =          .
                                                  R-squared       =    -0.0000
                                                  Adj R-squared   =    -0.0000
                                                  Within R-sq.    =     0.0000
Number of clusters (location) =          1        Root MSE        =     0.1386

                               (Std. Err. adjusted for 1 clusters in location)
------------------------------------------------------------------------------
             |               Robust
     lconsum |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
 0.randomgrp |          0  (empty)
        0.tp |          0  (empty)
             |
randomgrp#tp |
        0 0  |          0  (empty)
             |
    calmonth |          0  (omitted)
------------------------------------------------------------------------------

Absorbed degrees of freedom:
---------------------------------------------------------------+
 Absorbed FE |  Num. Coefs.  =   Categories  -   Redundant     |
-------------+-------------------------------------------------|
    location |            0               1              1 *   |
---------------------------------------------------------------+
* = fixed effect nested within cluster; treated as redundant for DoF computation

.

Technically, you can add the other time variables as predictors via -fvvarlist- notation (i.e., the -i.- prefix).
But I'm afraid that they will only worsen the collinearity issue.
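A quick way to see why every coefficient is omitted here is to inspect the 10-row excerpt from #55 directly: it contains a single household, only the control group, only the pre-treatment period, and a single month, so none of the regressors vary once the household effect is absorbed.

```stata
* The 10-row -dataex- excerpt posted in #55
clear
input long location str9 date float(year month day calday calmonth lconsum) byte randomgrp float tp
500001 "01-JAN-17" 2017 1  1 20820 684  4.331219 0 0
500001 "02-JAN-17" 2017 1  2 20821 684  4.395176 0 0
500001 "03-JAN-17" 2017 1  3 20822 684 4.4484995 0 0
500001 "04-JAN-17" 2017 1  4 20823 684 4.4349075 0 0
500001 "05-JAN-17" 2017 1  5 20824 684 4.3300653 0 0
500001 "06-JAN-17" 2017 1  6 20825 684  3.984616 0 0
500001 "07-JAN-17" 2017 1  7 20826 684 4.2140527 0 0
500001 "08-JAN-17" 2017 1  8 20827 684 4.4064745 0 0
500001 "09-JAN-17" 2017 1  9 20828 684 4.2368575 0 0
500001 "10-JAN-17" 2017 1 10 20829 684 4.3243986 0 0
end
format %td calday
format %tm calmonth

* How many households, and which treatment/period cells are present?
quietly levelsof location
display "households: " r(r)
tabulate randomgrp tp
* Does the monthly time variable vary at all?
summarize calmonth
```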

Kind regards,
Carlo
(Stata 19.0)


Katherine Adams

Join Date: Jan 2019
Posts: 52

#57

31 Jan 2019, 11:10

Carlo,

I generated an alternative treatment measure:
gen treatalt = (calday >= td(02feb2018) & randomgrp>0)

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input long location str9 date float(year month day calday calmonth lconsum) byte randomgrp float(tp treatalt)
500001 "01-JAN-17" 2017 1  1 20820 684  4.322219 0 0 0
500001 "02-JAN-17" 2017 1  2 20821 684  4.386176 0 0 0
500001 "03-JAN-17" 2017 1  3 20822 684 4.4473995 0 0 0
500001 "04-JAN-17" 2017 1  4 20823 684 4.4338075 0 0 0
500001 "05-JAN-17" 2017 1  5 20824 684 4.3310753 0 0 0
500001 "06-JAN-17" 2017 1  6 20825 684  3.974716 0 0 0
500001 "07-JAN-17" 2017 1  7 20826 684 4.2140517 0 0 0
500001 "08-JAN-17" 2017 1  8 20827 684 4.4054755 0 0 0
500001 "09-JAN-17" 2017 1  9 20828 684 4.2358565 0 0 0
500001 "10-JAN-17" 2017 1 10 20829 684 4.3244976 0 0 0
end
format %td calday
format %tm calmonth

and ran the following regression:
reghdfe lconsum treatalt tp c.calmonth, absorb(location) vce(cluster location)

It should work this time...
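treatalt pools the three treatment arms into a single post-treatment indicator. On simulated monthly data (hypothetical values), its coefficient should be identical to the 1.treated#1.tp term from a factorial specification, where treated = (randomgrp > 0) is a helper variable introduced here for illustration: the interaction column is exactly treatalt, and 1.treated drops out because it is constant within household.

```stata
* Simulated panel for illustration only (hypothetical values)
clear
set seed 7
set obs 200                                   // 200 hypothetical households
gen location  = _n
gen randomgrp = mod(_n, 4)                    // 0 = control, 1-3 = treatment arms
expand 24                                     // monthly observations, 2017m1-2018m12
bysort location: gen calmonth = tm(2017m1) + _n - 1
format calmonth %tm
gen tp      = calmonth >= tm(2018m2)
gen lconsum = 4 + 0.1*tp - 0.05*(randomgrp > 0)*tp + rnormal(0, 0.2)

* Pooled post-treatment indicator, as defined in #57
gen treatalt = tp & randomgrp > 0
areg lconsum treatalt tp c.calmonth, absorb(location) vce(cluster location)
scalar b_alt = _b[treatalt]

* Hypothetical helper: any treatment arm vs control (constant within household)
gen treated = randomgrp > 0
areg lconsum i.treated##i.tp c.calmonth, absorb(location) vce(cluster location)
scalar b_int = _b[1.treated#1.tp]

display b_alt "  " b_int
```

If separate effects for each treatment group are needed, i.randomgrp##i.tp keeps them apart instead of pooling.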


Carlo Lazzaro

Join Date: Apr 2014
Posts: 17727

#58

31 Jan 2019, 11:26

Katherine:
I'm afraid that the answer is, again, no:

Code:

. reghdfe lconsum treatalt tp c.calmonth, absorb(location) vce(cluster location)
(converged in 1 iterations)
note: treatalt omitted because of collinearity
note: tp omitted because of collinearity
note: calmonth omitted because of collinearity

HDFE Linear regression                            Number of obs   =         10
Absorbing 1 HDFE group                            F(   0,      0) =       0.00
Statistics robust to heteroskedasticity           Prob > F        =          .
                                                  R-squared       =     0.0000
                                                  Adj R-squared   =     0.0000
                                                  Within R-sq.    =     0.0000
Number of clusters (location) =          1        Root MSE        =     0.1402

                               (Std. Err. adjusted for 1 clusters in location)
------------------------------------------------------------------------------
             |               Robust
     lconsum |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    treatalt |          0  (omitted)
          tp |          0  (omitted)
    calmonth |          0  (omitted)
------------------------------------------------------------------------------

Absorbed degrees of freedom:
---------------------------------------------------------------+
 Absorbed FE |  Num. Coefs.  =   Categories  -   Redundant     |
-------------+-------------------------------------------------|
    location |            0               1              1 *   |
---------------------------------------------------------------+
* = fixed effect nested within cluster; treated as redundant for DoF computation

.

However, if Stata gave you back something better using the full sample, please share it with the list. Thanks.

Kind regards,
Carlo
(Stata 19.0)


Katherine Adams

Join Date: Jan 2019

Posts: 52
#59

31 Jan 2019, 11:31

Strange... It works well on my data (the full dataset, I mean):

Carlo Lazzaro

Join Date: Apr 2014

Posts: 17727
#60

31 Jan 2019, 11:43

Katherine:
now it sounds good (on the whole sample, I mean).

Kind regards,
Carlo
(Stata 19.0)