IV with fixed effects for data that does not have a traditional panel structure

Paula de Souza Leao Spinola

Join Date: Jun 2015

Posts: 384
#1


16 Nov 2019, 06:57

How can I run an IV model with fixed effects on a non-panel dataset? I know that with panel data we would use -xtivreg2-.

I have a dataset where each row represents one childbirth delivery. I would like to estimate the effect of a municipality-level policy which is believed to have affected a given variable X in different ways. Thus my key independent variable will be the interaction between X and a dummy for the post-policy period. I would like to include municipality fixed effects as well as cluster the standard errors at the municipality level.

If I were to force a panel structure onto my data, the panel variable would be each child born (who shows up only once in the dataset). Stata does not run the command:

Code:

. xtset id d_pos
       panel variable:  id (weakly balanced)
        time variable:  d_pos, 0 to 1
                delta:  1 unit

. 
. gen d_pos2 = d_pos

. 
. xtivreg2 peso d_pos2 (d_pc = trat), fe cluster(mun)
Warning - singleton groups detected.  6097656 observation(s) not used.
no observations
r(2000);

end of do-file

r(2000);

Carlo Lazzaro

Join Date: Apr 2014

Posts: 17707
#2

16 Nov 2019, 07:53

Paula:
can't you simply go:

Code:

ivreg peso i.d_pos2 (d_pc=trat), vce(cluster mun)

Kind regards,
Carlo
(Stata 19.0)
Paula de Souza Leao Spinola

Join Date: Jun 2015

Posts: 384
#3

16 Nov 2019, 08:36

Hello Carlo Lazzaro. I imagine the factor-variable operator i. was included by mistake (d_pos2 is a dummy variable for the post-policy period).

Would this code account for municipality fixed effects? I imagine it only clusters the standard errors at the municipality level.

In any case, it is not working. I don't know what I am doing wrong.

Code:

. ivreg peso d_pos2 (d_pc=trat), vce(cluster mun)
option vce() not allowed
r(198);

Carlo Lazzaro

Join Date: Apr 2014

Posts: 17707
#4

16 Nov 2019, 09:20

Paula:
if -d_pos2- is actually a categorical variable, what's wrong with the -i.- prefix (see -fvvarlist-)? At worst, with a two-level categorical variable, it is redundant.
That said, you may want to try:

Code:

ivregress gmm peso i.municipality d_pos2 (d_pc=trat), vce(cluster municipality)

Caveat emptor: untested code (specifically, I'm not sure that -i.municipality- and related cluster robust standard error can live together).

Kind regards,
Carlo
(Stata 19.0)

Paula de Souza Leao Spinola

Join Date: Jun 2015
Posts: 384
#5

16 Nov 2019, 14:43

Thanks Carlo. You are right, I should use the command -ivregress- instead.

Code:

. codebook mun

-------------------------------------------------------------------------------------------------------------------------------------------
mun hospital's municipality
-------------------------------------------------------------------------------------------------------------------------------------------

type: numeric (double)

range: [110001,530010] units: 1
unique values: 3,321 missing .: 0/6,194,096

mean: 325678
std. dev: 93675.2

percentiles: 10% 25% 50% 75% 90%
211070 261420 330360 355030 431490

.
. set matsize 3400

. ivregress 2sls peso d_pos i.mun (d_pc=trat), vce(cluster mun)
note: 210015.mun identifies no observations in the sample
note: 311890.mun identifies no observations in the sample
note: 355320.mun identifies no observations in the sample
maxvar too small
You have attempted to use an interaction with too many levels or attempted to fit a model with too many variables. You need to
increase maxvar; it is currently 5000. Use set maxvar; see help maxvar.

If you are using factor variables and included an interaction that has lots of missing cells, either increase maxvar or set
emptycells drop to reduce the required matrix size; see help set emptycells.

If you are using factor variables, you might have accidentally treated a continuous variable as a categorical, resulting in lots of
categories. Use the c. operator on such variables.
r(907);

end of do-file

r(907);

I then tried to -set maxvar- to its maximum and got the error message below:

Code:

. use SINASC_SIHpartos, clear

. 
. set maxvar 32767
no; data in memory would be lost
r(4);

end of do-file

r(4);


Carlo Lazzaro

Join Date: Apr 2014

Posts: 17707
#6

17 Nov 2019, 03:47

Paula:
what if you set -maxvar- before loading the dataset you're working on?
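
A minimal sketch of that ordering, reusing the dataset name and the value shown in the previous post. The -clear all- line is an addition here, only to make sure nothing is in memory when -set maxvar- runs; note that -set maxvar- applies to Stata/MP and Stata/SE.

```stata
* Sketch only: empty memory first, raise the limit, then load the data
clear all
set maxvar 32767
use SINASC_SIHpartos, clear
```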

Kind regards,
Carlo
(Stata 19.0)
Paula de Souza Leao Spinola

Join Date: Jun 2015

Posts: 384
#7

17 Nov 2019, 09:43

Thanks Carlo!
You are right: I do not get an error message when I set -maxvar- before loading the data. The problem is that the regression is taking forever to run (it has already been 3 hours and Stata is still running). As I don't need the coefficient estimates for the municipality FE, is there a way to absorb them? I tried to do so with -xtivreg2-. However, as mentioned above, that command does not work because I do not have panel data.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17707
#8

17 Nov 2019, 11:31

Paula:
if you do not need the -i.municipality- coefficients, you can try:

Code:

ivregress gmm peso d_pos2 (d_pc=trat), vce(cluster municipality)

Kind regards,
Carlo
(Stata 19.0)
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2158
#9

17 Nov 2019, 16:34

To be safe, I would compare it with using within deviations from means. First, make sure you only use the complete cases, which is best done by creating a complete cases indicator and only computing the within-municipality average for any variable if it is part of a complete case. You won't see "estimates" of the dummies, which is best, and it won't run up against space constraints. In my MIT Press book I show that this is how fixed effects IV estimation works.
Below is generic code. The endogenous variable is w and z1 ... zm are excluded instruments.

Code:

gen s = (y != .) & (w != .) & (x1 != .) & ... & (xk != .) & (z1 != .) & ... & (zm != .)
egen ybar = mean(y) if s, by(municipality)
gen ydd = y - ybar
egen wbar = mean(w) if s, by(municipality)
gen wdd = w - wbar
egen x1bar = mean(x1) if s, by(municipality)
gen x1dd = x1 - x1bar
...
egen xkbar = mean(xk) if s, by(municipality)
gen xkdd = xk - xkbar
egen z1bar = mean(z1) if s, by(municipality)
gen z1dd = z1 - z1bar
...
egen zmbar = mean(zm) if s, by(municipality)
gen zmdd = zm - zmbar
ivregress 2sls ydd x1dd ... xkdd (wdd = z1dd ... zmdd), vce(cluster municipality)

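
For the model discussed earlier in this thread, the generic steps above could be written as a loop. This is an untested sketch that assumes the variable names from the earlier posts (peso as outcome, d_pc as the endogenous variable, d_pos as the exogenous regressor, trat as the excluded instrument, mun as the municipality identifier). The _bar and _dd suffixes are just illustrative names.

```stata
* Sketch only, under the assumptions stated above.
* Complete-case indicator: 1 if no model variable is missing
gen byte s = !missing(peso, d_pc, d_pos, trat, mun)

* Within-municipality demeaning, computed on complete cases only
foreach v of varlist peso d_pc d_pos trat {
    egen double `v'_bar = mean(`v') if s, by(mun)
    gen double `v'_dd = `v' - `v'_bar
    drop `v'_bar
}

* 2SLS on the demeaned variables, clustering by municipality
ivregress 2sls peso_dd d_pos_dd (d_pc_dd = trat_dd) if s, vce(cluster mun)
```

The demeaned variables are missing for incomplete cases, so -if s- is redundant in the last line; it is kept only to make the estimation sample explicit.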
Paula de Souza Leao Spinola

Join Date: Jun 2015

Posts: 384
#10

09 Jul 2020, 13:50

Many thanks for your response Jeff Wooldridge!!! And sorry for just picking up on this now.
Would your code be equivalent to transforming the data to mean differences by using the command -xtdata- and then running the ivregress command on the transformed variables? Or would it differ in case there are incomplete observations? The -xtdata- alternative is recommended here: https://www.stata.com/support/faqs/s...ts-regression/

PS: Carlo Lazzaro, from what I understand, your suggestion in #8 would not control for municipality FE, right? I would like to control for the municipalities' time-invariant factors even though I don't need to see the "estimates" of the municipality dummies.
Andrew Musau

Join Date: Oct 2014

Posts: 10190
#11

09 Jul 2020, 14:34

You may consider using ivreghdfe from SSC, which does away with the need for dummies or demeaning.

Code:

ssc install ivreghdfe
help ivreghdfe

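
As an untested sketch for the model in this thread, assuming the variable names from the earlier posts (peso, d_pos, d_pc, trat, mun), the call might look like the following. To my knowledge -ivreghdfe- also needs -ivreg2- and -reghdfe- (which in turn uses -ftools-) to be installed; check the help file before relying on this.

```stata
* Sketch only: absorb municipality fixed effects and cluster by municipality
ivreghdfe peso d_pos (d_pc = trat), absorb(mun) cluster(mun)
```

Here absorb(mun) removes the municipality fixed effects without estimating a coefficient for each dummy.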