IV-FD with an endogenous binary regressor

Maria Franco

Join Date: Jul 2014

Posts: 6
#1


03 Jul 2018, 06:36

Hi Statalisters,

I would like to estimate the following model for individual i in period t:

Y_it = B₀ + B₁X_1it + B₂X_1it*X_2it + B₃X_3it + B₄X_4it + F_i + U_it

where Y is a continuous variable; X₁ is an endogenous binary variable (correlated with both the time-variant and time-invariant components of the error term); X₂, X₃ and X₄ are “exogenous” variables (correlated with F_i but not with U_it), where X₂ and X₃ are continuous and X₄ is binary; and F_i are individual fixed effects. I am interested in the causal effect of X₁ and X₁*X₂ on Y, and intend to use Z (a vector of 32 variables) as instruments for X₁ and X₁*X₂. Specifically, I will use 2SLS with first differences to account for endogeneity arising from time-variant and time-invariant heterogeneity. To do this, I will use a panel dataset of 1,777 individuals (i=1...1777) observed over 2 time periods (t=1,2). However, I'd like to flag some key limitations of my data:
1. Z only varies across individuals and not over time (I only have values for t=1). Still, it seems to be a relevant set of instruments when this model is estimated (ignoring F_i) in cross-sections of the data (either for t=1 or t=2).

2. X₂ is missing whenever X₁=0, which is why I did not include it in levels as an additional covariate. Instead, I included X₃, which accounts for a similar characteristic and for which I have values for the whole sample. To avoid the interaction being dropped for observations with missing values, I set X₂ to 0 wherever X₁=0. Any suggestions on how to better deal with this issue are more than welcome.
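For reference, the first-differenced form of the model with two periods can be written out as below. This is only a sketch in the notation of the equation above; the shorthand ΔW_i = W_i2 − W_i1 is introduced here for illustration.

```latex
\begin{aligned}
\Delta Y_i &= B_1\,\Delta X_{1i} + B_2\,\Delta (X_{1}X_{2})_i
            + B_3\,\Delta X_{3i} + B_4\,\Delta X_{4i} + \Delta U_i,
\qquad \Delta W_i \equiv W_{i2} - W_{i1}, \\[4pt]
\Delta X_{1i} &\in \{-1,\,0,\,1\}, \qquad
\Delta X_{4i} \in \{-1,\,0,\,1\}, \qquad
\Delta Z_i = 0 .
\end{aligned}
```

Both B₀ and F_i drop out of the differenced equation. Because ΔZ_i = 0, Z can only enter as an instrument in levels, which requires E[Z_i ΔU_i] = 0 and that Z_i predicts ΔX_1i (and the differenced interaction).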

My main question is: I'd like to follow the procedure suggested by Wooldridge (2002), "Econometric Analysis of Cross Section and Panel Data", to deal with a binary endogenous regressor, considering that it has the additional benefit of "squeezing" the variation of a large number of instruments into a single one (the fitted value). However, I'm not sure how to go about it in this setting (after taking first differences, X₁ is not binary anymore). Here goes my attempt, using Stata 14:

Code:

/* The original dataset has a wide structure, where each observation corresponds to individual i and variables with the suffix _? correspond to period t.*/
reshape long Y_@ X1_@ X2_@ X3_@ X4_@, i(id) j(time)
ren *_ *
xtset id time
gen X1X2 = X1*X2
probit D.(X1 X3 X4) Z*, vce(cluster id)
// Am I ditching information since -1 and 1 are considered the same by -probit-? I cannot use either -xtprobit- (without the lag operator) because of the incidental parameters problem or -xtlogit, fe- because Z doesn't vary over time.
// Also, given that X4 is a dummy, should I create dummies for the different combinations instead of just using the lag operator?
predict phat1
// This creates predicted probabilities for observations both in t=1 and t=2. Thus, now I have time-variant instruments. Is this a valid procedure?
gen phat2 = phat1*X2
xtivreg2 Y X3 X4 (X1 X1X2 = phat1 phat2), fd first r

Many thanks for your help.

Best wishes,

Maria
Phil Bromiley

Join Date: Apr 2014

Posts: 4348
#2

05 Jul 2018, 13:13

You didn't get a quick answer. You'll increase your chances of a useful answer by following the FAQ on asking questions - provide Stata code in code delimiters, readable Stata output, and sample data using dataex. You will also increase your chances of a response by offering a shorter, more focused posting.

In general, 2SLS remains consistent with a binary endogenous variable. Once you start modifying it with an ad hoc approach like yours, it is likely to create problems. If the variable has 3 real outcomes, then you're probably making a mistake by treating them as two, and your predicted value won't even match the range of the original variable. If FD is problematic, then why not use FE?
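As a rough illustration of the plain 2SLS route, the first-differenced equation could be estimated directly with official -ivregress-, using Z in levels as instruments for the differenced regressors. This is only a sketch, not a tested solution: it assumes the long-format data and variable names from post #1 (id, time, Y, X1, X1X2, X3, X4, Z*), and that Z is constant within id so it is non-missing in the t=2 row.

```stata
* Sketch only: plain 2SLS on the first-differenced equation.
* Assumes the long data from post #1, with X1X2 = X1*X2 already created
* and Z* copied to both rows of each individual.
xtset id time

* With T=2, the D. operator leaves one observation per individual (the t=2 row).
* The differenced regressors are instrumented with Z in levels,
* because D.Z would be identically zero.
ivregress 2sls D.Y D.X3 D.X4 (D.X1 D.X1X2 = Z*), first vce(robust)
```

The -first- option prints the first-stage regressions, which gives a direct check on whether Z has any power for the differenced regressors.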