Hi Statalisters,
I would like to estimate the following model for individual i in period t:
Yit = B0 + B1*X1it + B2*X1it*X2it + B3*X3it + B4*X4it + Fi + Uit
where Y is a continuous variable; X1 is an endogenous binary variable (correlated with both the time-varying and time-invariant components of the error term); X2, X3 and X4 are “exogenous” variables (correlated with Fi but not with Uit), where X2 and X3 are continuous and X4 is binary; and Fi are individual fixed effects. I am interested in the causal effect of X1 and X1*X2 on Y, and intend to use Z (a vector of 32 variables) as instruments for X1 and X1*X2. Specifically, I will use 2SLS in first differences to account for endogeneity arising from both time-varying and time-invariant heterogeneity. To do this, I will use a panel dataset of 1,777 individuals (i=1,...,1777) observed over 2 time periods (t=1,2). However, I'd like to flag some key limitations of my data:
- Z varies only across individuals and not over time (I only have values for t=1). Still, it appears to be a relevant set of instruments when the model is estimated (ignoring Fi) on cross-sections of the data (either t=1 or t=2).
- X2 is missing whenever X1=0, which is why I did not include it in levels as an additional covariate. Instead, I included X3, which captures a similar characteristic and is observed for the whole sample. To keep the interaction from being dropped for observations with missing values, I set X2 to 0 wherever X1=0. Any suggestions on how to deal with this issue better are more than welcome.
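To make the differencing step explicit, here is a sketch of the model above in first differences for the two-period case. The notation ΔW_i = W_i2 - W_i1 is introduced here for illustration only, with W standing for any variable in the model; the constant B0 and the fixed effect Fi cancel out. The same operation applied to a Z that takes a single value per individual gives zero, which is why Z cannot simply be differenced along with the other variables.

```latex
\begin{aligned}
\Delta Y_i &= B_1\,\Delta X1_i + B_2\,\Delta (X1 \cdot X2)_i + B_3\,\Delta X3_i + B_4\,\Delta X4_i + \Delta U_i, \\
\Delta W_i &\equiv W_{i2} - W_{i1}, \qquad \Delta Z_i = Z_i - Z_i = 0 .
\end{aligned}
```

Since X1 and X4 are binary, ΔX1_i and ΔX4_i each take values in {-1, 0, 1}.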
Code:
/* The original dataset has a wide structure, where each observation corresponds to individual i and variables with the suffix _? correspond to period t. */
reshape long Y_@ X1_@ X2_@ X3_@ X4_@, i(id) j(time)
ren *_ *
xtset id time
gen X1X2 = X1*X2

probit D.(X1 X3 X4) Z*, vce(cluster id)
// Am I ditching information since -1 and 1 are considered the same by -probit-?
// I cannot use either -xtprobit- (without the difference operator) because of the incidental parameters problem or -xtlogit, fe- because Z doesn't vary over time.
// Also, given that X4 is a dummy, should I create dummies for the different combinations instead of just using the difference operator?

predict phat1
// This creates predicted probabilities for observations both in t=1 and t=2. Thus, now I have time-variant instruments. Is this a valid procedure?
gen phat2 = phat1*X2

xtivreg2 Y X3 X4 (X1 X1X2 = phat1 phat2), fd first r
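Two of the questions in the code comments can be checked directly in the data. Below is a minimal diagnostic sketch, not a proposed solution: it assumes the long dataset is already -xtset id time- and that phat1 exists from the -predict- call above. The names dX1 and changed are hypothetical helpers, not part of the original code. -probit- treats a dependent variable equal to 0 as a negative outcome and any other nonmissing value as a positive outcome, so the cross-tabulation shows which values of D.X1 end up pooled together.

```stata
// Sketch only: dX1 and changed are hypothetical helper variables.
// Assumes -xtset id time- has been run and phat1 already exists.

// Store the first difference of X1 (missing in t=1, where no lag exists)
generate byte dX1 = D.X1
tabulate dX1, missing

// Outcome as -probit- sees it: 0 stays 0, any nonzero value counts as 1
generate byte changed = (dX1 != 0) if !missing(dX1)
tabulate changed dX1

// Check in which periods -predict- actually produced a value
count if missing(phat1) & time == 1
count if missing(phat1) & time == 2
```

If the first -count- equals the number of individuals, phat1 is only available in t=2, and the interaction phat2 inherits the same pattern.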
Many thanks for your help.
Best wishes,
Maria