Hi Statalist,
This is a statistics question as well as a Stata question, but it would be really helpful to know whether the model I want to fit can be implemented in Stata. Apologies in advance for the length. I'm trying to be as precise as possible.
I want to estimate the following OLS regression model:
(1) Y_ijc = a + B1*X_ijc + B2*G_jc + B3*X_ijc*G_jc + D*Z_ijc + d + r + e_ijc
where i, j, and c index individuals, years, and countries respectively, Z_ijc is a set of controls, and d and r are country and year fixed effects. The coefficients of interest are the B coefficients.
In Stata terms:
Code:
reg Y X##c.G i.d i.r [some specification for the Z controls]
So far, so simple. However, the variable G_jc is itself an estimated beta coefficient from a set of regression models fit to overlapping data. Specifically:
(2) X_ijc = p_jc + G_jc*E_ijc + u_ijc, estimated separately for each year j = 1 ... J and country c = 1 ... C
or in Stata terms:
Code:
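* Collect the distinct country (d) and year (r) values
levelsof d, local(d)
levelsof r, local(r)
gen G = .
* Fit X on E within each country-year cell and store the slope as G
foreach c of local d {
    foreach j of local r {
        reg X E if d == `c' & r == `j'
        replace G = _b[E] if d == `c' & r == `j'
    }
}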
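For reference, the same first stage can be written more compactly with statsby, which also makes it easy to keep the standard error of each G_jc alongside the point estimate. This is a minimal sketch on simulated toy data: the simulated values and the name G_se are hypothetical, and only the variable names X, E, d, and r follow the post.

```stata
* Toy data so the sketch runs on its own (hypothetical: 3 countries x 4 years)
clear
set seed 12345
set obs 1200
gen d = ceil(_n/400)            // country id, 1..3
gen r = mod(_n, 4) + 1          // year id, 1..4
gen E = rnormal()
gen X = 0.5*E + rnormal()

* One regression per country-year cell; keep the slope and its standard error
preserve
statsby G=_b[E] G_se=_se[E], by(d r) clear: regress X E
tempfile gfile
save `gfile'
restore

* Attach the cell-level estimates back to the individual-level data
merge m:1 d r using `gfile', nogenerate
```

Keeping G_se does not solve the propagation problem by itself, but it shows how noisy each G_jc is before it enters (1).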
However, I know that the noise in my estimates of G_jc is not incorporated into my target model (1). So what I want to know is: how would I go about modelling these two processes jointly, so that the error in the estimates of G_jc correctly propagates into (1)? What Stata command should I use? Also, X and E are available for many more observations than Y, and I want to estimate G using all available data to maximise precision.
I'm aware of various Stata commands for fitting multistage/multiequation models, such as ivregress, heckman, and sem/gsem. However, as far as I'm aware, none of them can take a coefficient estimated in the first stage as an input to the second stage of the regression.
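To make the goal concrete, one generic way to carry first-stage noise into (1) is to bootstrap both stages together, so that every replication re-estimates G before fitting the target model. The following is only a rough sketch on simulated toy data, not a claim that this is the right estimator here. It assumes X is continuous (hence c.X), that resampling individuals within country-year cells is appropriate, and it omits the Z controls. The program name twostage and the variable cell are hypothetical.

```stata
* Toy data (hypothetical): 3 countries x 4 years, Y observed for about half the sample
clear
set seed 12345
set obs 1200
gen d = ceil(_n/400)
gen r = mod(_n, 4) + 1
egen cell = group(d r)
gen E = rnormal()
gen X = (0.5 + 0.1*d*r)*E + rnormal()
gen Y = 1 + 0.5*X + rnormal() if runiform() < 0.5

capture program drop twostage
program define twostage, eclass
    * Stage 1: re-estimate G in every country-year cell of the current sample,
    * using all observations with X and E, whether or not Y is observed
    capture drop G
    quietly gen double G = .
    quietly levelsof cell, local(cells)
    foreach k of local cells {
        capture regress X E if cell == `k'
        if _rc == 0 {
            quietly replace G = _b[E] if cell == `k'
        }
    }
    * Stage 2: the target model (1), fit with the freshly estimated G
    regress Y c.X##c.G i.d i.r
end

* nodrop keeps observations with missing Y so they still inform stage 1
bootstrap _b, reps(200) seed(12345) strata(cell) nodrop: twostage
```

Because G is rebuilt inside each replication, the bootstrap standard errors reflect sampling variation in both stages, under the assumptions above.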
Thanks in advance!
Dan
