Hi everyone,
I'm a beginner in econometrics, and I'm working on a study of the association between CO₂ exposure and infant mortality using a pooled cross-sectional dataset. Please let me know if my question is flawed.
In my first study, I run a regression with reghdfe, with infant mortality as the outcome and CO₂ exposure treated as exogenous. This stage includes several controls and fixed effects (e.g., country and year) to account for unobserved heterogeneity, and I obtain statistically significant results.
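For readers less familiar with reghdfe, here is a minimal sketch of what a two-way fixed-effects linear probability model does, written in Python on simulated data. The data-generating process, the coefficient 0.05, and the helper `fe_ols` are all hypothetical. Including country and year dummies gives the same point estimate that an absorbing estimator reports; standard errors are not computed here.

```python
import numpy as np

def fe_ols(y, x, country, year):
    """Coefficient on x from OLS of y on x plus country and year dummies (hypothetical helper)."""
    c_levels = np.unique(country)
    t_levels = np.unique(year)
    # Full set of country dummies; together they play the role of the intercept.
    C = (country[:, None] == c_levels[None, :]).astype(float)
    # Year dummies with the first year dropped to avoid perfect collinearity.
    T = (year[:, None] == t_levels[None, 1:]).astype(float)
    X = np.column_stack([x, C, T])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[0]

# Hypothetical simulated data: linear probability model of infant death.
rng = np.random.default_rng(0)
n = 200_000
country = rng.integers(0, 30, n)
year = rng.integers(2000, 2010, n)
c_eff = rng.uniform(0.02, 0.08, 30)[country]   # unobserved country heterogeneity
t_eff = 0.002 * (year - 2000)                  # common year shock
co2 = rng.uniform(0.0, 0.5, n) + 5.0 * c_eff   # exposure correlated with country effects
p_death = c_eff + t_eff + 0.05 * co2           # true CO2 effect is 0.05; probabilities stay in [0, 1]
death = (rng.random(n) < p_death).astype(float)

naive = np.polyfit(co2, death, 1)[0]           # pooled OLS slope, no fixed effects
fe = fe_ols(death, co2, country, year)
print(f"pooled OLS: {naive:.3f}   two-way FE: {fe:.3f}")
```

In this setup the pooled slope absorbs the correlation between exposure and the country effects, while the fixed-effects estimate should land close to the true 0.05.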
In my second study, I regress children's weight-for-age on CO₂ exposure. Here, I assume that the unobservables (the error term) from the infant mortality regression drive which children are observed, since only survivors appear in the second-stage sample.
As I understand it, conditioning on survival means the sample for the second study is not randomly selected.
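To illustrate why conditioning on survival matters, here is a small simulation under a hypothetical data-generating process (all coefficients and the correlation `rho` are made up). A shared unobservable lowers both weight-for-age and the chance of survival, and CO₂ exposure lowers survival. Among survivors, highly exposed children are disproportionately those with favorable unobservables, so the survivor-only slope is pulled toward zero.

```python
import numpy as np

def simulate(n, beta, rho, seed=0):
    """Hypothetical DGP in which weight-for-age is only observed for survivors.

    e enters weight-for-age and v enters survival; corr(e, v) = rho stands in for
    unobserved frailty that both lowers weight and raises the risk of death.
    Mortality here (about 22%) is exaggerated so the selection effect is visible.
    """
    rng = np.random.default_rng(seed)
    co2 = rng.normal(0.0, 1.0, n)
    e = rng.normal(0.0, 1.0, n)
    v = rho * e + np.sqrt(1.0 - rho**2) * rng.normal(0.0, 1.0, n)
    waz = beta * co2 + e                    # weight-for-age z-score (latent for the dead)
    survived = (1.0 - 0.8 * co2 + v) > 0    # higher exposure lowers survival
    return co2, waz, survived

def ols_slope(x, y):
    return np.polyfit(x, y, 1)[0]

co2, waz, survived = simulate(n=100_000, beta=-0.3, rho=0.5)
full = ols_slope(co2, waz)                           # infeasible: uses children who died
survivors = ols_slope(co2[survived], waz[survived])  # what the second study can estimate
print(f"true beta -0.30 | full sample {full:.3f} | survivors only {survivors:.3f}")
```

Setting `rho=0` in the call removes the bias, which is a quick way to confirm that the distortion comes from correlated unobservables rather than from selection on CO₂ exposure itself.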
I attempted to address this with a Heckman selection model, but I'm finding it extremely difficult to find a valid instrument (exclusion restriction), i.e., a variable that affects survival (the selection process) without directly affecting weight-for-age.
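For reference, this is the textbook Heckman setup, assuming jointly normal errors. Here $w_i$ is weight-for-age, $x_i$ holds CO₂ exposure and the controls, and $z_i$ holds the regressors of the survival equation; the notation is chosen for illustration.

```latex
\begin{aligned}
s_i^* &= z_i'\gamma + v_i, \qquad s_i = \mathbf{1}[s_i^* > 0] && \text{(survival)}\\
w_i &= x_i'\beta + \varepsilon_i, \qquad \text{observed only if } s_i = 1 && \text{(weight-for-age)}\\
(\varepsilon_i, v_i) &\sim N\!\left(0,\begin{pmatrix}\sigma_\varepsilon^2 & \rho\sigma_\varepsilon\\ \rho\sigma_\varepsilon & 1\end{pmatrix}\right)\\
E[w_i \mid x_i, z_i, s_i = 1] &= x_i'\beta + \rho\sigma_\varepsilon\,\lambda(z_i'\gamma),
\qquad \lambda(c) = \frac{\phi(c)}{\Phi(c)}
\end{aligned}
```

The last line makes the selection bias explicit. OLS on survivors omits $\rho\sigma_\varepsilon\,\lambda(z_i'\gamma)$, and because $\lambda$ depends on CO₂ exposure through $z_i$, the omitted term is correlated with $x_i$ unless $\rho = 0$. The exclusion restriction requires $z_i$ to contain at least one variable not in $x_i$. Without one, $\beta$ is identified only through the nonlinearity of $\lambda$, which is close to linear over much of its range, so two-step estimates tend to be imprecise and sensitive to the normality assumption.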
Are there alternative methods or strategies you can recommend to address or rationalize selection bias in this context, especially when a valid instrument for the Heckman model is hard to come by?
Is there a way to model the bias mathematically and show how much my estimates shift because of it?
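One possible way to quantify the shift, sketched under the Heckman model's joint-normality assumption, is a sensitivity analysis. Since $E[w \mid x, \text{survived}] = x'\beta + \rho\sigma_\varepsilon\lambda(z'\gamma)$, you can estimate $\gamma$ with a probit of survival on the full sample (dead and surviving children), compute the inverse Mills ratio $\lambda$ for survivors, and re-estimate $\beta$ after subtracting $\rho\sigma_\varepsilon\lambda$ for a grid of assumed $\rho\sigma_\varepsilon$ values. An assumed value of 0 reproduces the uncorrected survivor regression. The data below are simulated (true $\beta = -0.3$, true $\rho\sigma_\varepsilon = 0.5$), there is deliberately no excluded variable, and the function names are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def probit(X, s):
    """Probit MLE of the 0/1 survival indicator s on X (X includes a constant)."""
    q = 2.0 * s - 1.0  # +1 survived, -1 died
    n = len(s)
    def negll(g):
        return -np.sum(norm.logcdf(q * (X @ g))) / n
    def grad(g):
        z = q * (X @ g)
        mills = np.exp(norm.logpdf(z) - norm.logcdf(z))  # phi(z) / Phi(z), computed stably
        return -(X.T @ (q * mills)) / n
    return minimize(negll, np.zeros(X.shape[1]), jac=grad, method="BFGS").x

def beta_given_rho_sigma(Xs, ws, lam, rho_sigma):
    """Survivor OLS of w on X after removing an ASSUMED selection term rho_sigma * lambda."""
    coef, *_ = np.linalg.lstsq(Xs, ws - rho_sigma * lam, rcond=None)
    return coef

# Hypothetical simulated data (no exclusion restriction: both equations use [1, co2]).
rng = np.random.default_rng(0)
n = 100_000
co2 = rng.normal(size=n)
e = rng.normal(size=n)
v = 0.5 * e + np.sqrt(1 - 0.5**2) * rng.normal(size=n)
w = -0.3 * co2 + e                       # weight-for-age; only used for survivors below
s = ((1.0 - 0.8 * co2 + v) > 0).astype(float)
X = np.column_stack([np.ones(n), co2])

gamma = probit(X, s)                     # survival equation, full sample
surv = s == 1
idx = X[surv] @ gamma
lam = np.exp(norm.logpdf(idx) - norm.logcdf(idx))  # inverse Mills ratio for survivors

grid = [0.0, 0.25, 0.5, 0.75]
betas = {rs: beta_given_rho_sigma(X[surv], w[surv], lam, rs)[1] for rs in grid}
for rs, b in betas.items():
    print(f"assumed rho*sigma = {rs:.2f}  ->  CO2 coefficient {b:.3f}")
```

Fixing $\rho\sigma_\varepsilon$ rather than estimating it avoids leaning on the weak identification that arises without an excluded variable. The trade-off is that the conclusion is only as credible as the range of $\rho\sigma_\varepsilon$ values you are willing to defend.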
I’d appreciate any insights, alternative suggestions, or relevant literature that could help me move forward.
Thanks in advance for your help!