Selection bias

Upeksha Diss

Join Date: Nov 2022

Posts: 6
#1

22 Mar 2025, 15:11

Hi everyone,

I'm a beginner in econometrics, working on a study of the association between CO₂ exposure and infant mortality using a pooled cross-sectional dataset. Please tell me if my question is flawed.
In the first study, I model infant mortality with reghdfe, treating CO₂ exposure as exogenous. The regression includes several controls and fixed effects (e.g., country or year) to account for unobserved heterogeneity, and the estimates are statistically significant.
In the second study, I regress children's weight-for-age on CO₂ exposure. Here, I assume that the unobservables (the error term) from the infant mortality regression drive the selection of children, since only survivors are observed in the second study.
As I understand it, conditioning on survival means the sample for the second study is selected non-randomly.
I attempted to address this using a Heckman selection model, but I'm finding it extremely difficult to construct a valid instrument that affects survival (the selection process) without directly affecting weight-for-age.
Are there alternative methods or strategies you can recommend to address or rationalize selection bias in this context, especially when a valid instrument for the Heckman model is hard to come by?
Is there a way to model the bias mathematically and show how much it shifts my estimates?
I’d appreciate any insights, alternative suggestions, or relevant literature that could help me move forward.
Thanks in advance for your help!
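For reference, one common way to write down the bias is the textbook Heckman setup, sketched here under its usual assumptions (jointly normal errors with Var(v) = 1 and correlation ρ). The symbols y, x, z, s, u, v are notation assumed for illustration and are not taken from the study itself:

```latex
\begin{align}
y_i &= x_i'\beta + u_i
  && \text{(weight-for-age, observed only if } s_i = 1\text{)} \\
s_i &= \mathbf{1}\{\, z_i'\gamma + v_i > 0 \,\}
  && \text{(survival)} \\
\mathbb{E}[\, y_i \mid x_i, z_i, s_i = 1 \,]
  &= x_i'\beta + \rho\,\sigma_u\,\lambda(z_i'\gamma),
  \qquad \lambda(c) = \frac{\phi(c)}{\Phi(c)}
\end{align}
```

Under these assumptions, a regression on survivors alone omits the term ρσ_u λ(z'γ), so the size of the bias depends on ρ and on how strongly the inverse Mills ratio λ is correlated with the regressors. If z contains nothing beyond x (no exclusion restriction), the correction is identified only through the nonlinearity of λ, which is why a valid instrument matters so much here.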
George Ford

Join Date: Aug 2014

Posts: 3123
#2

22 Mar 2025, 15:17

Manski Bounds, maybe.
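A minimal sketch of what worst-case (Manski-type) bounds look like for a mean with missing outcomes, assuming the outcome is known to lie in a bounded range. The function names, the z-score values, and the range of -5 to 5 are all hypothetical and chosen only for illustration:

```python
from statistics import mean


def manski_bounds(observed, n_missing, y_min, y_max):
    """Worst-case bounds on a group mean when n_missing outcomes are unobserved.

    Assumes every outcome, observed or not, lies in [y_min, y_max].
    """
    n_total = len(observed) + n_missing
    p_obs = len(observed) / n_total          # share of the group that is observed
    observed_part = p_obs * mean(observed)   # contribution of observed units
    lower = observed_part + (1 - p_obs) * y_min  # all missing at the minimum
    upper = observed_part + (1 - p_obs) * y_max  # all missing at the maximum
    return lower, upper


def bounds_on_difference(group_a, group_b):
    """Bounds on mean(A) - mean(B), given (lower, upper) bounds for each group."""
    return group_a[0] - group_b[1], group_a[1] - group_b[0]


# Hypothetical weight-for-age z-scores, assumed to lie in [-5, 5].
high_exposure = manski_bounds([-1.5, -1.0, -2.0, -1.5], n_missing=1,
                              y_min=-5.0, y_max=5.0)
low_exposure = manski_bounds([-0.5, -1.0, 0.0, -0.5, -0.5], n_missing=0,
                             y_min=-5.0, y_max=5.0)
diff_bounds = bounds_on_difference(high_exposure, low_exposure)
print(high_exposure, low_exposure, diff_bounds)
```

The bounds widen with the share of missing outcomes, so they show directly how much selection could move the estimate without requiring an instrument. One caveat: this treats weight-for-age as if it were defined for non-survivors, which is itself an assumption.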
Upeksha Diss

Join Date: Nov 2022

Posts: 6
#3

22 Mar 2025, 15:41

Prof. Ford, thank you so much for the suggestion of Manski bounds.
I will look into this further.
Any further insights are also appreciated.
George Ford

Join Date: Aug 2014

Posts: 3123
#4

23 Mar 2025, 10:03

Might look to regional variation in hospitals per square mile or NICU beds per capita as IVs. These likely affect early survival but not weight-for-age over time.

But see https://pmc.ncbi.nlm.nih.gov/articles/PMC10867701/.

Drug use/opioid deaths, maybe.

SUID seems to vary a bit. https://www.cdc.gov/sudden-infant-de...-by-state.html

Last edited by George Ford; 23 Mar 2025, 10:09.
Upeksha Diss

Join Date: Nov 2022

Posts: 6
#5

23 Mar 2025, 18:28

Thank you, Prof. Ford.
Your suggestion is incredibly helpful!
I truly appreciate the idea of using regional variation in hospitals per square mile and NICU beds per capita as IVs, along with the excellent references. This work is based in Africa, where data availability can be a bit limited, so I'll look for the data. Thank you again.

Last edited by Upeksha Diss; 23 Mar 2025, 18:36.