I am investigating how Latino students' English skills on entry to university affect their attainment on completion. In Colombia, students sit standardised university entry and exit exams, both of which include English, maths and Spanish-language components. I have a dataset of 130,000 students across 383 universities.
Using a common identifier, I matched students in my exit-exam cohort of interest to their entry scores. This resulted in a loss of observations: the matching success rate is 35.1%. The loss arises because the testing agency didn't manage to produce common keys for all students; it is not an issue with my matching method per se.
I want to address the potential selection bias that this introduces. The issue I face is that the treatment (the matching to entry scores) happens after the outcome variable is observed (exit exam scores). I am considering different ways of addressing this, but I am unsure of how appropriate they are in my case.
- Propensity Score Matching, Inverse Probability Weighting. However, my understanding is that I can't do this because treatment occurs after the outcome variable is observed.
- Heckman Selection Model. I would delete the exit scores for those students who aren't matched to their entry scores, treating them as if they were missing, as suggested in this thread.
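To make the setup concrete, here is a minimal sketch of the matching step and a first check of whether matched and unmatched students differ on their exit scores. It is only an illustration under assumptions: the column names (`student_id`, `exit_english`, `entry_english`), the helper `match_and_compare` and the simulated data are all hypothetical, and the toy data drops keys at random, which is exactly what may not hold in my real data.

```python
import numpy as np
import pandas as pd


def match_and_compare(exit_df, entry_df, key="student_id", score="exit_english"):
    """Left-join entry scores onto the exit cohort, then compare groups.

    Returns the merged frame (with a 0/1 `matched` column), the match rate,
    and the standardised mean difference (SMD) in `score` between matched
    and unmatched students.
    """
    # validate="one_to_one" raises if the key is duplicated on either side
    merged = exit_df.merge(
        entry_df, on=key, how="left", indicator=True, validate="one_to_one"
    )
    merged["matched"] = (merged["_merge"] == "both").astype(int)
    merged = merged.drop(columns="_merge")

    rate = merged["matched"].mean()
    g1 = merged.loc[merged["matched"] == 1, score]
    g0 = merged.loc[merged["matched"] == 0, score]
    pooled_sd = np.sqrt((g1.var(ddof=1) + g0.var(ddof=1)) / 2)
    smd = (g1.mean() - g0.mean()) / pooled_sd
    return merged, rate, smd


# Hypothetical toy data: 1,000 exit-exam takers, roughly 35% with a usable key.
rng = np.random.default_rng(0)
n = 1000
exit_df = pd.DataFrame(
    {"student_id": np.arange(n), "exit_english": rng.normal(150, 30, n)}
)
keep = rng.random(n) < 0.35  # keys lost at random here (an assumption)
entry_df = pd.DataFrame(
    {
        "student_id": exit_df.loc[keep, "student_id"].to_numpy(),
        "entry_english": rng.normal(50, 10, int(keep.sum())),
    }
)

merged, rate, smd = match_and_compare(exit_df, entry_df)
print(f"match rate: {rate:.3f}, SMD in exit score: {smd:.3f}")
```

An SMD far from zero in the real data would suggest that the matched students are not representative of the exit cohort in terms of exit scores, which is the selection problem I am worried about.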
Any thoughts on the validity of these approaches or alternative methods would be greatly appreciated.
Thank you in advance!