Missing values and sample selection

Abbie Malyon

Join Date: Feb 2018

Posts: 3
#1

23 Feb 2018, 07:22

I have data on 300,000 graduates and am running an ordered probit regression to see how the degree class achieved relates to various social background characteristics. There are many missing values in my dataset, and I want to test whether the probability of a value being missing can be predicted by my main variables, e.g. are people more likely to have a missing value if they attend a less selective university? I am unsure how to look for such sample selection in Stata.

Thank you.
Tags: missing value, ordered probit, probit, sample selection
Clyde Schechter

Join Date: Apr 2014

Posts: 30117
#2

23 Feb 2018, 08:29

Create new 0/1 variables indicating missing values. For example:

Code:

gen sex_missing = missing(sex)
gen age_missing = missing(age)

etc. Then you can look for associations between those new variables and whatever variable identifies the selectivity of their university, using some appropriate model (perhaps a cross tab, or a regression of some kind; you don't say how you operationalized "selective university").
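
As a rough sketch of what those two options might look like, assuming a hypothetical categorical variable uni_type that codes university selectivity (the name and coding are assumptions, not something from the original post) and the sex_missing indicator created above:

```stata
* Hypothetical: uni_type is a categorical variable coding university selectivity.
* sex_missing is the 0/1 missingness indicator generated earlier.

* Cross tab with row percentages and a chi-squared test of association
tabulate uni_type sex_missing, row chi2

* Or a probit of the missingness indicator on selectivity
probit sex_missing i.uni_type
```

The same pattern would be repeated for each missingness indicator of interest.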
Abbie Malyon

Join Date: Feb 2018

Posts: 3
#3

27 Feb 2018, 09:23

Hi Clyde,

Thank you for your response.

Within my ordered probit model I have various dummy variables representing certain characteristics, e.g. social class, ethnicity, etc., and have also created three dummy variables for university selectiveness: "oxbridge", "selective" and "other". Would it be appropriate to regress, say, missing(social class) on all the relevant variables in my model?

Thank you,

Abigail
Clyde Schechter

Join Date: Apr 2014

Posts: 30117
#4

27 Feb 2018, 09:29

Yes.
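
A minimal sketch of such a regression, under some assumptions: social_class and ethnicity are hypothetical variable names standing in for the model's covariates, oxbridge and selective are the dummies described in #3, and the three selectiveness dummies are taken to be mutually exclusive and exhaustive, so "other" is left out as the reference category.

```stata
* Hypothetical names: social_class and ethnicity stand in for the model's covariates.
gen byte social_class_missing = missing(social_class)

* "other" is omitted as the reference category for university selectiveness
* (assumes oxbridge, selective and other are mutually exclusive and exhaustive).
probit social_class_missing oxbridge selective i.ethnicity
```

Note that probit drops any observation with a missing value on a right-hand-side variable, so the estimation sample may be smaller than the full dataset.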