Dear all,
Following a paper that does this, I compare the subsample I created (after dropping observations with missing values and applying some other restrictions I chose) to the full sample. I do this by running a probit regression where the outcome variable is "Included", equal to 1 if the observation is in the subsample and 0 otherwise.
I then compute the average marginal effects and find statistically significant differences in the control variables.
For example, I find that individuals included in the subsample are 5 percentage points more likely to have lower birth weight.
- Is this problematic?
- Can I just say that the differences are small in magnitude, so this should not be an issue (since 5 pp is not much)?
- If this is problematic and I use only the subsample to estimate my equations, can I just discuss how the sample selection will bias the results? (E.g., if I have a negative coefficient on birth weight, should I say that the estimate is biased downwards?)
- If this is problematic, how can I correct for it? I saw a post discussing weights; is this the way to proceed?
- Suppose I also find a significant difference for my main outcome variable (income). How should I discuss this?
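For concreteness, here is a minimal sketch of the comparison described above, written in standard-library Python on simulated data. The variable names (`low_bw`, `age_std`), the sample size, and the data-generating values are assumptions made only for illustration; they are not the actual data or code. The probit is fitted by hand with Fisher scoring so the example runs without extra packages; a statistics package would normally do this step. Note that in this setup the average marginal effect of a covariate measures the change in the predicted probability of being included.

```python
import math
import random


def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)


def norm_cdf(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x


def fit_probit(X, y, iters=50, tol=1e-10):
    """Maximum-likelihood probit coefficients via Fisher scoring."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        info = [[0.0] * k for _ in range(k)]
        for xi, yi in zip(X, y):
            z = sum(b * v for b, v in zip(beta, xi))
            p = min(max(norm_cdf(z), 1e-12), 1 - 1e-12)
            d = norm_pdf(z)
            s = d * (yi - p) / (p * (1 - p))  # score factor
            w = d * d / (p * (1 - p))         # expected-information weight
            for a in range(k):
                grad[a] += s * xi[a]
                for c in range(k):
                    info[a][c] += w * xi[a] * xi[c]
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


def ame_binary(X, beta, j):
    """Average discrete change in Pr(y = 1) when dummy j goes from 0 to 1."""
    total = 0.0
    for xi in X:
        base = sum(b * v for b, v in zip(beta, xi)) - beta[j] * xi[j]
        total += norm_cdf(base + beta[j]) - norm_cdf(base)
    return total / len(X)


# Simulated full sample (hypothetical): a low-birth-weight dummy and a
# standardized age variable; inclusion depends on low_bw only.
random.seed(0)
X, included = [], []
for _ in range(5000):
    low_bw = 1.0 if random.random() < 0.2 else 0.0
    age_std = random.gauss(0, 1)
    X.append([1.0, low_bw, age_std])  # first column is the constant
    pr = norm_cdf(0.3 + 0.15 * low_bw)
    included.append(1 if random.random() < pr else 0)

beta = fit_probit(X, included)
ame_low_bw = ame_binary(X, beta, 1)
print("probit coefficients (const, low_bw, age_std):", beta)
print("AME of low_bw on Pr(Included = 1):", ame_low_bw)
```

This sketch reports point estimates only; standard errors for the marginal effects, which the significance statements above rely on, are not computed here.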
Thank you.