  • Bigger sample or omitted variable bias?

    I am testing whether financial access is correlated with the education of young individuals using a longitudinal panel.

    Among my controls is parents' education, which is a determinant of the child's education. However, it has many missing values, so including it drastically reduces my sample, although a sizeable number of observations remains (roughly from 11,000 to 1,700). Removing it would enlarge the sample and give me more significant results. I also include household income as a control, which partly captures the information in parents' education, since the two are partially correlated.

    My question is whether it is better to keep the control and lose most of the sample, or to remove it and risk omitted variable bias.
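
    The omitted-variable side of this trade-off can be illustrated with a small simulation. This is only a sketch and uses no real data: the coefficients (0.5, 0.8, 0.6), the variable names, and the standard normal errors are all assumptions chosen for illustration. It generates a parental-education variable that drives both financial access and child's education, then compares the slope on financial access with and without that control.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (n - 1)

def simulate_ovb(n=11_000, seed=0):
    """Return (slope with control, slope without control) on simulated data.

    All numbers below are illustrative assumptions, not estimates.
    """
    rng = random.Random(seed)
    parent = [rng.gauss(0, 1) for _ in range(n)]           # parents' education
    access = [0.6 * p + rng.gauss(0, 1) for p in parent]   # financial access
    child = [0.5 * a + 0.8 * p + rng.gauss(0, 1)           # child's education
             for a, p in zip(access, parent)]

    s_aa, s_pp = cov(access, access), cov(parent, parent)
    s_ap = cov(access, parent)
    s_ac, s_pc = cov(access, child), cov(parent, child)

    # Short regression: child on access only.
    slope_short = s_ac / s_aa
    # Long regression: child on access and parent (two-regressor OLS formula).
    slope_long = (s_pp * s_ac - s_ap * s_pc) / (s_aa * s_pp - s_ap ** 2)
    return slope_long, slope_short

slope_long, slope_short = simulate_ovb()
print(f"with control: {slope_long:.2f}, without control: {slope_short:.2f}")
```

    Under these assumed values the slope without the control converges to 0.5 + 0.8 × 0.6 / 1.36 ≈ 0.85 instead of the true 0.5, so the larger sample would simply estimate the wrong quantity more precisely.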

  • #2
    Ottavia:
    First, I would investigate whether your regression model suffers from endogeneity, as an individual's ability may well be correlated with both educational attainment (other things being equal, soft skills can favour better educational attainment) and the regressor, financial access (other things being equal, soft skills can secure better financial conditions).
    If your complete-case analysis retains only about 15.5% of the overall observations (roughly 1,700 of 11,000), your findings are basically unreliable, as they rest on a subgroup of the original sample that probably has little to do with it.
    Multiple imputation (provided that the data are missing completely at random or missing at random) may be a fix, but its success depends largely on its feasibility, given the seemingly large share of missing data in your original sample.
    Kind regards,
    Carlo
    (Stata 18.0 SE)
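
    The 15.5% figure follows directly from the approximate counts quoted in the question. As a minimal sketch (the function name is hypothetical, and the counts are the rounded "more or less" values from the first post):

```python
def complete_case_share(n_complete, n_total):
    """Percentage of observations that survive listwise deletion."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return 100 * n_complete / n_total

# Approximate counts from the question: about 1,700 of 11,000 remain.
share = complete_case_share(1_700, 11_000)
print(f"{share:.1f}% retained, {100 - share:.1f}% dropped")
# prints "15.5% retained, 84.5% dropped"
```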


    • #3
      Thank you for your advice, Carlo!
      I have already checked that financial access is exogenous; I will do the same for parents' education.
      I think I will opt for excluding the control, because the missing values are not randomly distributed and the two samples therefore differ significantly from each other in almost all the variables of my regression.
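
      One common way to summarise how far the retained and dropped samples differ on a given variable is a standardized mean difference. The sketch below is an assumption about how such a comparison might be done, not the check actually run here; the function name and the income values are hypothetical.

```python
from statistics import mean, variance

def standardized_difference(kept, dropped):
    """Difference in means divided by the pooled standard deviation."""
    pooled_sd = ((variance(kept) + variance(dropped)) / 2) ** 0.5
    if pooled_sd == 0:
        raise ValueError("both samples are constant; difference is undefined")
    return (mean(kept) - mean(dropped)) / pooled_sd

# Hypothetical household-income values for the two subsamples.
kept_income = [1.0, 2.0, 3.0]      # observations with parents' education
dropped_income = [2.0, 3.0, 4.0]   # observations lost to missing values
print(standardized_difference(kept_income, dropped_income))  # prints -1.0
```

      Values far from zero across most regressors would be consistent with the missing values not being randomly distributed.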


      • #4
        Ottavia:
        I would investigate (by skimming through the literature in your research field) whether excluding this control is defensible in your working paper/article.
        In addition, excluding from your analysis variables whose data are missing not at random will, in all likelihood, bias your results.
        Kind regards,
        Carlo
        (Stata 18.0 SE)
