
  • omitted variables and the interpretation of 2SLS

    Hi, I should say first that I am a beginner in Stata. My analysis is about the effect of students' scores on the decision to enter higher education, using data from 2004-2006. Both the independent variable and the outcome variable are dummies: whether pupils got good marks in high school, and whether they decided to study further. I have some questions about omitted variables in 2SLS. Parents' ethnicity is separated into White, Black, Asian and other, and employment status into unemployed, employed and other. When I run 2SLS, I include only three ethnicity dummies (Black, Asian and other), but Black and Asian are omitted automatically, and the same happens with employment status. However, this problem appears only in 2005; no variables are omitted in 2004 or 2006.



    . ivregress 2sls applyforUni sibs exclude bullied agemum agedad ethnicYP2 ethnicYP3 ethnicYP4 ethnicmum2 ethnicmum3 ethnicmum4 ethnicdad
    > 2 ethnicdad3 ethnicdad4 employmum2 employmum3 employdad2 employdad3 HHincome2 HHincome3 qualimum2 qualimum3 qualimum4 qualidad2 qualid
    > ad3 qualidad4 ( gotgoodmarks = playtruant ) if year==2005 & sex==1, first
    note: ethnicmum4 omitted because of collinearity
    note: ethnicdad2 omitted because of collinearity
    note: ethnicdad4 omitted because of collinearity
    note: employmum3 omitted because of collinearity
    note: employdad2 omitted because of collinearity
    note: employdad3 omitted because of collinearity
    note: qualimum3 omitted because of collinearity

    First-stage regressions
    -----------------------

    Number of obs = 44
    F( 20, 23) = 1.23
    Prob > F = 0.3161
    R-squared = 0.5162
    Adj R-squared = 0.0955
    Root MSE = 0.3053

    ------------------------------------------------------------------------------
    gotgoodmarks | Coef. Std. Err. t P>|t| [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    sibs | -.0091035 .066297 -0.14 0.892 -.1462493 .1280424
    exclude | -.0111437 .1759313 -0.06 0.950 -.3750852 .3527978
    bullied | .3265464 .1645857 1.98 0.059 -.0139251 .6670179
    agemum | -.4737876 .1992758 -2.38 0.026 -.8860209 -.0615543
    agedad | .1437429 .1429087 1.01 0.325 -.1518863 .4393722
    ethnicYP2 | .5870566 .5233486 1.12 0.274 -.4955725 1.669686
    ethnicYP3 | .4698338 .616951 0.76 0.454 -.8064267 1.746094
    ethnicYP4 | .4136424 .3469893 1.19 0.245 -.3041596 1.131444
    ethnicmum2 | -.1475068 .4621708 -0.32 0.752 -1.10358 .8085662
    ethnicmum3 | -.0984527 .4056151 -0.24 0.810 -.9375314 .7406259
    ethnicmum4 | 0 (omitted)
    ethnicdad2 | 0 (omitted)
    ethnicdad3 | -.1945947 .4473433 -0.44 0.668 -1.119995 .7308054
    ethnicdad4 | 0 (omitted)
    employmum2 | .1417959 .2261403 0.63 0.537 -.326011 .6096028
    employmum3 | 0 (omitted)
    employdad2 | 0 (omitted)
    employdad3 | 0 (omitted)
    HHincome2 | .2473541 .1405987 1.76 0.092 -.0434964 .5382047
    HHincome3 | -.1716411 .2262386 -0.76 0.456 -.6396513 .2963692
    qualimum2 | -.0206404 .1629039 -0.13 0.900 -.3576328 .316352
    qualimum3 | 0 (omitted)
    qualimum4 | -.2244219 .2178899 -1.03 0.314 -.6751615 .2263178
    qualidad2 | -.0756771 .1478103 -0.51 0.614 -.3814459 .2300918
    qualidad3 | .6226908 .4127443 1.51 0.145 -.2311357 1.476517
    qualidad4 | -.3703418 .2299829 -1.61 0.121 -.8460977 .1054141
    playtruant | -.3474518 .1854381 -1.87 0.074 -.7310596 .0361561
    _cons | 1.093653 .2259099 4.84 0.000 .6263227 1.560983
    ------------------------------------------------------------------------------


    Instrumental variables (2SLS) regression Number of obs = 44
    Wald chi2(20) = 29.87
    Prob > chi2 = 0.0720
    R-squared = 0.2879
    Root MSE = .35364

    ------------------------------------------------------------------------------
    applyforUni | Coef. Std. Err. z P>|z| [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    gotgoodmarks | 1.538012 .6181795 2.49 0.013 .3264025 2.749622
    sibs | -.1060722 .07562 -1.40 0.161 -.2542847 .0421404
    exclude | .1143196 .2039809 0.56 0.575 -.2854755 .5141148
    bullied | -.3708249 .2625466 -1.41 0.158 -.8854068 .1437569
    agemum | .6957723 .391867 1.78 0.076 -.072273 1.463818
    agedad | -.4051377 .2011253 -2.01 0.044 -.7993361 -.0109393
    ethnicYP2 | -.1164035 .7327689 -0.16 0.874 -1.552604 1.319797
    ethnicYP3 | .3019239 .7589562 0.40 0.691 -1.185603 1.789451
    ethnicYP4 | .1740415 .491461 0.35 0.723 -.7892045 1.137287
    ethnicmum2 | .100364 .5462727 0.18 0.854 -.9703107 1.171039
    ethnicmum3 | -.0604635 .4733872 -0.13 0.898 -.9882853 .8673583
    ethnicmum4 | 0 (omitted)
    ethnicdad2 | 0 (omitted)
    ethnicdad3 | .0785538 .526641 0.15 0.881 -.9536435 1.110751
    ethnicdad4 | 0 (omitted)
    employmum2 | .1731726 .2583681 0.67 0.503 -.3332195 .6795647
    employmum3 | 0 (omitted)
    employdad2 | 0 (omitted)
    employdad3 | 0 (omitted)
    HHincome2 | -.2208077 .2443773 -0.90 0.366 -.6997785 .258163
    HHincome3 | .8420891 .2790908 3.02 0.003 .2950813 1.389097
    qualimum2 | -.2004021 .1890051 -1.06 0.289 -.5708453 .170041
    qualimum3 | 0 (omitted)
    qualimum4 | -.1261699 .2872582 -0.44 0.661 -.6891856 .4368459
    qualidad2 | .3184029 .1832555 1.74 0.082 -.0407713 .677577
    qualidad3 | .3804777 .451774 0.84 0.400 -.5049831 1.265938
    qualidad4 | .6419637 .3791697 1.69 0.090 -.1011952 1.385123
    _cons | -.796578 .6927067 -1.15 0.250 -2.154258 .5611023
    ------------------------------------------------------------------------------
    Instrumented: gotgoodmarks
    Instruments: sibs exclude bullied agemum agedad ethnicYP2 ethnicYP3
    ethnicYP4 ethnicmum2 ethnicmum3 ethnicdad3 employmum2
    HHincome2 HHincome3 qualimum2 qualimum4 qualidad2 qualidad3
    qualidad4 playtruant





    P.S. sex = 1 for males and 0 otherwise.





    Another question: I run the 2SLS regression separately for boys and girls. For boys, the coefficient on gotgoodmarks is 1.538. How should I interpret it, given that gotgoodmarks is a dummy variable? So far I can only say that 'boys who got good marks in high school are more likely to go to college than boys who did not, by around 2 units'. Is it OK to put it like this?
    Last edited by Yossinee Varakjunkiat; 06 Aug 2018, 04:41.

  • #2
    You'll increase your chances of a helpful answer by following the FAQ on asking questions: provide Stata code in code delimiters, readable Stata output (fixed-spacing fonts help), and sample data using dataex. Also, cut your posting down to the minimum needed to demonstrate the problem.

    You should look at factor variable notation in the documentation - you don't need to create these dummies manually.
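
    As a rough sketch of what factor-variable notation could look like here: the block below assumes the poster's dataset is in memory, that each set of hand-made dummies is mutually exclusive, and that the reference group is coded as all zeros. The names ethnicmum and employmum are hypothetical; they are not in the original post. The control list is shortened for illustration, and the `///` line continuation needs a do-file.

    ```stata
    * Assumption: the reference group (e.g. White, unemployed) is the case
    * where every dummy in the set equals 0.
    * ethnicmum and employmum are hypothetical names for the rebuilt variables.
    generate byte ethnicmum = 1 + ethnicmum2 + 2*ethnicmum3 + 3*ethnicmum4
    generate byte employmum = 1 + employmum2 + 2*employmum3

    * i.varname expands a categorical variable into indicators and leaves
    * out a base level automatically.
    ivregress 2sls applyforUni sibs i.ethnicmum i.employmum ///
        (gotgoodmarks = playtruant) if year == 2005 & sex == 1, first
    ```

    If the original categorical variables are still in the dataset, they can be used with the i. prefix directly and the generate step is unnecessary.
    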

    As for your problems, you have too many parameters for too few observations. While there is no hard and fast rule, two observations per parameter is far too few; 5 or 10 observations per parameter are the rules of thumb I've heard. This lack of observations then feeds into collinearity. Do you even have an Asian in that year?
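
    One way to check this, sketched under the assumption that the poster's dataset is in memory (all variable names are taken from the command in post #1): tabulate the dummies in the 2005 boys subsample. A dummy that is all zeros or all ones there cannot be separated from the constant, and two dummies that happen to be identical in so few observations cannot be separated from each other.

    ```stata
    * How many boys are available in 2005 before missing values are dropped?
    count if year == 2005 & sex == 1

    * One-way tables for each dummy in that subsample; a category with no
    * observations shows up as a dummy that never equals 1.
    tab1 ethnicmum2 ethnicmum3 ethnicmum4 ethnicdad2 ethnicdad3 ethnicdad4 ///
        employmum2 employmum3 employdad2 employdad3 ///
        if year == 2005 & sex == 1, missing

    * Observations per estimated parameter in the posted model:
    * 44 observations, 20 slopes plus the constant.
    display 44/21
    ```

    The count may exceed the 44 observations reported by ivregress, because the estimation drops any observation with a missing value in any variable of the model.
    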




    • #3
      Note also that with a dummy dv, logit or probit is more appropriate than regression.
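
      A minimal sketch of why this matters, assuming the poster's dataset is in memory: a linear model does not keep fitted probabilities between 0 and 1, which is also why the coefficient of 1.538 in post #1 is hard to read as a change in probability (a dummy regressor cannot raise a probability by more than 1). The control list is shortened for illustration and phat is a hypothetical variable name.

      ```stata
      * Shortened version of the model in post #1 (illustration only).
      ivregress 2sls applyforUni sibs agemum agedad HHincome2 HHincome3 ///
          (gotgoodmarks = playtruant) if year == 2005 & sex == 1

      * With a 0/1 outcome, the linear prediction plays the role of a
      * fitted probability.
      predict double phat if e(sample), xb
      summarize phat

      * Count fitted values that are impossible as probabilities.
      count if (phat < 0 | phat > 1) & e(sample)
      ```
    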
