  • How to deal with bias when using OLS

    Good morning Stata enthusiasts,
    I'm working with a sample of the US Population survey on a single timeframe, and I have been tasked with creating an OLS regression with wage as a dependent variable.
    I've come up with the following regression:

    Code:
    reg wage age female i.wbhaom citizen married ch02 ch35 unmem multjob rural i(8/16).educ92 ind_m03 occ_m03 uhourse, robust
    Here ch02 and ch35 indicate having a child aged 0 to 2 and 3 to 5, respectively.
    This gives the following output:
    Code:
    Linear regression                               Number of obs     =     51,188
                                                    F(25, 51162)      =     749.16
                                                    Prob > F          =     0.0000
                                                    R-squared         =     0.2255
                                                    Root MSE          =     17.501
    
    -----------------------------------------------------------------------------------------------------------
                                              |               Robust
                                         wage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
    ------------------------------------------+----------------------------------------------------------------
                                          age |   .3323361   .0115434    28.79   0.000     .3097109    .3549614
                                       female |  -6.888845   .2195233   -31.38   0.000    -7.319113   -6.458577
                                              |
                                       wbhaom |
                                       Black  |  -2.922971   .2343337   -12.47   0.000    -3.382268   -2.463675
                                    Hispanic  |  -2.052701   .1827714   -11.23   0.000    -2.410935   -1.694468
                                       Asian  |   2.248498   .3033635     7.41   0.000     1.653903    2.843094
                             Native American  |  -1.118968   .5559207    -2.01   0.044    -2.208578   -.0293573
                                       Mixed  |    -1.1976   .4970338    -2.41   0.016    -2.171791   -.2234086
                                              |
                                      citizen |   1.776596   .2450709     7.25   0.000     1.296254    2.256937
                                      married |   1.101605   .1617439     6.81   0.000     .7845852    1.418625
                                         ch02 |   .7732423   .1877503     4.12   0.000     .4052498    1.141235
                                         ch35 |   1.456089   .1866423     7.80   0.000     1.090268     1.82191
                                        unmem |   1.626295    .229503     7.09   0.000     1.176467    2.076123
                                      multjob |   -.958199    .657534    -1.46   0.145    -2.246972    .3305744
                                        rural |  -2.588769   .2334876   -11.09   0.000    -3.046407   -2.131131
                                              |
                                       educ92 |
                            HS graduate, GED  |   3.884304   .2171748    17.89   0.000     3.458639    4.309969
                  Some college but no degree  |   5.549034   .2607023    21.28   0.000     5.038055    6.060013
    Associate degree-occupational/vocational  |   6.848828   .3468357    19.75   0.000     6.169026    7.528629
           Associate degree-academic program  |   6.359582   .2955889    21.51   0.000     5.780225     6.93894
                           Bachelor's degree  |   14.35967   .3517009    40.83   0.000     13.67033      15.049
                             Master's degree  |   18.80381   .4391888    42.81   0.000     17.94299    19.66462
                         Professional school  |   24.25171   .7900214    30.70   0.000     22.70326    25.80016
                                   Doctorate  |   23.34896   .6026218    38.75   0.000     22.16782    24.53011
                                              |
                                      ind_m03 |  -.6034247   .0299504   -20.15   0.000    -.6621278   -.5447217
                                      occ_m03 |  -.9560231   .0494366   -19.34   0.000    -1.052919   -.8591269
                                      uhourse |  -.0706635   .0268196    -2.63   0.008    -.1232301   -.0180969
                                        _cons |   17.70549   1.026504    17.25   0.000     15.69353    19.71745
    -----------------------------------------------------------------------------------------------------------


    As you can see, the coefficients for ch02 and ch35 are not what you would expect. I then thought age might be playing a role and tried to account for that as well:
    Code:
    . reg wage age female i.wbhaom citizen married ch02#c.age ch35#c.age unmem multjob rural i(8/16).educ92 ind_m03 occ_m03 uhourse, robust
    
    Linear regression                               Number of obs     =     51,188
                                                    F(25, 51162)      =     750.99
                                                    Prob > F          =     0.0000
                                                    R-squared         =     0.2257
                                                    Root MSE          =     17.499
    
    -----------------------------------------------------------------------------------------------------------
                                              |               Robust
                                         wage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
    ------------------------------------------+----------------------------------------------------------------
                                          age |   .3265101   .0114281    28.57   0.000     .3041109    .3489092
                                       female |  -6.858415   .2209402   -31.04   0.000     -7.29146    -6.42537
                                              |
                                       wbhaom |
                                       Black  |  -2.926252   .2343458   -12.49   0.000    -3.385572   -2.466932
                                    Hispanic  |  -2.055904   .1827633   -11.25   0.000    -2.414122   -1.697686
                                       Asian  |   2.234674   .3033155     7.37   0.000     1.640172    2.829175
                             Native American  |  -1.115237   .5559219    -2.01   0.045     -2.20485   -.0256244
                                       Mixed  |  -1.194969   .4965366    -2.41   0.016    -2.168186   -.2217523
                                              |
                                      citizen |   1.781887   .2449796     7.27   0.000     1.301724    2.262049
                                      married |   1.084751   .1617887     6.70   0.000     .7676433    1.401859
                                              |
                                   ch02#c.age |
                                           1  |   .0256436   .0058975     4.35   0.000     .0140844    .0372027
                                              |
                                   ch35#c.age |
                                           1  |   .0420193   .0055704     7.54   0.000     .0311013    .0529372
                                              |
                                        unmem |   1.625639    .229388     7.09   0.000     1.176036    2.075242
                                      multjob |  -.9587118   .6575472    -1.46   0.145    -2.247511    .3300875
                                        rural |  -2.577298   .2332964   -11.05   0.000    -3.034561   -2.120034
                                              |
                                       educ92 |
                            HS graduate, GED  |   3.889077   .2172296    17.90   0.000     3.463305    4.314849
                  Some college but no degree  |   5.539726   .2607647    21.24   0.000     5.028624    6.050827
    Associate degree-occupational/vocational  |   6.839987   .3468462    19.72   0.000     6.160165    7.519809
           Associate degree-academic program  |   6.344039   .2957595    21.45   0.000     5.764347    6.923731
                           Bachelor's degree  |   14.32684   .3522382    40.67   0.000     13.63645    15.01723
                             Master's degree  |   18.75708    .440993    42.53   0.000     17.89273    19.62144
                         Professional school  |   24.19602   .7909329    30.59   0.000     22.64578    25.74625
                                   Doctorate  |   23.28747   .6035842    38.58   0.000     22.10444     24.4705
                                              |
                                      ind_m03 |  -.6033473   .0299595   -20.14   0.000    -.6620682   -.5446265
                                      occ_m03 |  -.9552774   .0494302   -19.33   0.000    -1.052161   -.8583937
                                      uhourse |  -.0704938    .026829    -2.63   0.009    -.1230789   -.0179087
                                        _cons |   17.91144   1.023001    17.51   0.000     15.90635    19.91653
    -----------------------------------------------------------------------------------------------------------

    Should I leave the model like this, or is there a way to further account for bias?

    Thank you in advance!

  • #2
    Luis:
    welcome to this forum.
    Some comments about your post:
    1) if your data come from a survey, you should take a look at the -svy:- prefix;
    2) adding a square term for age -c.age##c.age- is useful to search for possible turning points;
    3) you may want to consider logging the dependent variable (regressand) and go for a log-linear regression model;
    4) I am not clear about your complaint concerning OLS bias.
    As we know, OLS requirements include no endogeneity (otherwise, you should go for -ivregress-); no heteroskedasticity of the epsilon (see -robust- option for standard errors); no autocorrelation of the epsilon (see -vce(cluster clusterid)- option); no misspecification of the functional form of the regressand (see -linktest-).
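
    A minimal sketch of points 1) to 4), run on datasets that ship with Stata rather than on the poster's survey extract. The datasets (nlsw88, nhanes2), the generated variable lnwage, and the regressors are assumptions chosen only for illustration; -webuse- needs an internet connection.

    ```stata
    * Points 2) to 4) on nlsw88 (illustrative data, not the poster's sample)
    sysuse nlsw88, clear
    * point 3): log the regressand (lnwage is a name made up for this sketch)
    generate lnwage = ln(wage)
    * point 2): c.age##c.age expands to age plus its square
    regress lnwage c.age##c.age i.race i.married i.union c.hours, vce(robust)
    * point 4): link test for misspecification of the functional form
    linktest

    * Point 1) on nhanes2, which is stored with its survey design declared
    webuse nhanes2, clear
    * with no arguments, svyset displays the stored design
    svyset
    svy: regress bpsystol c.age##c.age i.sex
    ```

    On your own data you would first have to declare the design with -svyset-, using whatever weight, PSU, and strata variables your extract documents.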
    Kind regards,
    Carlo
    (Stata 18.0 SE)


    • #3
      Hi Carlo, I appreciate your response; it's nice to see there are people willing to help novices like me.

      With relation to your second point, could you further explain why this square term would be useful?

      As to my issue with the bias, I simply obtained some unexpected coefficients and was wondering whether omitted variable bias or some other bias might be causing this.


      • #4
        Luis:
        as far as my point 2 is concerned, see the following toy example:
        Code:
        . sysuse auto.dta
        (1978 automobile data)
        
        . regress price c.mpg##c.mpg
        
              Source |       SS           df       MS      Number of obs   =        74
        -------------+----------------------------------   F(2, 71)        =     18.28
               Model |   215835615         2   107917807   Prob > F        =    0.0000
            Residual |   419229781        71  5904644.81   R-squared       =    0.3399
        -------------+----------------------------------   Adj R-squared   =    0.3213
               Total |   635065396        73  8699525.97   Root MSE        =    2429.9
        
        ------------------------------------------------------------------------------
               price | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
        -------------+----------------------------------------------------------------
                 mpg |  -1265.194   289.5443    -4.37   0.000    -1842.529   -687.8593
                     |
         c.mpg#c.mpg |   21.36069   5.938885     3.60   0.001     9.518891    33.20249
                     |
               _cons |   22716.48   3366.577     6.75   0.000     16003.71    29429.24
        ------------------------------------------------------------------------------
        
        . di -(-1265.194)/(2* 21.36069)
        29.615008
        The square term is significant: therefore, there is a possible turning point.
        a) First question: does 29.61 fall within the range of -mpg-?
        Code:
        . sum mpg

            Variable |        Obs        Mean    Std. dev.       Min        Max
        -------------+---------------------------------------------------------
                 mpg |         74     21.2973    5.785503         12         41
        
        .
        Yes, it does (otherwise, despite the results of the analytical calculation, there's no evidence of a turning point within the range of -mpg- and our investigation would stop here).

        b) Second question: is it a minimum or a maximum?
        Let's investigate it via -margins- and -marginsplot-:
        Code:
        margins, at(mpg = (12.0(0.5)41.0))
        marginsplot
        I won't spoil the end of the movie.
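
        As an aside, the hand calculation of the turning point gives a point estimate only. A minimal sketch, assuming the same toy regression on auto.dta, that lets -nlcom- compute -b/(2a) with a delta-method standard error and confidence interval (the label turning_point is just a name chosen here):

        ```stata
        sysuse auto, clear
        regress price c.mpg##c.mpg
        * turning point -b/(2a), with delta-method standard error and CI
        nlcom (turning_point: -_b[mpg] / (2 * _b[c.mpg#c.mpg]))
        ```

        The point estimate should match the 29.615 obtained with -di- above; the confidence interval shows how precisely the turning point is located.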

        As far as your last question is concerned, oftentimes coefficients are strange. It's up to the researcher to investigate if the regression is correctly specified via the tests that I mentioned in my previous reply.
        Kind regards,
        Carlo
        (Stata 18.0 SE)


        • #5
          Thank you Carlo, now I understand it and was able to implement your advice in my project.

          I really appreciate your help!
          Last edited by Luis Garcia; 27 Mar 2024, 19:19.


          • #6
            I have one more question.

            Now that I have already tested and confirmed that age has a turning point and follows a quadratic pattern, should I still keep the linear term in the regression (c.age##c.age), or am I better off including only the square term (c.age#c.age)?


            • #7
              Luis:
              you should keep the previous -c.age##c.age- notation.
              Otherwise, how could you identify the turning point (-b/2a) in your outcome table?
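
              To see the difference, here is a minimal sketch on the earlier toy data (auto.dta, not your sample) that fits both specifications side by side. The stored-estimate names full and sqonly are made up for this illustration. Dropping the linear term constrains b to 0, so the implied turning point -b/(2a) is forced to 0 whatever the data say.

              ```stata
              sysuse auto, clear
              * full quadratic: expands to mpg and c.mpg#c.mpg
              regress price c.mpg##c.mpg
              estimates store full
              * square term only: the linear coefficient is constrained to 0
              regress price c.mpg#c.mpg
              estimates store sqonly
              * compare coefficients, standard errors, and R-squared
              estimates table full sqonly, b(%9.3f) se stats(r2)
              ```
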
              Kind regards,
              Carlo
              (Stata 18.0 SE)
