
  • OLS showing unexpected sign

    Hi

    I am running an OLS regression with BMI as the dependent variable and exercise as the independent variable. Exercise is measured as the number of times a week one exercises for 30 minutes or more. I keep getting a positive and insignificant coefficient for exercise, even after including more explanatory variables. I have tried using the log of BMI, but the result remains the same. I have also tried removing outliers in the BMI data, but I still get a positive coefficient. Please help =(

    Kind Regards
    Nonsi Nkomo

  • #2
    Show us a scatter plot

    Code:
    scatter BMI exercise


    • #3
      [Image: SCAT.png]


      This is the graph before removing any outliers or taking the log of BMI


      • #4
        Nonsi:
        have you checked for omitted variable bias (-estat ovtest-)?
        Sharing what you typed and what Stata gave you back (via CODE delimiters, please) would help as well.
        Kind regards,
        Carlo
        (Stata 18.0 SE)


        • #5
          Hi Carlo

          Here is the code I used.

          Code:
          reg DBMI8 W8EXERCISE
          
                Source |       SS           df       MS      Number of obs   =     7,069
          -------------+----------------------------------   F(1, 7067)      =      0.55
                 Model |  17.9850806         1  17.9850806   Prob > F        =    0.4597
              Residual |  232536.932     7,067  32.9046175   R-squared       =    0.0001
          -------------+----------------------------------   Adj R-squared   =   -0.0001
                 Total |  232554.917     7,068  32.9025066   Root MSE        =    5.7363
          
          ------------------------------------------------------------------------------
                 DBMI8 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
            W8EXERCISE |   .0237712   .0321532     0.74   0.460    -.0392586    .0868011
                 _cons |   25.18001   .1181194   213.17   0.000     24.94846    25.41156
          ------------------------------------------------------------------------------
          
          .
          Kind Regards
          Nonsi Nkomo
          Last edited by nonsi nkomo; 20 Jun 2018, 16:44.


          • #6
            Hi Carlo

            I have now performed the omitted variable bias test and my results are as follows.

            Code:
            estat ovtest
            
            Ramsey RESET test using powers of the fitted values of DBMI8
                   Ho:  model has no omitted variables
                            F(3, 7064) =      0.22
                              Prob > F =      0.8807
            Kind Regards
            Nonsi Nkomo


            • #7
              two things that may help:

              1. ratio variables should be avoided; you might want to see the citations in #2 and #3 in https://www.statalist.org/forums/for...y-zit-on-zit-1

              2. on the "wrong" sign, you might want to see the various discussions, including citations, in https://www.statalist.org/forums/for...ols-regression


              • #8
                I don't understand why you're dumbfounded. You have horribly noisy and improbable data, some not insubstantial proportion of which appears to arise from pulling the interviewer's leg. (I mean, c'mon, someone with a body mass index of 110 exercising four days a week for 30+ minutes each time? That person can scarcely move at all.)

                Mixed in with all of the joshing you might have a subpopulation of athletic types whose muscle mass is reflected in relatively higher BMIs who exercise regularly (hence I think the question above about omitted variables), pulling up a plot containing an otherwise overweight sedentary crowd for a net zero slope.

                And, amid all of the measurement error, the zero slope is just what you got—it couldn't eke out an estimate (0.02 ± 0.03) coming anywhere near a statistically significant difference from zero despite a sample of over seven thousand. With a random subsample, maybe jittering and smaller symbols so that there isn't so much overlap, you could undoubtedly see the same thing in the scatter plot.
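
                A minimal sketch of such a plot, assuming the variable names DBMI8 and W8EXERCISE from #5. The indicator name insub is hypothetical, and the seed, the 10% sampling fraction, and the jitter amount are arbitrary illustrative choices, not recommendations:

                Code:
                * fix the seed so the same random subsample is drawn each time
                set seed 2018
                * flag roughly 10% of observations at random (insub is a made-up name)
                generate byte insub = runiform() < 0.10
                * jittered scatter with small hollow circles to reduce overplotting
                scatter DBMI8 W8EXERCISE if insub, jitter(5) msymbol(oh) msize(vsmall)

                Jittering matters here because exercise takes only a few integer values, so without it the points stack in vertical columns.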


                • #9
                  I'd want something more like a box plot for each predictor value. My guess is that each conditional distribution is highly skewed.
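
                  One way this could be drawn, as a sketch that assumes the variable names DBMI8 and W8EXERCISE from #5 (nothing here was run on the actual data):

                  Code:
                  * one box of BMI for each distinct value of the exercise variable
                  graph box DBMI8, over(W8EXERCISE)
                  * numerical check on the conditional distributions, including skewness
                  tabstat DBMI8, by(W8EXERCISE) statistics(n mean median sd skewness)

                  A mean well above the median, or a clearly positive skewness, within each exercise level would support the guess about skewed conditional distributions.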


                  • #10
                    Hi Joseph

                    Thank you for the help. I constructed the scatter plot before removing any outliers. Even after removing the outliers in BMI, I am still getting a positive coefficient for exercise. I kept 12.75 < BMI < 36.4. Also, I am quite the amateur when it comes to Stata and modelling, so I am confused by a lot of the jargon.

                    Kind Regards
                    Nonsi Nkomo
                    Last edited by nonsi nkomo; 21 Jun 2018, 03:33.
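
                    For reference, a restriction like this can be applied with an -if- qualifier rather than by dropping observations, so the full data stay available for comparison. A sketch, assuming the variable names from #5 and the cutoffs quoted in this post:

                    Code:
                    * refit the model on the restricted BMI range only
                    * (missing values count as larger than any number in Stata,
                    * so the upper bound also excludes missing BMI)
                    regress DBMI8 W8EXERCISE if DBMI8 > 12.75 & DBMI8 < 36.4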


                    • #11
                      Nonsi:
                      the situation you're experiencing seems similar to the following one (run on Stata's auto.dta example dataset):
                      Code:
                      . regress price rep78
                      
                            Source |       SS           df       MS      Number of obs   =        69
                      -------------+----------------------------------   F(1, 67)        =      0.00
                             Model |  24770.7652         1  24770.7652   Prob > F        =    0.9574
                          Residual |   576772188        67  8608540.12   R-squared       =    0.0000
                      -------------+----------------------------------   Adj R-squared   =   -0.0149
                             Total |   576796959        68  8482308.22   Root MSE        =      2934
                      
                      ------------------------------------------------------------------------------
                             price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                      -------------+----------------------------------------------------------------
                             rep78 |   19.28012   359.4221     0.05   0.957    -698.1295    736.6897
                             _cons |   6080.379    1274.06     4.77   0.000     3537.345    8623.413
                      ------------------------------------------------------------------------------
                      
                      . estat ovtest
                      
                      Ramsey RESET test using powers of the fitted values of price
                             Ho:  model has no omitted variables
                                        F(3, 64) =      0.31
                                        Prob > F =      0.8160
                      
                      . predict fitted, xb
                      (5 missing values generated)
                      
                      . g sq_fitted=fitted^2
                      (5 missing values generated)
                      
                      . regress price rep78 sq_fitted
                      
                            Source |       SS           df       MS      Number of obs   =        69
                      -------------+----------------------------------   F(2, 66)        =      0.38
                             Model |  6642599.02         2  3321299.51   Prob > F        =    0.6823
                          Residual |   570154360        66  8638702.42   R-squared       =    0.0115
                      -------------+----------------------------------   Adj R-squared   =   -0.0184
                             Total |   576796959        68  8482308.22   Root MSE        =    2939.2
                      
                      ------------------------------------------------------------------------------
                             price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                      -------------+----------------------------------------------------------------
                             rep78 |   158316.9   180859.7     0.88   0.385    -202781.1    519414.9
                         sq_fitted |  -.6679995   .7632075    -0.88   0.385    -2.191792    .8557935
                             _cons |   2.47e+07   2.82e+07     0.88   0.384    -3.16e+07    8.10e+07
                      ------------------------------------------------------------------------------
                      Despite a non-significant -estat ovtest- (confirmed by the augmented regression, in which -sq_fitted- is also non-significant), the R-sq of the original OLS is totally negligible.
                      I would say that, although the -estat ovtest- outcome passes muster, my model specification is highly unsatisfactory.
                      Kind regards,
                      Carlo
                      (Stata 18.0 SE)


                      • #12
                        Thank you Carlo for the help. Do you think it has something to do with my independent variable, exercise?


                        • #13
                          Nonsi:
                          it may be so; unfortunately, my limited knowledge of your research field does not allow me to reply more positively.
                          Kind regards,
                          Carlo
                          (Stata 18.0 SE)


                          • #14
                            Thank you Carlo for the help.


                            • #15
                              Hi Nick

                              Here is a box plot of the predictor variable exercise.

                              [Image: exercise box plot .png]


                              Kind Regards
                              Nonsi Nkomo
