
  • Regression possibly showing the wrong coefficient

    Hello,
    Once again, I unfortunately have another question, this time about the coefficients of my regression.
    I have to create a variable for income uncertainty, i.e. the variance or standard deviation of income.
    To compute it, I divided people into groups by education and age.
    My problem is that I don't know what I did wrong.
    I created the variable incomeuncertainty with the following commands:
    Code:
    
    
    . generate incomeuncertainty = 15194 if high_edu_mid_age == 1
    . replace incomeuncertainty = 13346 if low_edu_low_age == 1
    . replace incomeuncertainty = 15622 if low_edu_mid_age == 1
    . replace incomeuncertainty = 16881 if low_edu_high_age == 1
    . replace incomeuncertainty = 12870 if high_edu_low_age == 1
    . replace incomeuncertainty = 15733 if high_edu_high_age == 1
    . replace incomeuncertainty = 16280 if highest_edu_low_age == 1
    . replace incomeuncertainty = 18403 if highest_edu_mid_age == 1
    . replace incomeuncertainty = 17898 if highest_edu_high_age == 1
    . label define incomeuncertainty 15194 "high_edu_mid_age" 15622 "low_edu_mid_age" 16881 "low_edu_high_age" 12870 "high_edu_low_age" 13346 "low_edu_low_age" 15733 "high_edu_high_age" 16280 "highest_edu_low_age" 18403 "highest_edu_mid_age" 17898 "highest_edu_high_age"
    . label values incomeuncertainty incomeuncertainty
    The numbers (e.g. 13346) are the standard deviations of income in each group.

    Now, when I run the regression, I get the following:
    Code:
    . regress SWB incomeuncertainty
    
    
          Source |       SS       df       MS              Number of obs =   21000
    -------------+------------------------------           F(  1, 20998) =   20.94
           Model |  58.5070426     1  58.5070426           Prob > F      =  0.0000
        Residual |  58679.0632 20998  2.79450725           R-squared     =  0.0010
    -------------+------------------------------           Adj R-squared =  0.0009
           Total |  58737.5702 20999  2.79716035           Root MSE      =  1.6717
    
    -----------------------------------------------------------------------------------
                  SWB |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    ------------------+----------------------------------------------------------------
    incomeuncertainty |   .0000413   9.02e-06     4.58   0.000     .0000236     .000059
                _cons |   6.248592   .1459951    42.80   0.000      5.96243    6.534753
    -----------------------------------------------------------------------------------
    I also tried the following command, which gave this output:
    Code:
    . regress SWB i.incomeuncertainty
    
          Source |       SS       df       MS              Number of obs =   21000
    -------------+------------------------------           F(  8, 20991) =   53.90
           Model |  1182.31159     8  147.788948           Prob > F      =  0.0000
        Residual |  57555.2587 20991   2.7419017           R-squared     =  0.0201
    -------------+------------------------------           Adj R-squared =  0.0198
           Total |  58737.5702 20999  2.79716035           Root MSE      =  1.6559
    
    ---------------------------------------------------------------------------------------
                      SWB |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    ----------------------+----------------------------------------------------------------
        incomeuncertainty |
         low_edu_low_age  |   .0032112   .1044443     0.03   0.975    -.2015077    .2079301
        high_edu_mid_age  |  -.0923238   .0714927    -1.29   0.197     -.232455    .0478074
         low_edu_mid_age  |  -.0760128   .0730286    -1.04   0.298    -.2191546    .0671289
       high_edu_high_age  |  -.4121866   .0715589    -5.76   0.000    -.5524477   -.2719256
     highest_edu_low_age  |   .3883725   .1320447     2.94   0.003     .1295548    .6471903
        low_edu_high_age  |  -.4187396   .0728824    -5.75   0.000    -.5615946   -.2758845
    highest_edu_high_age  |  -.1173699   .0764315    -1.54   0.125    -.2671815    .0324417
     highest_edu_mid_age  |   .3472622   .0762924     4.55   0.000     .1977233    .4968011
                          |
                    _cons |   7.073892   .0670992   105.42   0.000     6.942372    7.205411
    ---------------------------------------------------------------------------------------
    My question is this:
    Taking the first regression as an example, my subjective well-being would actually increase by 13346*.0000413 = 0.551 points (on a scale from 0 to 10)!
    That is strange, because I would not expect income uncertainty to affect subjective well-being positively.
    Also, most of the coefficients in the second regression are negative.
    So what am I doing wrong? Why do I get these results?
    Any help would be enormously appreciated, because I just don't know what I'm missing.
    Thank you very much in advance.

    Best regards,
    Marcel
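The arithmetic in the question can be checked directly in Stata. The sketch below reuses the coefficient and the group values printed above. The scalar name `b_unc` is an arbitrary choice.

The check makes one point explicit. 13346*.0000413 is the predicted difference relative to a hypothetical group whose income SD is zero, and that lies far outside the data. Across the range actually observed (12870 to 18403), the implied difference is only about 0.23 points.

```stata
* Slope on incomeuncertainty from -regress SWB incomeuncertainty- above
scalar b_unc = .0000413

* Predicted SWB difference versus a (nonexistent) group with an income SD of 0
display b_unc * 13346

* Predicted SWB difference between the least uncertain group
* (high_edu_low_age, 12870) and the most uncertain one (highest_edu_mid_age, 18403)
display b_unc * (18403 - 12870)

* With the first model still in memory, -lincom- gives the same contrast
* together with a standard error and confidence interval:
*   lincom 5533*incomeuncertainty
```
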

  • #2
    Apparently, you created a categorical variable with 9 levels (8 dummies plus the base category). I can't comment on whether such a variable is appropriate for this model. However, for a variable like this you are expected to use factor-variable notation, as in your second example.
    Best regards,

    Marcos
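The sketch below illustrates the difference between the two specifications on simulated data. Everything in it is made up for illustration: the data, seed, group sizes and the 0.3 group effect. Only the nine SD values come from the first post.

- `regress SWB incomeuncertainty` fits one straight line across the nine values.
- `regress SWB i.incomeuncertainty` fits a separate mean for each group.
- `margins` then reports the predicted mean SWB per group. This is often easier to read than coefficients measured relative to a base category.

```stata
clear
set seed 2024

* Simulated data: 9 groups of 200 people (NOT the data from this thread)
set obs 1800
generate group = ceil(_n/200)

* Attach the group-level income SDs quoted in the first post
recode group (1=12870) (2=13346) (3=15194) (4=15622) (5=15733) ///
    (6=16280) (7=16881) (8=17898) (9=18403), generate(incomeuncertainty)

* Arbitrary pattern: groups 3, 6 and 9 are a bit happier; 0-10 scale
generate SWB = 7 + 0.3*(mod(group, 3) == 0) + rnormal(0, 1.6)
replace SWB = max(0, min(10, round(SWB)))

* Continuous version: one slope per unit of income SD
regress SWB incomeuncertainty

* Categorical version: 8 indicators plus the base group (12870)
regress SWB i.incomeuncertainty

* Predicted mean SWB for each of the nine groups
margins incomeuncertainty
```
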



    • #3
      Hi Marcos,
      First, thank you very much for your reply.
      Is there any way to make the variable continuous, so that it depends on the variance of income?

      Thank you very much in advance.

      Kind regards,
      Marcel



      • #4
        If you have age and education categories, you can compute the standard deviations with

        bysort age education: egen stdincome = sd(income)

        This saves you coding and potential errors.
        If you have only one observation per person, I can see why you'd need to define categories to calculate the standard deviations. If you have multiple observations per person, you might be able to compute the standard deviation for each person.
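Here is a self-contained sketch of the suggestion above, run on simulated data. The variable names education, age and income follow the reply; the distributions, seed and sample size are made up.

Note that the egen function for a standard deviation is sd(). The std() function would instead create standardized (z-score) values.

For panel data, a person identifier can replace the categories to get a person-level SD. The identifier pid is hypothetical.

```stata
clear
set seed 7

* Simulated individual-level data (NOT the data from this thread)
set obs 3000
generate education = runiformint(1, 3)   // 1 low, 2 high, 3 highest
generate age = runiformint(1, 3)         // 1 low, 2 mid, 3 high
generate income = exp(rnormal(10 + 0.2*education, 0.5))

* Within-cell standard deviation of income, one value per education-by-age cell
* (egen's sd() is the standard deviation; std() would give z-scores instead)
bysort education age: egen sdincome = sd(income)

* Show each cell once
egen cell_tag = tag(education age)
list education age sdincome if cell_tag, noobs sepby(education)

* Per-person alternative with panel data (pid is a hypothetical person id):
*   bysort pid: egen sdincome_person = sd(income)
```
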

        While the first regression uses income uncertainty as the explanatory variable, the second does not do that at all. Because incomeuncertainty takes a single value for each category, the i. notation makes Stata treat it as a set of dummy variables, one for each age/education group. The lowest value (12870, high_edu_low_age) is the omitted base category. Each coefficient in the second regression is therefore the difference in mean SWB between that group and high_edu_low_age. Stata just happens to be creating these dummies from a variable you've called incomeuncertainty.
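This sketch checks that i. notation amounts to a set of dummies. It again uses made-up simulated data, with the nine SD values from the first post attached to nine arbitrary groups.

The model is fitted twice:

- once with i.incomeuncertainty;
- once with dummies created by `tabulate, generate()`.

One coefficient is then compared across the two fits. `tabulate` orders the dummies by value, so d1 is the 12870 group (the omitted base) and d3 is the 15194 group.

```stata
clear
set seed 2024

* Simulated data: 9 groups of 200 people (NOT the data from this thread)
set obs 1800
generate group = ceil(_n/200)
recode group (1=12870) (2=13346) (3=15194) (4=15622) (5=15733) ///
    (6=16280) (7=16881) (8=17898) (9=18403), generate(incomeuncertainty)
generate SWB = 7 + 0.3*(mod(group, 3) == 0) + rnormal(0, 1.6)

* Factor-variable notation: Stata builds the indicators itself,
* using the lowest value (12870) as the omitted base category
regress SWB i.incomeuncertainty
scalar b_factor = _b[15194.incomeuncertainty]

* Hand-made dummies d1-d9, one per distinct value, in ascending order
tabulate incomeuncertainty, generate(d)

* Leave out d1 (12870) so the base category is the same as above
regress SWB d2-d9
scalar b_dummy = _b[d3]   // d3 = the 15194 group
display b_factor - b_dummy
```
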

        By the way, I'd tend to use standard deviations: variances are on the squared scale of the substantive variable, which often creates problems.
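One way to see the point about variances is a simple illustration. Suppose that SWB is linear in the standard deviation σ of income with slope b. Written in terms of the variance v = σ², the same relationship is no longer linear. A regression on v therefore imposes a different functional form:

```latex
\[
\text{SWB} = a + b\,\sigma
\quad\Longrightarrow\quad
\text{SWB} = a + b\sqrt{v},
\qquad
\frac{\partial\,\text{SWB}}{\partial v} = \frac{b}{2\sqrt{v}} = \frac{b}{2\sigma}.
\]
```

Under that assumption, the marginal effect of the variance shrinks as uncertainty grows. A single linear coefficient on v cannot capture this.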

