regression possibly showing wrong coefficient

Marcel Meyer

Join Date: Apr 2017
Posts: 7


24 Apr 2017, 09:36

Hello,
Once again, I unfortunately have a question, this time about the coefficients of my regression.
I have to create a variable for income uncertainty, measured as the variance or standard deviation of income.
To compute it, I grouped people into classes by education and age.
My problem now is that I don't know what I did wrong.
I created the variable incomeuncertainty with the following commands:

Code:

* Assign each education/age group the standard deviation of income in that group
generate incomeuncertainty = 15194 if high_edu_mid_age == 1
replace incomeuncertainty = 13346 if low_edu_low_age == 1
replace incomeuncertainty = 15622 if low_edu_mid_age == 1
replace incomeuncertainty = 16881 if low_edu_high_age == 1
replace incomeuncertainty = 12870 if high_edu_low_age == 1
replace incomeuncertainty = 15733 if high_edu_high_age == 1
replace incomeuncertainty = 16280 if highest_edu_low_age == 1
replace incomeuncertainty = 18403 if highest_edu_mid_age == 1
replace incomeuncertainty = 17898 if highest_edu_high_age == 1
label define incomeuncertainty 15194 "high_edu_mid_age" 15622 "low_edu_mid_age" 16881 "low_edu_high_age" 12870 "high_edu_low_age" 13346 "low_edu_low_age" 15733 "high_edu_high_age" 16280 "highest_edu_low_age" 18403 "highest_edu_mid_age" 17898 "highest_edu_high_age"
label values incomeuncertainty incomeuncertainty

The numbers (e.g. 13346) are the standard deviations of income within each group.
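For readers less familiar with Stata, here is a minimal Python sketch of what the generate/replace chain amounts to: a lookup from each person's group to that group's income SD. The nine values are copied from the commands above; the per-person group labels and the helper name `assign_income_uncertainty` are hypothetical. A person whose group is not in the table gets no value, just as Stata leaves `incomeuncertainty` missing when none of the `if` conditions hold.

```python
# Group-level income SDs, copied from the generate/replace commands above.
INCOME_SD = {
    "low_edu_low_age": 13346,
    "low_edu_mid_age": 15622,
    "low_edu_high_age": 16881,
    "high_edu_low_age": 12870,
    "high_edu_mid_age": 15194,
    "high_edu_high_age": 15733,
    "highest_edu_low_age": 16280,
    "highest_edu_mid_age": 18403,
    "highest_edu_high_age": 17898,
}


def assign_income_uncertainty(groups):
    """Return each person's group SD; unknown groups map to None (Stata: missing)."""
    return [INCOME_SD.get(g) for g in groups]


# Hypothetical group labels for three people.
print(assign_income_uncertainty(["high_edu_low_age", "highest_edu_mid_age", "no_group"]))
# [12870, 18403, None]
```

Because all nine values are distinct, the resulting variable carries exactly the same information as a nine-category group identifier.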

When I run a regression, I get the following:

Code:

. regress SWB incomeuncertainty


      Source |       SS       df       MS              Number of obs =   21000
-------------+------------------------------           F(  1, 20998) =   20.94
       Model |  58.5070426     1  58.5070426           Prob > F      =  0.0000
    Residual |  58679.0632 20998  2.79450725           R-squared     =  0.0010
-------------+------------------------------           Adj R-squared =  0.0009
       Total |  58737.5702 20999  2.79716035           Root MSE      =  1.6717

-----------------------------------------------------------------------------------
              SWB |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
------------------+----------------------------------------------------------------
incomeuncertainty |   .0000413   9.02e-06     4.58   0.000     .0000236     .000059
            _cons |   6.248592   .1459951    42.80   0.000      5.96243    6.534753
-----------------------------------------------------------------------------------

I also tried the following command; here is the output:

Code:

. regress SWB i.incomeuncertainty

      Source |       SS       df       MS              Number of obs =   21000
-------------+------------------------------           F(  8, 20991) =   53.90
       Model |  1182.31159     8  147.788948           Prob > F      =  0.0000
    Residual |  57555.2587 20991   2.7419017           R-squared     =  0.0201
-------------+------------------------------           Adj R-squared =  0.0198
       Total |  58737.5702 20999  2.79716035           Root MSE      =  1.6559

---------------------------------------------------------------------------------------
                  SWB |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
----------------------+----------------------------------------------------------------
    incomeuncertainty |
     low_edu_low_age  |   .0032112   .1044443     0.03   0.975    -.2015077    .2079301
    high_edu_mid_age  |  -.0923238   .0714927    -1.29   0.197     -.232455    .0478074
     low_edu_mid_age  |  -.0760128   .0730286    -1.04   0.298    -.2191546    .0671289
   high_edu_high_age  |  -.4121866   .0715589    -5.76   0.000    -.5524477   -.2719256
 highest_edu_low_age  |   .3883725   .1320447     2.94   0.003     .1295548    .6471903
    low_edu_high_age  |  -.4187396   .0728824    -5.75   0.000    -.5615946   -.2758845
highest_edu_high_age  |  -.1173699   .0764315    -1.54   0.125    -.2671815    .0324417
 highest_edu_mid_age  |   .3472622   .0762924     4.55   0.000     .1977233    .4968011
                      |
                _cons |   7.073892   .0670992   105.42   0.000     6.942372    7.205411
---------------------------------------------------------------------------------------

My question now:
Taking the first regression as an example, my subjective well-being would increase by 13346 * .0000413 ≈ 0.551 points (on a scale from 0 to 10).
This is strange, because I expected income uncertainty not to affect subjective well-being positively.
Also, the coefficients from the second regression are mostly negative.
So what am I doing wrong? Why do I get these results?
Any help would be enormously appreciated, because I just don't know what I'm missing.
Thank you very much in advance.

Best regards,
Marcel
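A small worked check of the arithmetic above, using only the coefficients from the first regression (the helper name `predicted_swb` is illustrative). The 0.55 figure compares an income SD of 13346 with an SD of 0, which no group in the data has; across the observed group SDs (12870 to 18403) the fitted line spans only about 0.23 points. Since the regressor takes just nine distinct values, the slope is driven entirely by differences between the education/age groups.

```python
# Coefficients from `regress SWB incomeuncertainty` above.
SLOPE = 0.0000413  # change in SWB per one unit of income SD
CONS = 6.248592    # fitted SWB at an income SD of 0 (outside the data)


def predicted_swb(sd):
    """Fitted SWB from the first regression for a given income SD."""
    return CONS + SLOPE * sd


# Contribution of an SD of 13346 relative to an SD of 0.
print(round(SLOPE * 13346, 4))  # 0.5512
# Fitted difference between the lowest (12870) and highest (18403) group SDs.
print(round(predicted_swb(18403) - predicted_swb(12870), 4))  # 0.2285
```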


Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#2

24 Apr 2017, 09:56

Apparently, you created a categorical variable with 9 levels. I cannot say whether creating such a variable is appropriate for this model. However, in this case you should use factor-variable notation (the i. prefix), as in the second example.

Best regards,

Marcos
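A note on reading the factor-variable output: by default Stata omits the lowest level as the base, here 12870 (high_edu_low_age), so `_cons` is that group's mean SWB and each coefficient is a difference from it. A negative coefficient therefore means "lower mean SWB than high_edu_low_age", not a negative effect of uncertainty. The sketch below only re-adds the reported coefficients; the function name `implied_group_means` is hypothetical.

```python
# Coefficients from `regress SWB i.incomeuncertainty` above.
CONS = 7.073892  # mean SWB of the omitted base level, 12870 = high_edu_low_age
COEF = {
    "low_edu_low_age": 0.0032112,
    "high_edu_mid_age": -0.0923238,
    "low_edu_mid_age": -0.0760128,
    "high_edu_high_age": -0.4121866,
    "highest_edu_low_age": 0.3883725,
    "low_edu_high_age": -0.4187396,
    "highest_edu_high_age": -0.1173699,
    "highest_edu_mid_age": 0.3472622,
}


def implied_group_means(cons, coef, base="high_edu_low_age"):
    """With one dummy per non-base group, each group's mean SWB is _cons + coefficient."""
    means = {base: cons}
    for group, b in coef.items():
        means[group] = cons + b
    return means


# Print the nine implied group means from lowest to highest.
for group, mean in sorted(implied_group_means(CONS, COEF).items(), key=lambda kv: kv[1]):
    print(f"{group:<22} {mean:.4f}")
```

Sorting the implied means shows which education/age groups report higher or lower SWB; nothing in this model ties those differences to income uncertainty specifically.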
Marcel Meyer

Join Date: Apr 2017

Posts: 7
#3

25 Apr 2017, 04:11

Hi Marcos,
First, thank you very much for your reply.
Is there any way to make the variable non-categorical, so that it reflects the variance of income?

Thank you very much in advance.

Kind regards,
Marcel
Phil Bromiley

Join Date: Apr 2014

Posts: 4348
#4

25 Apr 2017, 10:48

If you have age and education categories, you can compute your standard deviations with

bysort age education: egen sdincome = sd(income)

This saves you coding and avoids potential errors.
If you have only one observation per person, I can see why you'd need to define categories to calculate the standard deviations. If you have multiple observations per person, you might be able to compute the standard deviation by person.
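For illustration, a rough Python analogue of egen's sd() function used by age/education cell (note that egen's std() returns standardized z-scores, not a standard deviation). It is a sketch under stated assumptions: the field names `age`, `education`, `income` and the helper `add_group_sd` are hypothetical, incomes are assumed non-missing (Stata's egen would skip missing values), and `statistics.stdev` uses the n-1 sample formula.

```python
from collections import defaultdict
from statistics import stdev


def add_group_sd(rows, keys=("age", "education"), value="income"):
    """Attach the within-cell sample SD of `value` to every row.

    One SD per age/education cell, copied to each observation in that cell.
    Cells with a single observation get None (Stata would give missing).
    """
    cells = defaultdict(list)
    for row in rows:
        cells[tuple(row[k] for k in keys)].append(row[value])
    sds = {cell: stdev(vals) if len(vals) > 1 else None for cell, vals in cells.items()}
    return [dict(row, sdincome=sds[tuple(row[k] for k in keys)]) for row in rows]


# Hypothetical toy data: two people in one cell, one person alone in another.
rows = [
    {"age": "low", "education": "high", "income": 1000},
    {"age": "low", "education": "high", "income": 3000},
    {"age": "mid", "education": "low", "income": 2000},
]
for r in add_group_sd(rows):
    print(r)
```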

While the first regression uses income uncertainty as the explanatory variable, the second does not do that at all. Because the variable takes a single value for each category, the i. prefix makes Stata treat it as a set of dummy variables. That is, it estimates a dummy for each age/education group (apart from an omitted base group). It just happens to create these from a variable you've called incomeuncertainty.
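A minimal sketch of the two ways the same variable can enter the model, assuming one hypothetical person per group and an illustrative function name `regressors`: as one numeric column, or as one 0/1 indicator per non-base level (Stata's default base is the lowest level). In the factor version the SD values only act as labels, so any one-to-one renumbering of them would leave the fit unchanged.

```python
def regressors(values, base=None):
    """Contrast the regressors built from the same variable.

    continuous: the value itself, as in `regress SWB incomeuncertainty`
    factor:     a 0/1 indicator for every non-base level, as in
                `regress SWB i.incomeuncertainty` (default base = lowest level)
    """
    levels = sorted(set(values))
    if base is None:
        base = levels[0]
    dummy_levels = [lv for lv in levels if lv != base]
    continuous = [[v] for v in values]
    factor = [[1 if v == lv else 0 for lv in dummy_levels] for v in values]
    return dummy_levels, continuous, factor


# The nine group SDs from the thread, one hypothetical person per group.
sds = [13346, 15622, 16881, 12870, 15194, 15733, 16280, 18403, 17898]
dummy_levels, continuous, factor = regressors(sds)
print(len(dummy_levels))  # 8 dummies: one per group except the base (12870)
print(factor[3])          # the base-group person: all zeros
```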

By the way, I'd tend to use standard deviations; variances often have a squared relation to substantive variables, which can create problems.