question about reg a b c i.year, robust

Teddy Gunawan

Join Date: Dec 2018

Posts: 2
#1


05 Dec 2018, 20:55

I have a question about this command: reg a b c i.year, robust.
What is the difference between using the normal command reg a b c and reg a b c i.year, robust?
Forgive me if this is a stupid question lol, I've just started to use Stata and face many difficulties.
Clyde Schechter

Join Date: Apr 2014

Posts: 30083
#2

05 Dec 2018, 21:48

There are two differences. -reg a b c i.year, robust- contains an additional set of year indicator variables that are not included in -reg a b c-, so it adjusts for year effects on a, which -reg a b c- does not. Then there is the matter of the -robust- option. That causes Stata to calculate standard errors using the Huber-White sandwich estimator, which is robust to violations of homoscedasticity (that is, to heteroscedasticity). The -robust- option does not change the coefficient estimates themselves; only the standard errors, and therefore the t statistics, p-values, and confidence intervals, change.
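To make the -robust- point concrete, here is a minimal Python sketch of OLS with both classical and Huber-White (sandwich) standard errors. It is not Stata code and not Stata's internal implementation. It assumes the small-sample factor n/(n-k), which is the HC1 variant and, as far as I know, what -regress, robust- reports. The function names (`ols_with_se`, `matmul`, `inverse`) are made up for illustration, and the data in the usage lines are invented. Both sets of standard errors are computed around the same coefficient vector.

```python
# Minimal sketch: OLS coefficients with classical and HC1 sandwich standard
# errors, using only the standard library. Illustrative, not Stata's code.

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def inverse(m):
    # Gauss-Jordan elimination with partial pivoting on [m | I]
    n = len(m)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def ols_with_se(X, y):
    """X: list of rows (include a 1.0 column for the constant); y: list."""
    n, k = len(X), len(X[0])
    xtx_inv = inverse(matmul(transpose(X), X))
    xty = matmul(transpose(X), [[v] for v in y])
    beta = [row[0] for row in matmul(xtx_inv, xty)]
    resid = [yi - sum(b * x for b, x in zip(beta, row)) for yi, row in zip(y, X)]
    # Classical: s^2 (X'X)^-1, which assumes a constant error variance
    s2 = sum(e * e for e in resid) / (n - k)
    classical = [(s2 * xtx_inv[j][j]) ** 0.5 for j in range(k)]
    # Sandwich: (X'X)^-1 [sum_i e_i^2 x_i x_i'] (X'X)^-1, scaled by n/(n-k) (HC1)
    meat = [[sum(resid[i] ** 2 * X[i][a] * X[i][b] for i in range(n))
             for b in range(k)] for a in range(k)]
    v = matmul(matmul(xtx_inv, meat), xtx_inv)
    # max(..., 0.0) guards against tiny negative values from rounding
    robust = [(max(v[j][j], 0.0) * n / (n - k)) ** 0.5 for j in range(k)]
    return beta, classical, robust


xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.1, 2.9, 4.5, 4.2, 6.8, 5.1, 9.0, 6.5]  # invented data
X = [[1.0, float(x)] for x in xs]             # constant + one regressor
beta, se_classical, se_robust = ols_with_se(X, ys)
print("coefficients:", beta)
print("classical SE:", se_classical)
print("robust SE:   ", se_robust)
```

Adding year indicators would simply mean adding more 0/1 columns to X; the same formulas apply.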
Teddy Gunawan

Join Date: Dec 2018

Posts: 2
#3

06 Dec 2018, 20:37

Thank you for the answer, Mr. Clyde, it really helped me, but I'd like to ask one more question.

This is my result:

Linear regression                               Number of obs     =        165
                                                F(8, 156)         =      10.47
                                                Prob > F          =     0.0000
                                                R-squared         =     0.3667
                                                Root MSE          =     1.7748

------------------------------------------------------------------------------
             |               Robust
        rktp |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      lnkukm |   .8444457   .1041672     8.11   0.000     .6386855    1.050206
      lnpdrb |  -1.119184   .1500822    -7.46   0.000    -1.415639   -.8227281
      sbriil |  -.1073237   .2467111    -0.44   0.664     -.594649    .3800017
         tpt |  -.1463694   .0815211    -1.80   0.075     -.307397    .0146581
             |
     Periode |
       2012  |   .2688198   .4524748     0.59   0.553    -.6249481    1.162588
       2013  |    .772571   .5743621     1.35   0.181    -.3619593    1.907101
       2014  |   .8103533   .4502405     1.80   0.074    -.0790011    1.699708
       2015  |   1.298018   .4921571     2.64   0.009     .3258662     2.27017
             |
       _cons |   21.12145   2.428598     8.70   0.000     16.32427    25.91862
------------------------------------------------------------------------------

My period consists of five years, 2011 to 2015, so why does the result only show 2012 to 2015?
Pardon me if this is a stupid question haha, I am really new to the world of statistics.

Marcos Almeida

Join Date: Apr 2014
Posts: 4047

06 Dec 2018, 20:57

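In this case, 2011 is the reference (base) level: each year coefficient is the estimated difference from 2011, holding the other variables constant. You can change it, for example with ib2015.Periode to make 2015 the base. Please take a look at the manual (help fvvarlist).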
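A minimal Python sketch (not Stata code) of what happens to a factor variable such as i.Periode: it is expanded into one 0/1 column per year except the base year, which is why 2011 has no row in the output. The helper `indicators` and the sample years are hypothetical; passing `base=2015` mimics what an ib2015. prefix asks Stata to do.

```python
def indicators(values, base=None):
    """Expand a categorical variable into 0/1 indicator columns.

    One column is created per level except the base level. If no base is
    given, the lowest level is used, matching Stata's default choice.
    Returns (kept_levels, rows)."""
    levels = sorted(set(values))
    if base is None:
        base = levels[0]
    kept = [lv for lv in levels if lv != base]
    rows = [[1 if v == lv else 0 for lv in kept] for v in values]
    return kept, rows


periode = [2011, 2012, 2013, 2014, 2015, 2011, 2013]  # invented sample
kept, rows = indicators(periode)
print(kept)  # [2012, 2013, 2014, 2015]: no 2011 column, 2011 is the base
for year, row in zip(periode, rows):
    print(year, row)  # a 2011 observation is all zeros

# Choosing a different base drops that level instead.
kept_2015, _ = indicators(periode, base=2015)
print(kept_2015)  # [2011, 2012, 2013, 2014]
```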

This is a toy example

Code:

. sysuse auto
(1978 Automobile Data)

. tab rep78

     Repair |
Record 1978 |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |          2        2.90        2.90
          2 |          8       11.59       14.49
          3 |         30       43.48       57.97
          4 |         18       26.09       84.06
          5 |         11       15.94      100.00
------------+-----------------------------------
      Total |         69      100.00

. regress mpg i.rep78

      Source |       SS           df       MS      Number of obs   =        69
-------------+----------------------------------   F(4, 64)        =      4.91
       Model |  549.415777         4  137.353944   Prob > F        =    0.0016
    Residual |  1790.78712        64  27.9810488   R-squared       =    0.2348
-------------+----------------------------------   Adj R-squared   =    0.1869
       Total |   2340.2029        68  34.4147485   Root MSE        =    5.2897

------------------------------------------------------------------------------
         mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       rep78 |
          2  |     -1.875   4.181884    -0.45   0.655    -10.22927    6.479274
          3  |  -1.566667   3.863059    -0.41   0.686    -9.284014    6.150681
          4  |   .6666667   3.942718     0.17   0.866    -7.209818    8.543152
          5  |   6.363636   4.066234     1.56   0.123    -1.759599    14.48687
             |
       _cons |         21   3.740391     5.61   0.000     13.52771    28.47229
------------------------------------------------------------------------------

. regress mpg ib3.rep78

      Source |       SS           df       MS      Number of obs   =        69
-------------+----------------------------------   F(4, 64)        =      4.91
       Model |  549.415777         4  137.353944   Prob > F        =    0.0016
    Residual |  1790.78712        64  27.9810488   R-squared       =    0.2348
-------------+----------------------------------   Adj R-squared   =    0.1869
       Total |   2340.2029        68  34.4147485   Root MSE        =    5.2897

------------------------------------------------------------------------------
         mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       rep78 |
          1  |   1.566667   3.863059     0.41   0.686    -6.150681    9.284014
          2  |  -.3083333   2.104836    -0.15   0.884    -4.513226    3.896559
          4  |   2.233333   1.577087     1.42   0.162    -.9172607    5.383927
          5  |   7.930303    1.86452     4.25   0.000     4.205497    11.65511
             |
       _cons |   19.43333   .9657648    20.12   0.000       17.504    21.36267
------------------------------------------------------------------------------

. regress mpg i.rep78, base

      Source |       SS           df       MS      Number of obs   =        69
-------------+----------------------------------   F(4, 64)        =      4.91
       Model |  549.415777         4  137.353944   Prob > F        =    0.0016
    Residual |  1790.78712        64  27.9810488   R-squared       =    0.2348
-------------+----------------------------------   Adj R-squared   =    0.1869
       Total |   2340.2029        68  34.4147485   Root MSE        =    5.2897

------------------------------------------------------------------------------
         mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       rep78 |
          1  |          0  (base)
          2  |     -1.875   4.181884    -0.45   0.655    -10.22927    6.479274
          3  |  -1.566667   3.863059    -0.41   0.686    -9.284014    6.150681
          4  |   .6666667   3.942718     0.17   0.866    -7.209818    8.543152
          5  |   6.363636   4.066234     1.56   0.123    -1.759599    14.48687
             |
       _cons |         21   3.740391     5.61   0.000     13.52771    28.47229
------------------------------------------------------------------------------
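Because only the reference level changes between i.rep78 and ib3.rep78, the ib3. coefficients can be recovered from the i.rep78 ones by subtraction: subtract the level-3 coefficient from every coefficient, and add it to the constant. A short Python check using the numbers printed in the first regression (the function name `rebase` is just an illustrative choice):

```python
# Coefficients copied from the -regress mpg i.rep78- output (base = 1).
base1 = {1: 0.0, 2: -1.875, 3: -1.566667, 4: 0.6666667, 5: 6.363636}
cons_base1 = 21.0

def rebase(coefs, cons, new_base):
    """Re-express contrasts against one base level as contrasts against another."""
    shift = coefs[new_base]
    new = {lv: c - shift for lv, c in coefs.items() if lv != new_base}
    return new, cons + shift  # new constant = mean outcome at the new base level

base3, cons_base3 = rebase(base1, cons_base1, 3)
for lv, c in base3.items():
    print(lv, round(c, 6))
print("_cons", round(cons_base3, 6))
```

The results match the ib3.rep78 output up to rounding in the last printed digit. The standard errors do not transform this simply, because each one now refers to a different contrast, but the fit statistics (F, R-squared, Root MSE) are identical in all three runs, as the output shows.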

Last edited by Marcos Almeida; 06 Dec 2018, 21:05.

Best regards,

Marcos


Clyde Schechter

Join Date: Apr 2014

Posts: 30083
#5

06 Dec 2018, 22:13

Marcos is 100% correct. Just amplifying what he said, in case you are not familiar with reference categories: in regression, when a categorical variable is used, it is represented by separate 0/1 variables for each level (value the variable takes on in the data) except one. So, for example, sex (male/female) would be represented by either a variable female = 1 & male = 0 or a variable male = 1 & female = 0, but not both, because together with the constant term the two would be perfectly collinear (they always add up to 1). A variable with three values, say orange, apple, or banana, could be represented by any two of orange = 1 & apple/banana = 0, apple = 1 & orange/banana = 0, or banana = 1 & orange/apple = 0. As Marcos points out, you can control which level gets omitted (it is called the reference category). If you don't tell Stata which one to use, Stata will pick one, usually the lowest-numbered one.
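A small Python sketch of the fruit example shows why one level is left out. With a constant in the model, the full set of indicators adds up to the constant column, so the design matrix loses rank; dropping one level restores full rank. The helper `rank` (exact arithmetic via `fractions.Fraction`) and the sample data are illustrative assumptions, not anything Stata exposes.

```python
from fractions import Fraction

def rank(matrix):
    """Rank of a matrix via exact Gaussian elimination."""
    m = [[Fraction(v) for v in row] for row in matrix]
    r = 0
    for c in range(len(m[0])):
        piv = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if piv is None:
            continue  # no pivot in this column: it depends on earlier ones
        m[r], m[piv] = m[piv], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


fruit = ["orange", "apple", "banana", "apple", "orange"]  # invented data
levels = ["apple", "banana", "orange"]
full = [[1 if f == lv else 0 for lv in levels] for f in fruit]

# Constant plus all three indicators: 4 columns, but the indicators sum to
# the constant column, so the rank is only 3 (perfect collinearity).
with_all = [[1] + row for row in full]
# Constant plus two indicators ("apple" is the reference): 3 columns, rank 3.
with_ref = [[1] + row[1:] for row in full]

print(rank(with_all), len(with_all[0]))  # 3 4
print(rank(with_ref), len(with_ref[0]))  # 3 3
```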

This is a basic aspect of regression with categorical variables. If that is new to you (and we were all beginners once) you should pick up an elementary textbook on regression (or general statistics with a chapter on regression) and read about the use of "dummy" or "indicator" variables to represent categorical variables.