Coef of 0 and omitted std err for one of 9 dummy variables

Solal Martin-Camus

Join Date: May 2020

Posts: 8
#1


10 May 2020, 13:55

Hi all, I'm fairly new to Stata and trying to run a simple regression that includes 9 dummy variables on the right-hand side for type of industry. One of them shows a coef of 0 and "(omitted)" in place of the std err, with nothing for the t statistic or the confidence interval. The other 8 are fine and give coefs that make sense. I do not get any error message or additional output when I run the regression, and all other values seem good to me.

The dummy variables are in the last four lines of my do-file below; as it is short, I've included it in its entirety.

Could you please help me understand why Stata is doing this and how I might go about resolving it?

keep if AGE>16
drop if ATTEND==1 //dropping people still attending course
keep if FTPT==1 //keeping only full-time workers
drop if HIQUL15D==7 //don't know qualification
drop if HIQUL15D==-8 //no answer
drop if HIQUL15D==-9 //does not apply

gen HQUAL=0 //reverse scale of HIQUL15D
replace HQUAL=6 if HIQUL15D==1
replace HQUAL=5 if HIQUL15D==2
replace HQUAL=4 if HIQUL15D==3
replace HQUAL=3 if HIQUL15D==4
replace HQUAL=2 if HIQUL15D==5
replace HQUAL=1 if HIQUL15D==6

gen LOGHOURPAY=log(HOURPAY)
drop if LOGHOURPAY==.

drop if ETH11EW==-8 //no answer
drop if ETH11EW==-9 //does not apply

gen ETHMIN=0 //white or else variable
replace ETHMIN=1 if ETH11EW!=1 //Else
replace ETHMIN=2 if ETH11EW==1 //White

drop if INDE07M==-8 //no answer

//Dummy variables for Industry
tabulate INDE07M, g(INDTYPE)

reg LOGHOURPAY HQUAL AGE SEX ETHMIN INDTYPE1 INDTYPE2 INDTYPE3 INDTYPE4 INDTYPE5 INDTYPE6 INDTYPE7 INDTYPE8 INDTYPE9

Scott Merryman

Join Date: Mar 2014
Posts: 896
#2

10 May 2020, 15:24

Look up the "dummy variable trap". Was there a note at the top indicating that the variable was omitted because of collinearity? Also, Stata can create such indicator variables automatically; see -help fvvarlist-.

Compare:

Code:

. sysuse auto,clear
(1978 Automobile Data)

. qui tab rep, gen(D)

. reg price D*
note: D1 omitted because of collinearity

      Source |       SS           df       MS      Number of obs   =        69
-------------+----------------------------------   F(4, 64)        =      0.24
       Model |  8360542.63         4  2090135.66   Prob > F        =    0.9174
    Residual |   568436416        64     8881819   R-squared       =    0.0145
-------------+----------------------------------   Adj R-squared   =   -0.0471
       Total |   576796959        68  8482308.22   Root MSE        =    2980.2

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          D1 |          0  (omitted)
          D2 |   1403.125   2356.085     0.60   0.554    -3303.696    6109.946
          D3 |   1864.733   2176.458     0.86   0.395    -2483.242    6212.708
          D4 |       1507   2221.338     0.68   0.500    -2930.633    5944.633
          D5 |     1348.5   2290.927     0.59   0.558    -3228.153    5925.153
       _cons |     4564.5   2107.347     2.17   0.034     354.5913    8774.409
------------------------------------------------------------------------------

. reg price D*, hascon

      Source |       SS           df       MS      Number of obs   =        69
-------------+----------------------------------   F(4, 64)        =      0.24
       Model |  8360542.63         4  2090135.66   Prob > F        =    0.9174
    Residual |   568436416        64     8881819   R-squared       =    0.0145
-------------+----------------------------------   Adj R-squared   =   -0.0471
       Total |   576796959        68  8482308.22   Root MSE        =    2980.2

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          D1 |     4564.5   2107.347     2.17   0.034     354.5913    8774.409
          D2 |   5967.625   1053.673     5.66   0.000     3862.671    8072.579
          D3 |   6429.233   544.1145    11.82   0.000      5342.24    7516.227
          D4 |     6071.5   702.4489     8.64   0.000     4668.197    7474.803
          D5 |       5913   898.5756     6.58   0.000     4117.889    7708.111
------------------------------------------------------------------------------
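
The collinearity note in the first regression above can be checked directly. The following is a minimal sketch, assuming the same auto dataset; the variable name Dsum is made up for illustration. It shows that the five indicators add up to 1 for every car with a recorded rep78, which is exactly the constant term, and that leaving one indicator out by hand gives the same fit without the note.

```stata
* Sketch: the five indicators sum to 1 wherever rep78 is observed,
* so together they duplicate the constant term
sysuse auto, clear
quietly tabulate rep78, generate(D)

* rowtotal() treats missing as 0, hence the restriction to nonmissing rep78
* (Dsum is a hypothetical helper variable, not part of the dataset)
egen byte Dsum = rowtotal(D1 D2 D3 D4 D5) if !missing(rep78)
tabulate Dsum

* Leaving out one indicator by hand avoids the note; D1 becomes the reference
regress price D2 D3 D4 D5
```

Which indicator to leave out is a matter of interpretation only: the fitted values are the same whichever category serves as the reference.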


Solal Martin-Camus

Join Date: May 2020

Posts: 8
#3

10 May 2020, 15:36

Thank you for answering! I just saw that there was indeed a note saying the variable was omitted because of collinearity; not very thorough of me... I just read up on the dummy variable trap and it all makes sense now! In your example, does hascon drop the intercept so that you can keep all the dummies? Stata doesn't recognise it as a command for me. Is it an add-on?
Scott Merryman

Join Date: Mar 2014

Posts: 896
#4

10 May 2020, 15:47

-hascons- (typed as -hascon- above) is an option of -regress-, not a separate command. See -help regress-.

Yes, it drops the constant, though it is not really necessary. In the first example above the coefficients are relative to the omitted category, so the D5 estimate in the second example is _b[D5] + _b[_cons] = 1348.5 + 4564.5 = 5913.
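
A minimal sketch of that arithmetic, assuming the same auto dataset: -lincom- computes the sum of the two coefficients from the regression that includes a constant, together with a standard error, so the result can be compared with the D5 row of the -hascons- output above.

```stata
* Sketch: recover the D5 estimate from the regression that keeps the constant
sysuse auto, clear
quietly tabulate rep78, generate(D)

* D1 is left out by hand, so it is the reference category
regress price D2 D3 D4 D5

* Point estimate and standard error of _b[D5] + _b[_cons]
lincom D5 + _cons
display "D5 estimate = " r(estimate)
```

The same -lincom- call with D2, D3 or D4 gives the other rows, which is why dropping the constant is a convenience rather than a requirement.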
Solal Martin-Camus

Join Date: May 2020

Posts: 8
#5

10 May 2020, 17:37

I think I'm still struggling: using -hascon- returns "(note: hascons false)" and runs the regression with an omitted category. Do I need to specify something else in the reg command line? I see that you used D*, while I am now using -i.catvar-, so I have -reg LOGHOURPAY HQUAL AGE SEX ETHMIN i.INDE07M, hascon-. Does that make sense? Since my categorical variable is type of industry, I would prefer not to have one category as the reference value, for interpretation purposes. Thank you again.

Scott Merryman

Join Date: Mar 2014
Posts: 896
#6

10 May 2020, 18:41

If you are using factor-variable notation (i.varname), then Stata will automatically drop the first category as the base level. If you want direct estimates for all categories, then you need to use the no-base-level operator: ibn.varname.

Compare:

Code:

. sysuse auto,clear
(1978 Automobile Data)

. reg price i.rep, hascons
(note: hascons false)

      Source |       SS           df       MS      Number of obs   =        69
-------------+----------------------------------   F(4, 64)        =      0.24
       Model |  8360542.63         4  2090135.66   Prob > F        =    0.9174
    Residual |   568436416        64     8881819   R-squared       =    0.0145
-------------+----------------------------------   Adj R-squared   =   -0.0471
       Total |   576796959        68  8482308.22   Root MSE        =    2980.2

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       rep78 |
          2  |   1403.125   2356.085     0.60   0.554    -3303.696    6109.946
          3  |   1864.733   2176.458     0.86   0.395    -2483.242    6212.708
          4  |       1507   2221.338     0.68   0.500    -2930.633    5944.633
          5  |     1348.5   2290.927     0.59   0.558    -3228.153    5925.153
             |
       _cons |     4564.5   2107.347     2.17   0.034     354.5913    8774.409
------------------------------------------------------------------------------

. reg price ibn.rep , hascons

      Source |       SS           df       MS      Number of obs   =        69
-------------+----------------------------------   F(4, 64)        =      0.24
       Model |  8360542.63         4  2090135.66   Prob > F        =    0.9174
    Residual |   568436416        64     8881819   R-squared       =    0.0145
-------------+----------------------------------   Adj R-squared   =   -0.0471
       Total |   576796959        68  8482308.22   Root MSE        =    2980.2

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       rep78 |
          1  |     4564.5   2107.347     2.17   0.034     354.5913    8774.409
          2  |   5967.625   1053.673     5.66   0.000     3862.671    8072.579
          3  |   6429.233   544.1145    11.82   0.000      5342.24    7516.227
          4  |     6071.5   702.4489     8.64   0.000     4668.197    7474.803
          5  |       5913   898.5756     6.58   0.000     4117.889    7708.111
------------------------------------------------------------------------------
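
If a reference category is acceptable but the first one is not a natural choice, the base level can also be changed rather than removed. A minimal sketch, assuming the same auto dataset and picking rep78 == 3 as the base purely for illustration (the scalar name is made up):

```stata
* Sketch: choose a different base level instead of removing it
sysuse auto, clear

* Default: the lowest level (rep78 == 1) is the base category
quietly regress price i.rep78
scalar diff_5_vs_3 = _b[5.rep78] - _b[3.rep78]

* ib3. makes rep78 == 3 the base category instead
regress price ib3.rep78

* Same fit, different reference: the coefficient on level 5 is now the 5-vs-3 gap
display _b[5.rep78] "  " diff_5_vs_3
```

The two numbers displayed should agree, since only the reference point changes and not the fitted model.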


Solal Martin-Camus

Join Date: May 2020

Posts: 8
#7

11 May 2020, 03:17

Thank you so much for your help!
huk hyung jung

Join Date: May 2020

Posts: 12
#8

28 May 2020, 11:06

Hello Scott Merryman,

I am running a pooled data regression using -xtreg-, and I noticed that the option "hascons" doesn't work for -xtreg-.

Do you know another way to run the regression with no constant through -xtreg-?

Thanks!!
Scott Merryman

Join Date: Mar 2014

Posts: 896
#9

29 May 2020, 03:31

No, but if you are running a pooled regression then just use -regress-.
