  • coef 0 & omitted std err for one of 9 dummy variables

    Hi all, I'm fairly new to Stata and am trying to run a simple regression that includes 9 dummy variables for type of industry on the right-hand side. One of them shows a coefficient of 0 and its standard error as "(omitted)", with nothing for the t statistic or the confidence interval. The other 8 are fine and give coefficients that make sense. I do not get any error message or additional output when I run the regression, and all other values seem fine to me.

    The dummy variables are created in the last four lines of my do-file below; since it is short, I've included it in its entirety.

    Could you please help me understand why Stata is doing this and how I can resolve it?

    keep if AGE>16
    drop if ATTEND==1 //dropping people still attending course
    keep if FTPT==1 //keeping only full-time workers
    drop if HIQUL15D==7 //don't know qualification
    drop if HIQUL15D==-8 //no answer
    drop if HIQUL15D==-9 //does not apply

    gen HQUAL=0 //reverse scale of HIQUL15D
    replace HQUAL=6 if HIQUL15D==1
    replace HQUAL=5 if HIQUL15D==2
    replace HQUAL=4 if HIQUL15D==3
    replace HQUAL=3 if HIQUL15D==4
    replace HQUAL=2 if HIQUL15D==5
    replace HQUAL=1 if HIQUL15D==6

    gen LOGHOURPAY=log(HOURPAY)
    drop if LOGHOURPAY==.

    drop if ETH11EW==-8 //no answer
    drop if ETH11EW==-9 //does not apply

    gen ETHMIN=0 //white or else variable
    replace ETHMIN=1 if ETH11EW!=1 //Else
    replace ETHMIN=2 if ETH11EW==1 //White

    drop if INDE07M==-8 //no answer

    //Dummy variables for Industry
    tabulate INDE07M, g(INDTYPE)

    reg LOGHOURPAY HQUAL AGE SEX ETHMIN INDTYPE1 INDTYPE2 INDTYPE3 INDTYPE4 INDTYPE5 INDTYPE6 INDTYPE7 INDTYPE8 INDTYPE9

  • #2
    Look up the "dummy variable trap". Was there a note at the top of the output saying that the variable was omitted because of collinearity? Also, Stata can create such indicator variables automatically; see -help fvvarlist-.

    Compare:
    Code:
    . sysuse auto,clear
    (1978 Automobile Data)
    
    . qui tab rep, gen(D)
    
    . reg price D*
    note: D1 omitted because of collinearity
    
          Source |       SS           df       MS      Number of obs   =        69
    -------------+----------------------------------   F(4, 64)        =      0.24
           Model |  8360542.63         4  2090135.66   Prob > F        =    0.9174
        Residual |   568436416        64     8881819   R-squared       =    0.0145
    -------------+----------------------------------   Adj R-squared   =   -0.0471
           Total |   576796959        68  8482308.22   Root MSE        =    2980.2
    
    ------------------------------------------------------------------------------
           price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
              D1 |          0  (omitted)
              D2 |   1403.125   2356.085     0.60   0.554    -3303.696    6109.946
              D3 |   1864.733   2176.458     0.86   0.395    -2483.242    6212.708
              D4 |       1507   2221.338     0.68   0.500    -2930.633    5944.633
              D5 |     1348.5   2290.927     0.59   0.558    -3228.153    5925.153
           _cons |     4564.5   2107.347     2.17   0.034     354.5913    8774.409
    ------------------------------------------------------------------------------
    
    . reg price D*, hascon
    
          Source |       SS           df       MS      Number of obs   =        69
    -------------+----------------------------------   F(4, 64)        =      0.24
           Model |  8360542.63         4  2090135.66   Prob > F        =    0.9174
        Residual |   568436416        64     8881819   R-squared       =    0.0145
    -------------+----------------------------------   Adj R-squared   =   -0.0471
           Total |   576796959        68  8482308.22   Root MSE        =    2980.2
    
    ------------------------------------------------------------------------------
           price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
              D1 |     4564.5   2107.347     2.17   0.034     354.5913    8774.409
              D2 |   5967.625   1053.673     5.66   0.000     3862.671    8072.579
              D3 |   6429.233   544.1145    11.82   0.000      5342.24    7516.227
              D4 |     6071.5   702.4489     8.64   0.000     4668.197    7474.803
              D5 |       5913   898.5756     6.58   0.000     4117.889    7708.111
    ------------------------------------------------------------------------------
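A minimal sketch of why one dummy has to go, using the same auto data (the variable Dsum is hypothetical, created only for this check): the five indicators from -tabulate, generate()- add up to 1 for every car with a nonmissing rep78, which is exactly the constant column. The two usual fixes are to leave one dummy out or to let factor-variable notation choose a base level.

```stata
sysuse auto, clear
quietly tabulate rep78, generate(D)

* Every car with a nonmissing rep78 falls in exactly one category,
* so D1-D5 add up to 1: the same column as the constant.
egen Dsum = rowtotal(D1-D5) if !missing(rep78)
assert Dsum == 1 if !missing(rep78)

* Fix 1: leave one dummy out; D1 (rep78 == 1) becomes the reference group.
regress price D2-D5

* Fix 2: factor-variable notation builds the indicators itself;
* ib5. makes rep78 == 5 the base level instead of the lowest one.
regress price ib5.rep78
```

For the do-file in the question, the equivalents would be INDTYPE2-INDTYPE9 in place of the full list INDTYPE1 ... INDTYPE9, or simply i.INDE07M.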

    • #3
      Thank you for answering! I just saw that there was indeed a note saying the variable was omitted because of collinearity; not very thorough of me. I've now read up on the dummy variable trap and it all makes sense! In your example, does -hascons- drop the intercept so that you can keep all the dummies? Stata doesn't recognise it as a command for me; is it an add-on?

      • #4
        -hascons- is an option of -regress-, not a separate command. See -help regress-.

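        Yes, it drops the constant, though that is not really necessary. In the first example above, the coefficients are relative to the omitted category (D1, whose mean is _cons), so the mean price for the D5 group is _b[D5] + _b[_cons] = 1348.5 + 4564.5 = 5913, the same as the D5 coefficient in the second example.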
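The same sum can be obtained, with a standard error, by running -lincom- after the model that keeps the constant. A minimal sketch, assuming D1 is left out explicitly so that it is the reference group:

```stata
sysuse auto, clear
quietly tabulate rep78, generate(D)

* D1 (rep78 == 1) is left out, so _cons is the mean price of that group.
regress price D2-D5

* Mean price for rep78 == 5: its coefficient plus the constant,
* reported with a standard error and confidence interval.
lincom _cons + D5
```

Because both regressions describe the same fit, the estimate and its standard error should agree with the D5 row of the -hascons- output above.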

        • #5
          I think I'm still struggling. Using -hascons- returns "(note: hascons false)" and runs the regression with an omitted category. Do I need to specify something else on the -reg- command line? I see that you used D*, while I am now using -i.catvar-, so my command is -reg LOGHOURPAY HQUAL AGE SEX ETHMIN i.INDE07M, hascons-. Does that make sense? Since my categorical variable is type of industry, I would prefer not to have one of them as the reference category, for interpretation purposes. Thank you again.

          • #6
            If you are using factor-variable notation (i.varname), Stata automatically omits the base category (by default the lowest level), so the indicators no longer span the constant and -hascons- is false. If you want direct estimates for all categories, use the no-base-level operator: ibn.varname.

            Compare:
            Code:
            . sysuse auto,clear
            (1978 Automobile Data)
            
            . reg price i.rep, hascons
            (note: hascons false)
            
                  Source |       SS           df       MS      Number of obs   =        69
            -------------+----------------------------------   F(4, 64)        =      0.24
                   Model |  8360542.63         4  2090135.66   Prob > F        =    0.9174
                Residual |   568436416        64     8881819   R-squared       =    0.0145
            -------------+----------------------------------   Adj R-squared   =   -0.0471
                   Total |   576796959        68  8482308.22   Root MSE        =    2980.2
            
            ------------------------------------------------------------------------------
                   price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
                   rep78 |
                      2  |   1403.125   2356.085     0.60   0.554    -3303.696    6109.946
                      3  |   1864.733   2176.458     0.86   0.395    -2483.242    6212.708
                      4  |       1507   2221.338     0.68   0.500    -2930.633    5944.633
                      5  |     1348.5   2290.927     0.59   0.558    -3228.153    5925.153
                         |
                   _cons |     4564.5   2107.347     2.17   0.034     354.5913    8774.409
            ------------------------------------------------------------------------------
            
            . reg price ibn.rep , hascons
            
                  Source |       SS           df       MS      Number of obs   =        69
            -------------+----------------------------------   F(4, 64)        =      0.24
                   Model |  8360542.63         4  2090135.66   Prob > F        =    0.9174
                Residual |   568436416        64     8881819   R-squared       =    0.0145
            -------------+----------------------------------   Adj R-squared   =   -0.0471
                   Total |   576796959        68  8482308.22   Root MSE        =    2980.2
            
            ------------------------------------------------------------------------------
                   price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
                   rep78 |
                      1  |     4564.5   2107.347     2.17   0.034     354.5913    8774.409
                      2  |   5967.625   1053.673     5.66   0.000     3862.671    8072.579
                      3  |   6429.233   544.1145    11.82   0.000      5342.24    7516.227
                      4  |     6071.5   702.4489     8.64   0.000     4668.197    7474.803
                      5  |       5913   898.5756     6.58   0.000     4117.889    7708.111
            ------------------------------------------------------------------------------
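For comparison, -noconstant- also removes _cons, but it changes how R-squared is computed. A minimal sketch contrasting the two options on the same data (the scalar names r2_hascons and r2_nocons are arbitrary):

```stata
sysuse auto, clear

* ibn. keeps all five levels; hascons tells regress that they already
* span the constant, so the usual (centered) R-squared is reported.
regress price ibn.rep78, hascons
scalar r2_hascons = e(r2)

* noconstant yields the same coefficients, but R-squared is then
* computed around zero (uncentered) and is not comparable.
regress price ibn.rep78, noconstant
scalar r2_nocons = e(r2)
```

Applied to the model in #5, this would be -reg LOGHOURPAY HQUAL AGE SEX ETHMIN ibn.INDE07M, hascons-. With other regressors in the model, each industry coefficient is that industry's intercept (the predicted LOGHOURPAY when HQUAL, AGE, SEX and ETHMIN are all zero) rather than an industry mean.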

            • #7
              Thank you so much for your help!

              • #8
                Hello Scott Merryman,

                I am running a pooled data regression using -xtreg- and I noticed that the option -hascons- doesn't work with -xtreg-.

                Do you know another way to run the regression with no constant through -xtreg-?

                Thanks!!

                • #9
                  No, but if you are running a pooled regression, just use -regress-, which does accept -hascons-.
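A minimal sketch of the pooled alternative. It assumes Stata's example panel dataset nlswork (fetched by -webuse-, so it needs an internet connection) and assumes cluster-robust standard errors by panel identifier are wanted; neither choice comes from this thread.

```stata
* Example panel data from Stata's web datasets (requires internet access).
webuse nlswork, clear

* Pooled OLS: regress ignores the panel structure and accepts hascons.
* ibn.ind_code keeps every industry category, so no _cons is estimated.
* vce(cluster idcode) is an assumed choice that lets errors be
* correlated across years for the same woman.
regress ln_wage age ibn.ind_code, hascons vce(cluster idcode)
```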
