  • Problem with factor variables (1.var instead of i.var in regression changes results drastically)

    Hello,

    When running regressions with factor variables, I run into certain problems.
    I made up an example using the auto dataset.
    Basically, my problem is that I do not want to report "empty results" in a regression, which of course occur if one interacts two 0/1 dummies. So reg y i.var1##i.var2 is better done as reg y 1.var1##1.var2. However, new collinearity may then arise.

    Code:
    sysuse auto, clear
    
    gen hro=1 if (headroom ==1 | headroom ==3 | headroom ==5 )
    recode hro (.=0)
    
    
    gen light =1 if weight <=3000
    recode light (.=0)
    
    reg length (i.light##i.hro)##i.foreign, robust
    reg length (1.light##1.hro)##1.foreign, robust
    
    //So far: everything perfectly fine, i.e. there are exactly the same results
    
    
    reg length (i.light c.mpg i.hro)##i.foreign, robust
    reg length (i.light c.mpg 1.hro)##1.foreign, robust
    
    // So why is there collinearity *now*?
    
    reg length (i.light c.price i.hro)##i.foreign, robust
    reg length (i.light c.price 1.hro)##1.foreign, robust
    
    
    reg length (c.price  i.light##i.hro)##i.foreign, robust
    reg length (c.price  1.light##1.hro)##1.foreign, robust

    First, I get new collinearity that was not there before.

    Second, and this I cannot reproduce as easily with the auto dataset: in my own dataset I get really strange results. Using 1.var instead of i.var, I get totally different results and an R^2 of 0.000. This is really puzzling to me.
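
    A quick way to test whether the two notations fit the same model is to compare the fit statistics that regress stores in e() after each run. This sketch is not from the original post: it assumes hro and light are built with inlist() and a logical comparison, which should give the same 0/1 coding as the gen/recode pairs above, and the scalar names r2_i, rank_i, r2_1 and rank_1 are arbitrary.

    ```stata
    * Rebuild the example data; inlist() and the comparison give the same
    * 0/1 coding as the gen/recode pairs in the post (assumed shortcut)
    sysuse auto, clear
    gen byte hro   = inlist(headroom, 1, 3, 5)
    gen byte light = weight <= 3000

    * Fit the model with i. notation and keep its fit statistics
    regress length (i.light c.mpg i.hro)##i.foreign, robust
    scalar r2_i   = e(r2)
    scalar rank_i = e(rank)

    * Fit the same model with 1. notation
    regress length (i.light c.mpg 1.hro)##1.foreign, robust
    scalar r2_1   = e(r2)
    scalar rank_1 = e(rank)

    * If both notations describe the same model, each pair should agree
    display "R-squared:    " scalar(r2_i) " vs " scalar(r2_1)
    display "rank of e(V): " scalar(rank_i) " vs " scalar(rank_1)
    ```

    In the Stata 14.2 run posted in #6, both regressions report R-squared = 0.8383; in the Stata 12.0 run posted in #5, the second one reports 0.8126, so the two display lines would not match there.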

    Thank you very much in advance for your help.


    FYI: I use Stata 12 on macOS Sierra.
    Last edited by Andrea Maier; 24 Sep 2016, 06:24. Reason: included tags

  • #2
    Why not just use the "noemptycells" option in your regress command?
    Code:
    reg length (i.light##i.hro)##i.foreign, robust noempty
    not tested on version 12



    • #3
      Originally posted by Rich Goldstein
      why not just use the "noemptycells" option in your regress command:
      Code:
      reg length (i.light##i.hro)##i.foreign, robust noempty
      not tested on version 12
      Thanks, but to be honest it does not really work, because I still get empty cells. If I use the code you indicated, I get
      Code:
      light#hro#foreign |
                 1 1 1  |          0  (omitted)

      Sorry, I realised I can also use the option "noomit", which produces nice output. Edit 2: But not when using eststo and estout to generate nice tables afterwards; there the omitted 0s appear again.
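
      To tell an empty cell from a term dropped because of collinearity, it can help to count observations per cell and then ask regress to hide both kinds of rows. A sketch, not posted in the thread; the variable construction is the same assumed inlist() shortcut for the gen/recode pairs in the first post. noemptycells and noomitted are display options of estimation commands such as regress, so they change what is shown, not what is estimated.

      ```stata
      * Rebuild the example variables (assumed shortcut for gen/recode)
      sysuse auto, clear
      gen byte hro   = inlist(headroom, 1, 3, 5)
      gen byte light = weight <= 3000

      * Count cars in each light x hro cell, separately by foreign.
      * A cell with zero cars is "empty"; a term whose indicator column is a
      * linear combination of other regressors is "omitted" instead
      bysort foreign: tabulate light hro

      * noemptycells hides rows for empty cells only; noomitted also hides
      * terms dropped because of collinearity
      regress length (i.light##i.hro)##i.foreign, robust noemptycells noomitted
      ```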


      But my question remains: why do the values (and therefore also R^2 and/or the collinearity) change when I move from i.var1 to 1.var1 and the like?

      Thank you very much in advance.
      Last edited by Andrea Maier; 24 Sep 2016, 06:23.



      • #4
        But my question remains: Why do the values (and therefore also R^2 and/or collinearity) change, when I move from i.var1 to 1.var1 and alike
        I ran your code and I'm not certain I understand where the problem lies.

        You present four pairs of regressions. In each pair the same dependent variable is fit to the same combination of independent variables, but in the second regression some or all of the i.var notation is replaced by 1.var notation.

        The two regressions in each pair produce identical results - i.var and 1.var notation give identical results for the same model - but the results differ from pair to pair.

        The first pair reports "note: 1.light#1.hro#1.foreign omitted because of collinearity".

        The second and third pairs do not report collinearity.

        The fourth pair again reports "note: 1.light#1.hro#1.foreign omitted because of collinearity".

        I do not see what supports the assertion that the values change when you change from i.var to 1.var notation.
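
        That i.var and 1.var describe the same model for a 0/1 variable can be seen by listing the terms each notation expands to. With i., the extra terms should be base levels (marked b) and omitted levels (marked o), whose coefficients are fixed at zero, so the columns actually estimated are the same as with the 1. notation. A minimal sketch, not from the thread, using fvexpand, which stores the expanded list in r(varlist); the local names terms_i and terms_1 are arbitrary.

        ```stata
        * Rebuild the example variables (assumed shortcut for gen/recode)
        sysuse auto, clear
        gen byte hro   = inlist(headroom, 1, 3, 5)
        gen byte light = weight <= 3000

        * Terms generated by i. notation, including base/omitted levels
        fvexpand i.light##i.hro
        local terms_i `r(varlist)'
        display "i. notation: `terms_i'"

        * Terms generated by 1. notation: only the level-1 indicators
        fvexpand 1.light##1.hro
        local terms_1 `r(varlist)'
        display "1. notation: `terms_1'"
        ```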



        • #5
          Really?

          That's weird.

          Here is my output for:

          Code:
          sysuse auto, clear
          gen hro=1 if (headroom ==1 | headroom ==3 | headroom ==5 )
          recode hro (.=0)
          gen light =1 if weight <=3000
          recode light (.=0)
          reg length (i.light c.mpg i.hro)##i.foreign, robust
          reg length (i.light c.mpg 1.hro)##1.foreign, robust


          . reg length (i.light c.mpg i.hro)##i.foreign, robust

          Linear regression                               Number of obs =         74
                                                          F( 7, 66)     =      54.70
                                                          Prob > F      =     0.0000
                                                          R-squared     =     0.8383
                                                          Root MSE      =     9.4164

          -------------------------------------------------------------------------------
                        |               Robust
                 length |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
          --------------+----------------------------------------------------------------
                1.light |  -12.79601   3.792573    -3.37   0.001    -20.36813   -5.223893
                    mpg |  -2.727721   .3910998    -6.97   0.000    -3.508577   -1.946865
                  1.hro |  -9.143837    4.80167    -1.90   0.061    -18.73068    .4430057
              1.foreign |  -47.46804   9.036127    -5.25   0.000    -65.50925   -29.42683
                        |
          light#foreign |
                    1 1 |  -2.868619   5.522124    -0.52   0.605    -13.89389    8.156657
                        |
          foreign#c.mpg |
                      1 |   1.737606   .5230046     3.32   0.001     .6933934    2.781818
                        |
            hro#foreign |
                    1 1 |   7.189848   8.002467     0.90   0.372    -8.787593    23.16729
                        |
                  _cons |   255.3148   7.104521    35.94   0.000     241.1302    269.4995
          -------------------------------------------------------------------------------

          . reg length (i.light c.mpg 1.hro)##1.foreign, robust
          note: 1.foreign#c.mpg omitted because of collinearity

          Linear regression                               Number of obs =         74
                                                          F( 6, 67)     =      49.49
                                                          Prob > F      =     0.0000
                                                          R-squared     =     0.8126
                                                          Root MSE      =     10.061

          -------------------------------------------------------------------------------
                        |               Robust
                 length |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
          --------------+----------------------------------------------------------------
                1.light |  -20.11783   3.931713    -5.12   0.000    -27.96557    -12.2701
                    mpg |  -1.753218   .3586451    -4.89   0.000    -2.469077    -1.03736
                  1.hro |  -8.306423   4.587507    -1.81   0.075    -17.46313    .8502793
              1.foreign |  -18.30179    2.92167    -6.26   0.000    -24.13347   -12.47011
                        |
          light#foreign |
                    1 1 |   11.64818   4.451338     2.62   0.011     2.763272    20.53309
                        |
          foreign#c.mpg |
                      1 |          0  (omitted)
                        |
            hro#foreign |
                    1 1 |   8.314699   7.920606     1.05   0.298    -7.494897     24.1243
                        |
                  _cons |   237.9767   6.494496    36.64   0.000     225.0136    250.9397
          -------------------------------------------------------------------------------




          And it differs!
          Last edited by Andrea Maier; 26 Sep 2016, 08:59. Reason: Formatting was really bad. Any nice way to include Stata output in here?



          • #6
            Code:
            . about
            
            Stata/SE 14.2 for Mac (64-bit Intel)
            Revision 06 Sep 2016
            Copyright 1985-2015 StataCorp LP
            
            [details deleted]
            
            . sysuse auto, clear
            (1978 Automobile Data)
            
            . gen hro=1 if (headroom ==1 | headroom ==3 | headroom ==5 )
            (60 missing values generated)
            
            . recode hro (.=0)
            (hro: 60 changes made)
            
            . gen light =1 if weight <=3000
            (39 missing values generated)
            
            . recode light (.=0)
            (light: 39 changes made)
            
            . reg length (i.light c.mpg i.hro)##i.foreign, robust
            
            Linear regression                               Number of obs     =         74
                                                            F(7, 66)          =      54.70
                                                            Prob > F          =     0.0000
                                                            R-squared         =     0.8383
                                                            Root MSE          =     9.4164
            
            -------------------------------------------------------------------------------
                          |               Robust
                   length |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
            --------------+----------------------------------------------------------------
                  1.light |  -12.79601   3.792573    -3.37   0.001    -20.36813   -5.223893
                      mpg |  -2.727721   .3910998    -6.97   0.000    -3.508577   -1.946865
                    1.hro |  -9.143837    4.80167    -1.90   0.061    -18.73068    .4430057
                          |
                  foreign |
                 Foreign  |  -47.46804   9.036127    -5.25   0.000    -65.50925   -29.42683
                          |
            light#foreign |
               1#Foreign  |  -2.868619   5.522124    -0.52   0.605    -13.89389    8.156657
                          |
            foreign#c.mpg |
                 Foreign  |   1.737606   .5230046     3.32   0.001     .6933934    2.781818
                          |
              hro#foreign |
               1#Foreign  |   7.189848   8.002467     0.90   0.372    -8.787593    23.16729
                          |
                    _cons |   255.3148   7.104521    35.94   0.000     241.1302    269.4995
            -------------------------------------------------------------------------------
            
            . reg length (i.light c.mpg 1.hro)##1.foreign, robust
            
            Linear regression                               Number of obs     =         74
                                                            F(7, 66)          =      54.70
                                                            Prob > F          =     0.0000
                                                            R-squared         =     0.8383
                                                            Root MSE          =     9.4164
            
            -------------------------------------------------------------------------------
                          |               Robust
                   length |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
            --------------+----------------------------------------------------------------
                  1.light |  -12.79601   3.792573    -3.37   0.001    -20.36813   -5.223893
                      mpg |  -2.727721   .3910998    -6.97   0.000    -3.508577   -1.946865
                    1.hro |  -9.143837    4.80167    -1.90   0.061    -18.73068    .4430057
                          |
                  foreign |
                 Foreign  |  -47.46804   9.036127    -5.25   0.000    -65.50925   -29.42683
                          |
            light#foreign |
               1#Foreign  |  -2.868619   5.522124    -0.52   0.605    -13.89389    8.156657
                          |
            foreign#c.mpg |
                 Foreign  |   1.737606   .5230046     3.32   0.001     .6933934    2.781818
                          |
              hro#foreign |
               1#Foreign  |   7.189848   8.002467     0.90   0.372    -8.787593    23.16729
                          |
                    _cons |   255.3148   7.104521    35.94   0.000     241.1302    269.4995
            -------------------------------------------------------------------------------
            
            .



            • #7
              Okay okay, I believe you.
              Thank you so much.


              Code:
              . about

              Stata/SE 12.0 for Mac (64-bit Intel)
              Revision 24 Aug 2011
              Copyright 1985-2011 StataCorp LP

              I mean, how can this be? Why do I receive obviously wrong results if I use 1.var instead of i.var with a 0/1 dummy variable?



              • #8
                I haven't the patience to go through help whatsnew looking at all the changes to Stata between releases 12.0 and 14.2, but my guess is that factor-variable notation has been improved (corrected?) since the version you are running. Try update all to, hopefully, bring your copy of Stata up to the latest version of the Release 12 series, and see if that helps.
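
                The checks suggested here can be run from the Command window. A sketch, not posted in the thread, assuming an internet connection for update query and update all; c(stata_version) and c(born_date) report the running release and the date of the executable.

                ```stata
                * Report the installed release, edition and revision date
                about
                display "Release: " c(stata_version) "   executable dated: " c(born_date)

                * Compare the installed copy with the latest one available online
                update query

                * Install all available updates; Stata may ask to be restarted
                update all

                * Read the list of changes made since the installed revision
                help whatsnew
                ```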



                • #9
                  Ah, thanks, but it still doesn't work. Well, ...



                  • #10
                    Andrea:
                    the nicest way to include Stata output in your post is to copy what you got in the Stata Results window and paste it into your message within CODE delimiters (as you already did for your Stata code in #5 and #7).
                    Kind regards,
                    Carlo
                    (Stata 18.0 SE)



                    • #11
                      Thank you very much.


                      I still wonder how it can happen in Stata 12 that using 1.var notation leads to an R^2 of 0.
