  • Dropping variables with collinearity

    Hi everyone, I want to remove all the collinear variables from my data. I found the command "_rmcoll", but it only lists the collinear variables; it does not drop them automatically.

    In the example from the Stata manual:

    Code:
    webuse auto
    generate tt = turn + trunk
    _rmcoll price-tt, forcedrop
    I would like Stata to remove the variable tt.
    Apparently, adding "forcedrop" should drop the collinear variables, but when I try it, this does not happen. Am I missing something, or is there another way to drop that list of variables?

    Any help would be very much appreciated!

  • #2
    Maybe this text from the Stata Manual helps:

    forcedrop specifies that collinear variables be dropped from the variable list instead of being flagged. This option is not allowed when the variable list already contains flagged variables, factor variables, or interactions.
    Maybe typing -display r(varlist)- will show what happened.
    Best regards,

    Marcos
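
    A minimal sketch of that check, using the manual's example from post #1. It assumes that _rmcoll changes only the returned list r(varlist) and leaves the data in memory untouched; how collinear variables are flagged without forcedrop may depend on the Stata version.

    Code:
    webuse auto, clear
    generate tt = turn + trunk
    * without forcedrop: collinear variables stay in the list but are flagged
    _rmcoll price-tt
    display "`r(varlist)'"
    * with forcedrop: collinear variables are left out of the returned list
    _rmcoll price-tt, forcedrop
    display "`r(varlist)'"
    * the dataset itself is unchanged; tt is still in memory
    describe tt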



    • #3
      Philip:
      even -regress- omits one of the variables causing the collinearity.
      Here the omitted variable is -turn-:
      Code:
      . webuse auto
      (1978 Automobile Data)
      
      .
      . generate tt = turn + trunk
      
      . regress price turn trunk tt
      note: turn omitted because of collinearity
      
            Source |       SS           df       MS      Number of obs   =        74
      -------------+----------------------------------   F(2, 71)        =      4.91
             Model |  77228695.2         2  38614347.6   Prob > F        =    0.0100
          Residual |   557836701        71  7856854.94   R-squared       =    0.1216
      -------------+----------------------------------   Adj R-squared   =    0.0969
             Total |   635065396        73  8699525.97   Root MSE        =      2803
      
      ------------------------------------------------------------------------------
             price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
              turn |          0  (omitted)
             trunk |   11.75987    169.353     0.07   0.945    -325.9204    349.4401
                tt |   126.6771   93.30743     1.36   0.179    -59.37263    312.7268
             _cons |  -761.7631   3108.734    -0.25   0.807    -6960.404    5436.877
      ------------------------------------------------------------------------------
      Kind regards,
      Carlo
      (Stata 18.0 SE)



      • #4
        Thank you very much for the quick answers. The thing is that I need the data set without collinear variables for later calculations, so the fact that Stata accounts for collinearity in the reg command does not help me much. However, I think I found what I was looking for, and it goes in the direction of Marcos' hint. I tried the following commands and they worked:

        Code:
        webuse auto
        generate tt = turn + trunk
        _rmcoll price-tt, forcedrop
        keep `r(varlist)'
        Best regards,
        Philip
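
        Note that -keep `r(varlist)'- also drops every variable that was not passed to _rmcoll, for example the string variable make in the auto data. A sketch of an alternative that drops only the variables _rmcoll left out; the macro names all, kept, and dropped are illustrative and not from the thread.

        Code:
        webuse auto, clear
        generate tt = turn + trunk
        * expand the range into an explicit list of variable names
        unab all : price-tt
        _rmcoll `all', forcedrop
        local kept `r(varlist)'
        * variables in the original list that _rmcoll did not return
        local dropped : list all - kept
        display "`dropped'"
        * drop only those; make and anything else outside the list stay
        if "`dropped'" != "" {
            drop `dropped'
        }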



        • #5
          Philip:
          the need for removing collinear variables in whatever feasible way may conceal a model misspecification issue.
          Kind regards,
          Carlo
          (Stata 18.0 SE)



          • #6
            Hi Carlo, thank you for the hint. I will take that into account.

            Best regards,
            Philip
