  • Cholesky decomposition using regression Results

    Dear all,
    I'm currently trying to obtain a simple Cholesky decomposition from regression output. However, I keep getting the error "matrix not positive definite".
    I believe the problem comes from using categorical variables directly in Stata.
    Here is a simple example of what I am trying to say:
    Code:
    sysuse auto, clear
    * This version works
    reg price mpg foreign
    matrix cv=cholesky(e(V))
    * This however gives a problem
    reg price mpg i.foreign
    matrix cv2=cholesky(e(V))
    matrix not positive definite
    r(506);
    Is anyone aware of an alternative for this type of transformation?
    Thank you
    Fernando
    PS. I'm currently using Stata 15

  • #2
    You are correct that the issue arises because Stata includes rows and columns of zeros for the base levels in e(V), so the matrix is only positive semidefinite. You can use matselrc (Stata Journal; Nick Cox) to get rid of the unwanted rows and columns. If there are many of them, the Mata function select() may be faster, but exporting the matrix to Mata drops the row and column names, so you will need a workaround to put them back.


    Code:
    reg price mpg i.foreign 
    mat list e(V)
    The unwanted row and column are the second.


    Code:
    symmetric e(V)[4,4]
                                    0b.          1.            
                       mpg     foreign     foreign       _cons
           mpg   3101.5675
    0b.foreign           0           0
     1.foreign  -15339.746           0   490221.18
         _cons  -61494.541           0   180953.69   1342433.8



    Code:
    matselrc e(V) V , r(1, 3/4) c(1, 3/4)
    matrix cv2=cholesky(V)
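
    If you go the Mata route instead, a minimal sketch could look like the following. It assumes that the base levels are exactly the entries of e(V) with a zero variance on the diagonal; the names V, V2, keep, and names are only illustrative.
    Code:
    sysuse auto, clear
    regress price mpg i.foreign
    matrix V = e(V)
    mata:
    V     = st_matrix("V")
    names = st_matrixrowstripe("V")      // equation and variable names of e(V)
    keep  = (diagonal(V) :!= 0)          // 1 for rows with nonzero variance
    V2    = select(select(V, keep), keep')   // drop rows, then columns
    st_matrix("V2", V2)
    st_matrixrowstripe("V2", select(names, keep))
    st_matrixcolstripe("V2", select(names, keep))
    end
    matrix cv2 = cholesky(V2)
    matrix list cv2
    Reading the stripe before the selection and writing it back afterwards is the workaround for the lost names; the same keep vector selects both the matrix and its labels.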



    • #3
      Note also that rather than let Stata take care of eliminating the base category, you can explicitly include just the categories you want, in which case Stata does not include the categories you did not select in e(V).
      Code:
      reg price mpg 1.foreign 
      matrix list e(V)
      Code:
      symmetric e(V)[3,3]
                                      1.            
                        mpg     foreign       _cons
            mpg   3101.5675
      1.foreign  -15339.746   490221.18
          _cons  -61494.541   180953.69   1342433.8
      I will say that this approach feels like something of a hack, especially because it requires listing categories you want - trying to omit the categories you don't want does not have the same effect.
      Code:
      reg price mpg ibno0.foreign 
      matrix list e(V)
      Code:
      symmetric e(V)[4,4]
                                      0o.          1.            
                         mpg     foreign     foreign       _cons
             mpg   3101.5675
      0o.foreign           0           0
       1.foreign  -15339.746           0   490221.18
           _cons  -61494.541           0   180953.69   1342433.8
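
      To tie this back to the original question, here is a small sketch that compares the two specifications. It uses capture so that the do-file keeps running, and then displays the return code of each cholesky() call; the matrix names cv3 and cv4 are just placeholders.
      Code:
      sysuse auto, clear
      * explicit level only: e(V) has no row of zeros
      regress price mpg 1.foreign
      capture matrix cv3 = cholesky(e(V))
      display "1.foreign: return code = " _rc
      * full factor notation: e(V) keeps a zero row and column for the base level
      regress price mpg i.foreign
      capture matrix cv4 = cholesky(e(V))
      display "i.foreign: return code = " _rc
      Going by the error reported in post #1, the second call should return 506, while the first should return 0.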



      • #4
        Thank you Andrew,
        I think I'll use something similar to what you are proposing.
        Thank you William,
        Actually, regarding the option that you suggest: when I want to use my own selection of categories, I usually end up with a different problem. Stata does not let me choose all the categories that I would like to.
        Code:
        sysuse auto, clear
        xtile qmpg = mpg, n(3)
         
        reg price 1.qmpg  2.qmpg
        
              Source |       SS           df       MS      Number of obs   =        74
        -------------+----------------------------------   F(1, 72)        =      1.80
               Model |  15486507.2         1  15486507.2   Prob > F        =    0.1840
            Residual |   619578889        72  8605262.35   R-squared       =    0.0244
        -------------+----------------------------------   Adj R-squared   =    0.0108
               Total |   635065396        73  8699525.97   Root MSE        =    2933.5
        
        ------------------------------------------------------------------------------
               price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
              2.qmpg |  -977.2417   728.4627    -1.34   0.184    -2429.405    474.9221
               _cons |     6482.2   414.8557    15.63   0.000       5655.2      7309.2
        ------------------------------------------------------------------------------
        
        * Creating Dummies by hand
         tab qmpg, gen(qmpg_)
        
        
        . reg price qmpg_1 qmpg_2
        
              Source |       SS           df       MS      Number of obs   =        74
        -------------+----------------------------------   F(2, 71)        =      9.74
               Model |   136763695         2  68381847.5   Prob > F        =    0.0002
            Residual |   498301701        71  7018333.82   R-squared       =    0.2154
        -------------+----------------------------------   Adj R-squared   =    0.1933
               Total |   635065396        73  8699525.97   Root MSE        =    2649.2
        
        ------------------------------------------------------------------------------
               price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
              qmpg_1 |   3124.847   751.7202     4.16   0.000      1625.96    4623.735
              qmpg_2 |   710.1757   773.0301     0.92   0.361    -831.2025    2251.554
               _cons |   4794.783   552.3993     8.68   0.000      3693.33    5896.235
        ------------------------------------------------------------------------------
        While I know that which dummy is left as the omitted category does not matter as long as one is careful with the interpretation, or that I can simply recode the variable that is giving me problems, I find it problematic when comparing a large set of specifications.
        As background, which I neglected to mention before, I was trying to obtain the Cholesky decomposition in order to draw imputations from the above model. Right now I am using the -drawnorm- command to get multivariate normal draws. This command does not seem to have problems with obtaining the Cholesky decomposition.
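        For reference, a minimal sketch of the kind of -drawnorm- call I mean is below. The seed, the number of draws, and the variable names b_mpg, b_foreign, and b_cons are arbitrary choices for illustration, and the model is the simple one from this thread rather than my actual specification.
        Code:
        sysuse auto, clear
        regress price mpg 1.foreign
        matrix b = e(b)
        matrix V = e(V)
        set seed 12345
        * replace the data in memory with 1000 draws from N(b, V)
        drawnorm b_mpg b_foreign b_cons, n(1000) means(b) cov(V) clear
        summarize b_mpg b_foreign b_cons
        The means and standard deviations reported by summarize should be close to the coefficients and standard errors of the regression.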
        Fernando



        • #5
          Your problem in post #4 is that specifying particular categories does not override the (default, in your case) specification of the base category. Either of the following sets of code will do what you intend.
          Code:
          sysuse auto, clear
          xtile qmpg = mpg, n(3)
          reg price 1bn.qmpg 2.qmpg
          matrix list e(V)
          Code:
          sysuse auto, clear
          xtile qmpg = mpg, n(3)
          fvset base none qmpg
          reg price 1.qmpg 2.qmpg
          matrix list e(V)
          Both produce the following output.
          Code:
                Source |       SS           df       MS      Number of obs   =        74
          -------------+----------------------------------   F(2, 71)        =      9.74
                 Model |   136763695         2  68381847.5   Prob > F        =    0.0002
              Residual |   498301701        71  7018333.82   R-squared       =    0.2154
          -------------+----------------------------------   Adj R-squared   =    0.1933
                 Total |   635065396        73  8699525.97   Root MSE        =    2649.2
          
          ------------------------------------------------------------------------------
                 price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
                  qmpg |
                    1  |   3124.847   751.7202     4.16   0.000      1625.96    4623.735
                    2  |   710.1757   773.0301     0.92   0.361    -831.2025    2251.554
                       |
                 _cons |   4794.783   552.3993     8.68   0.000      3693.33    5896.235
          ------------------------------------------------------------------------------
          Code:
          symmetric e(V)[3,3]
                           1.          2.            
                        qmpg        qmpg       _cons
          1.qmpg   565083.24
          2.qmpg   305144.95   597575.52
           _cons  -305144.95  -305144.95   305144.95
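
          Since this e(V) no longer contains a row and column of zeros, cholesky() should go through. A short sketch, with the matrix names V, C, and CCt chosen only for illustration, that also checks the factor reproduces e(V):
          Code:
          sysuse auto, clear
          xtile qmpg = mpg, n(3)
          regress price 1bn.qmpg 2.qmpg
          matrix V = e(V)
          matrix C = cholesky(V)          // lower-triangular factor
          matrix CCt = C*C'
          * relative difference between C*C' and e(V); should be near zero
          display mreldif(CCt, V)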



          • #6
            Excellent, thank you.
