  • Include omitted variables in estimates

    Dear Stata users,

    I'm running several subsample regressions on a factor variable and would like to build a matrix containing the coefficient vector of each regression.
    However, when a subsample doesn't contain enough observations for a category of the explanatory variable, that category doesn't appear in the estimates.
    The vectors I recover are therefore not directly comparable, since the xth row of one doesn't necessarily correspond to the same variable in another.

    Here's an example.
    Code:
    sysuse auto2.dta,clear
    reg price ib5.rep78 if foreign==0
    matrix b0=e(b)'
    reg price ib5.rep78 if foreign==1
    matrix b1=e(b)'
    
    matrix list b0
    matrix list b1
    
    matrix M=b0,b1
    As you can see, the two matrices don't have the same dimensions, because categories 1 and 2 of rep78 are omitted in the second regression.
    The final line therefore causes an error, and the coefficient associated with category 3 is in the third row of b0 but in the first row of b1.

    I'd like to add two rows of missing values at the top of b1, so that the two matrices are comparable and have the same dimensions.

    Thanks a lot,
    Charlie

  • #2
    Code:
    matrix t = J(2,1,.)
    mat b2= t\b1
    mat M= b0, b2
    Code:
    . matrix M=b0,b2
    
    . mat list M
    
    M[6,2]
                      y1          c1
     1.rep78         360           .
     2.rep78    1763.125           .
     3.rep78   2402.5741       -1464
     4.rep78   1677.0556  -31.222222
    5b.rep78           0           0
       _cons      4204.5   6292.6667
    If the missing rows are not consecutive, you will need to append several matrices.
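When the gaps are not consecutive, one way to avoid appending several pieces by hand is to match rows by name. The following is only a sketch under assumptions: it treats b0 as the reference vector that contains every row of interest, and it assumes the row names are spelled identically in both regressions (same base level). The names b2, names0 and names1 are illustrative.

```stata
sysuse auto2.dta, clear
quietly regress price ib5.rep78 if foreign==0
matrix b0 = e(b)'
quietly regress price ib5.rep78 if foreign==1
matrix b1 = e(b)'

* Template with the same rows as b0, filled with missing values
local names0 : rowfullnames b0
local names1 : rowfullnames b1
matrix b2 = J(rowsof(b0), 1, .)
matrix rownames b2 = `names0'

* For each row of b0, look up the same name in b1 and copy the estimate
local i = 0
foreach n of local names0 {
    local ++i
    local j : list posof "`n'" in names1
    if `j' > 0 {
        matrix b2[`i', 1] = b1[`j', 1]
    }
}

matrix M = b0, b2
matrix list M
```

Rows that exist in b0 but not in b1 stay missing in b2, wherever they fall.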
    Last edited by Andrew Musau; 25 Sep 2018, 05:34.


    • #3
      Thanks Andrew,
      Pretty smart and easy fix indeed!

      However, it's not entirely satisfying: my matrix will be 402 x 402 and nothing guarantees that the missing rows are consecutive, so it still implies a lot of manual work that I'd like to automate.

      I'm thinking of recovering each _b[X] after each regression, assigning it to the Xth row of a vector, and then joining the 402 vectors, but I'm still struggling with missing values (when _b[X] doesn't exist).
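A minimal sketch of that idea on the auto2 example, not a tested solution for the 402-level case: it assumes that referring to _b[] for a level absent from the model raises an error, which capture then swallows so the corresponding cell simply stays missing. The matrix name M and the column names domestic and foreign are illustrative.

```stata
sysuse auto2.dta, clear

* All levels of the factor variable in the full sample
levelsof rep78, local(levels)
local k : word count `levels'

* Build the row names 1.rep78 2.rep78 ...
local rn
foreach l of local levels {
    local rn `rn' `l'.rep78
}

* One column per subsample, prefilled with missing values
matrix M = J(`k', 2, .)
matrix rownames M = `rn'
matrix colnames M = domestic foreign

forvalues f = 0/1 {
    quietly regress price ib5.rep78 if foreign == `f'
    local c = `f' + 1
    local i = 0
    foreach l of local levels {
        local ++i
        * If the level is not in the model, the assignment fails
        * and the cell keeps its missing value
        capture matrix M[`i', `c'] = _b[`l'.rep78]
    }
}
matrix list M
```

The base level gets a coefficient of 0, as in e(b). The constant is left out here and could be handled with one extra row filled from _b[_cons].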

      Thanks anyway
      Best,
      Charlie


      • #4
        I would use esttab (from the estout package by Ben Jann, published in the Stata Journal) and retrieve the coefficient matrix.

        Code:
        sysuse auto, clear
        eststo: qui reg price mpg i.rep78 weight
        eststo: qui reg price mpg weight
        qui esttab
        mat list r(coefs)
        Code:
        
        . mat list r(coefs)
        
        r(coefs)[8,6]
                       est1:       est1:       est1:       est2:       est2:       est2:
                          b           t           p           b           t           p
            mpg  -63.097096  -.72149918   .47331463  -49.512221  -.57468079   .56732373
        1.rep78           0           .           .          .z          .z          .z
        2.rep78   753.70237   .39260175   .69596018          .z          .z          .z
        3.rep78    1349.361   .76118707   .44943089          .z          .z          .z
        4.rep78     2030.47   1.1217511   .26629404          .z          .z          .z
        5.rep78   3376.9103   1.7771624   .08044658          .z          .z          .z
         weight   2.0930663    3.286329   .00167346   1.7465592   2.7232382   .00812981
          _cons   -598.9665  -.15121963   .88029342   1946.0687   .54101802   .59018863
        Here, you can keep just the estimates if you don't need the t-statistics and p-values.
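One possible way to do that extraction is sketched below. It assumes estout is installed (ssc install estout) and that r(coefs) has exactly three columns per model (b, t, p), as in the listing above. The matrix names C and B are illustrative.

```stata
sysuse auto, clear
eststo clear
eststo: quietly regress price mpg i.rep78 weight
eststo: quietly regress price mpg weight
quietly esttab
matrix C = r(coefs)

* Keep the b column of each model: columns 1, 4, 7, ...
local nmodels = colsof(C) / 3
matrix B = C[1..., 1]
forvalues m = 2/`nmodels' {
    local c = 3 * (`m' - 1) + 1
    matrix B = B, C[1..., `c']
}
matrix list B
```

Coefficients that a model does not contain appear as the extended missing value .z, so every column of B has the same rows in the same order.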


        • #5
          Thanks for the suggestion, it seems to work (with an extraction afterwards).



