  • Jackknife identical coefficients to simple regression

    Hi,
    Why does -jackknife- give coefficients identical to those from the simple regression?
    That is, both (1) and (2) below give the same coefficients:
    (1) xthtaylor y x1 x2 x3 ...., endog(x1 x2 ) constant(x3 ...)
    (2) jackknife _b _se: xthtaylor y x1 x2 x3 ...., endog(x1 x2 ) constant(x3 ...)

    Both sets of coefficients differ slightly from the average of the individual replications saved by -jackknife-.
    Which ones are the correct jackknife coefficients?
    Many thanks!

  • #2
    If you want the jackknife estimates, type the following after running the command:

    Code:
    mat list e(b_jk)
    -------------------------------------------
    Richard Williams, Notre Dame Dept of Sociology
    Stata Version: 17.0 MP (2 processor)

    WWW: https://www3.nd.edu/~rwilliam



    • #3
      Thanks Richard!
      However, I'm a bit confused because the jackknife estimate I get from
      mat list e(b_jk) differs from the average of the coefficients across all the replications, each of which drops one data point. I thought they should be equal. What am I missing?
      Most appreciated!



      • #4
        I never use jackknife so I don't know. The manual entry -- http://www.stata.com/manuals13/rjackknife.pdf -- does include formulas.

        It also recommends that, instead of using the jackknife prefix, you use vce(jackknife) when you can.
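
        A minimal sketch of the vce(jackknife) form, using regress and the auto dataset purely for illustration. This is not the xthtaylor model from this thread, and whether a given estimation command accepts vce(jackknife) depends on the command and the Stata version, so check its help file first:

        ```stata
        * Illustrative data only; not the panel data discussed in this thread
        sysuse auto, clear
        * Coefficients are the usual full-sample estimates;
        * only the standard errors come from the jackknife
        regress mpg weight foreign, vce(jackknife)
        ```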


        • #5
          Thanks again Richard for your thoughts and help.
          The formulas indicate that the jackknife _b coefficients should be the average of the replication results, but e(b_jk) is yielding different results!
          So now I'm confused: if I want to use jackknife coefficients, what should I use?
          mat list e(b_jk), or the manually computed average of the replication results? I appreciate the forum's input.
          Best,
          Mohamad



          • #6
            Please review the FAQ advice on how to pose a question. If you showed the specific commands that you gave and the specific results that you got, you would be providing concrete detail for others to consider. As you are using xthtaylor, you could give us an example based on

            Code:
            webuse psidextract



            • #7
              Whenever Stata does something that surprises me, probably 99 times out of 100 it is because Stata is smarter than me. But, Stata does occasionally make mistakes. Like Nick says, with code and output (or better yet a replicable example) we might be able to advise you on the matter.

              Keep in mind, though, that the help for jackknife says "Although the jackknife—developed in the late 1940s and early 1950s—is of largely historical interest today, it is still useful in searching for overly influential observations. This feature is often forgotten." Unless you are interested in overly influential observations, I am not sure how much effort you should put into this.


              • #8
                Thank you, Nick, for the advice. I'll make sure to follow forum protocol.
                Richard, a referee asked specifically for Jackknife coefficients and standard errors, hence the effort I'm putting into it.
                So here is an example of the problem I'm facing that can be replicated:

                Code:
                ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                . clear
                . webuse psidextract
                . keep if id <100  
                //to speed up the process//
                (3472 observations deleted)
                
                . xthtaylor lwage wks south smsa ms exp exp2 occ ind union fem blk ed, endog(exp exp2 wks ms union ed) constant(fem blk ed)
                
                Hausman-Taylor estimation                       Number of obs      =       693
                Group variable: id                              Number of groups   =        99
                
                                                                Obs per group: min =         7
                                                                               avg =         7
                                                                               max =         7
                
                Random effects u_i ~ i.i.d.                     Wald chi2(12)      =   1775.07
                                                                Prob > chi2        =    0.0000
                
                ------------------------------------------------------------------------------
                       lwage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                -------------+----------------------------------------------------------------
                TVexogenous  |
                       south |  -.0364653   .0631064    -0.58   0.563    -.1601516     .087221
                        smsa |  -.1257136   .0386382    -3.25   0.001    -.2014432   -.0499841
                         occ |  -.0405796   .0289379    -1.40   0.161    -.0972968    .0161377
                         ind |   .0161078   .0280947     0.57   0.566    -.0389567    .0711724
                TVendogenous |
                         exp |   .1021172   .0055424    18.42   0.000     .0912543      .11298
                        exp2 |  -.0000346   .0001152    -0.30   0.764    -.0002605    .0001913
                         wks |   .0028614   .0012746     2.24   0.025     .0003632    .0053597
                          ms |  -.0351452   .0389833    -0.90   0.367    -.1115511    .0412607
                       union |  -.0488003   .0380902    -1.28   0.200    -.1234557    .0258552
                TIexogenous  |
                         fem |   .1833829   .3086757     0.59   0.552    -.4216104    .7883762
                         blk |  -.4720181   .3777375    -1.25   0.211     -1.21237    .2683337
                TIendogenous |
                          ed |   .2263084   .0470322     4.81   0.000      .134127    .3184899
                             |
                       _cons |   1.615403   .6480112     2.49   0.013     .3453244    2.885482
                -------------+----------------------------------------------------------------
                     sigma_u |  .95198212
                     sigma_e |  .12644086
                         rho |  .98266504   (fraction of variance due to u_i)
                ------------------------------------------------------------------------------
                Note:  TV refers to time varying; TI refers to time invariant.
                
                . jackknife _b[south] _se[south] , cluster(id) saving(jackknife.dta, replace) : xthtaylor lwage wks south smsa ms exp exp2 occ ind union fem blk ed, endog(exp exp2 wks ms union ed) constant
                > (fem blk ed)
                (running xthtaylor on estimation sample)
                
                Jackknife replications (99)
                ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
                ..................................................    50
                .................................................
                
                Jackknife results                               Number of obs      =       693
                                                                Replications       =        99
                
                      command:  xthtaylor lwage wks south smsa ms exp exp2 occ ind union fem blk ed, endog(exp exp2 wks ms union ed) constant(fem blk ed)
                        _jk_1:  _b[south]
                        _jk_2:  _se[south]
                          n():  e(N)
                
                                                     (Replications based on 99 clusters in id)
                ------------------------------------------------------------------------------
                             |              Jackknife
                             |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                -------------+----------------------------------------------------------------
                       _jk_1 |  -.0364653   .0635265    -0.57   0.567    -.1625315    .0896009
                       _jk_2 |   .0631064   .0252456     2.50   0.014     .0130074    .1132055
                ------------------------------------------------------------------------------
                
                . mat list e(b_jk)
                
                e(b_jk)[1,2]
                         _jk_1       _jk_2
                y1  -.05062209   .01740046
                
                . clear
                
                . use jackknife
                (jackknife: xthtaylor)
                
                . su
                
                    Variable |       Obs        Mean    Std. Dev.       Min        Max
                -------------+--------------------------------------------------------
                       _jk_1 |        99   -.0363208    .0064498  -.0748039   .0010853
                       _jk_2 |        99    .0635728    .0025632   .0618298   .0823358
                
                . clear
                ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                In this example, the -xthtaylor- coefficient on the variable "south" is -.0364653. When I run -jackknife- on -xthtaylor-, followed by
                Code:
                mat list e(b_jk)
                to get the jackknife estimate, I get -.05062209. However, the average of the individual replications of the coefficient on "south" is -.0363208.
                - Which of these is the jackknife estimate of the coefficient on "south"?
                - Shouldn't the last two values be the same?
                The same applies to the standard error.

                I appreciate your guidance.
                Best,
                Mohamad



                • #9
                  I think if you use the -mse- option on jackknife you get the kind of results you are expecting:

                  Code:
                  clear
                  webuse psidextract
                  keep if id <100
                  xthtaylor lwage wks south smsa ms exp exp2 occ ind union fem blk ed, endog(exp exp2 wks ms union ed) constant(fem blk ed)
                  jackknife _b[south] _se[south] , cluster(id) saving(jackknife.dta, replace double) mse: /// 
                      xthtaylor lwage wks south smsa ms exp exp2 occ ind union fem blk ed, /// 
                      endog(exp exp2 wks ms union ed) constant(fem blk ed) 
                  mat list e(b_jk)
                  
                  clear
                  use jackknife
                  sum
                  Showing the last part of the results,

                  Code:
                  . mat list e(b_jk)
                  
                  e(b_jk)[1,2]
                           _jk_1       _jk_2
                  y1  -.03632081    .0635728
                  
                  . 
                  . clear
                  
                  . use jackknife
                  (jackknife: xthtaylor)
                  
                  . sum
                  
                      Variable |       Obs        Mean    Std. Dev.       Min        Max
                  -------------+--------------------------------------------------------
                         _jk_1 |        99   -.0363208    .0064498  -.0748039   .0010853
                         _jk_2 |        99    .0635728    .0025632   .0618298   .0823358
                  I can't really explain why but see the methods and formulas section of the jackknife documentation I linked to earlier.

                  Whether you decide to use the mse option or not (I am not sure what is most appropriate) I would use the parameters given by mat list e(b_jk). When using the mse option, both the mat list and the sum approach give the same estimates. When not using the mse option, there is something wrong with your sum approach, but I can't tell you what is the right way to do it. Maybe you can figure it out from the formulas. But given that I can see that both mat list and sum give the same results when mse is specified, I personally am willing to mindlessly trust that mat list is also getting it right when mse is not specified.



                  • #10
                    Ok, the keep option keeps the pseudovalues as part of the data. So you can replicate what Stata is doing via something like this:

                    Code:
                    clear
                    webuse psidextract
                    keep if id <100
                    xthtaylor lwage wks south smsa ms exp exp2 occ ind union fem blk ed, endog(exp exp2 wks ms union ed) constant(fem blk ed)
                    jackknife _b[south] _se[south]  , cluster(id) saving(jackknife.dta, replace double) keep: /// 
                        xthtaylor lwage wks south smsa ms exp exp2 occ ind union fem blk ed, /// 
                        endog(exp exp2 wks ms union ed) constant(fem blk ed) 
                    mat list e(b_jk)
                    sum _*
                    Code:
                    . mat list e(b_jk)
                    
                    e(b_jk)[1,2]
                             _jk_1       _jk_2
                    y1  -.05062209   .01740042
                    
                    . sum _*
                    
                        Variable |       Obs        Mean    Std. Dev.       Min        Max
                    -------------+--------------------------------------------------------
                           _jk_1 |        99   -.0506221    .6320803  -3.716417   3.720718
                           _jk_2 |        99    .0174004    .2511904  -1.821369    .188215
                    In short, whether or not you use the mse option, Stata is doing it right. Either take it on blind faith or look at the formulas to see what you should be summing with each approach in order to replicate the numbers Stata is giving you. More simply, mat list e(b_jk) gives you the numbers you seem to want. Although I wonder if your reviewer would have been happy if you had just presented the jackknifed standard errors.
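
                    The arithmetic behind the two numbers can also be checked by hand from the saved replicates. This is a minimal sketch, assuming the pseudovalue definition in the manual's Methods and formulas section: N times the full-sample estimate minus (N - 1) times the leave-one-cluster-out estimate, with N the number of clusters. The names b_full and pseudo are made up for illustration:

                    ```stata
                    clear
                    webuse psidextract
                    keep if id < 100
                    * Full-sample fit; store the coefficient on south for later
                    xthtaylor lwage wks south smsa ms exp exp2 occ ind union fem blk ed, ///
                        endog(exp exp2 wks ms union ed) constant(fem blk ed)
                    scalar b_full = _b[south]
                    * Save the leave-one-cluster-out replicates (no mse, no keep)
                    jackknife _b[south], cluster(id) saving(jackknife.dta, replace double): ///
                        xthtaylor lwage wks south smsa ms exp exp2 occ ind union fem blk ed, ///
                        endog(exp exp2 wks ms union ed) constant(fem blk ed)
                    matrix list e(b_jk)
                    use jackknife, clear
                    * _N is the number of replicates (one per cluster)
                    * pseudo follows the assumed pseudovalue formula described above
                    generate double pseudo = _N*scalar(b_full) - (_N - 1)*_jk_1
                    * Compare the mean of the raw replicates with the mean of the pseudovalues
                    summarize _jk_1 pseudo
                    ```

                    If that assumption holds, the mean of pseudo should agree with e(b_jk) from the run without mse (about -.0506 in post #8), while the mean of _jk_1 is the plain average of the replicates (about -.0363).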


                    • #11
                      Most appreciated, Richard. Thank you for sticking it out with me and clarifying the issue.
                      Reviewing the formulas on page 12 of the manual clarified things a little. With the mse option, Stata computes the estimate from the actual estimates: the one from the full sample and those obtained with the (j)th observation excluded. Without it, Stata computes the estimate from its pseudovalues, whose formula is also listed. Since I want my estimate based on the actual values, I will go with the mse option.
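
                      For the coefficient on "south", the two versions of e(b_jk) can be connected with a short worked calculation. The notation here is an assumption chosen for this note, not copied from the manual: \hat\theta is the full-sample estimate, \hat\theta_{(j)} is the estimate with cluster j left out, and N = 99 is the number of clusters. The numbers are the rounded values displayed in posts #8 and #9, so the last line agrees with the reported -.05062209 only up to rounding:

                      ```latex
                      \begin{align*}
                      \hat\theta^{*}_{j} &= N\hat\theta - (N-1)\,\hat\theta_{(j)}, \qquad j = 1,\dots,N
                        && \text{(pseudovalues)} \\
                      \bar\theta_{(\cdot)} &= \frac{1}{N}\sum_{j=1}^{N}\hat\theta_{(j)} \approx -0.0363208
                        && \text{(mean of replicates, reported with mse)} \\
                      \bar\theta^{*} &= \frac{1}{N}\sum_{j=1}^{N}\hat\theta^{*}_{j}
                        = N\hat\theta - (N-1)\,\bar\theta_{(\cdot)}
                        && \text{(mean of pseudovalues, reported without mse)} \\
                      &\approx 99(-0.0364653) - 98(-0.0363208) \approx -0.0506
                      \end{align*}
                      ```

                      The Coef. column of the jackknife results table in post #8 shows \hat\theta itself (-.0364653), which is why it matches the plain xthtaylor output.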
                      PS The reviewer was actually not satisfied with the jackknife standard errors alone (which I added according to his request) and specifically asked for the jackknife coefficients as well!
                      Thank you Richard...
                      Best,
                      Mohamad
