Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Bias corrected logistic regression

    Using Stata 14.1 MP 32/64 under Windows. I'm considering this model: http://www.people.fas.harvard.edu/~k...irth/firth.pdf. The main idea is to bias correct the estimates by splitting each of the original observations into two new observations, one taking the value yhij and the other 1-yhij with weights 1+qhij/2 and qhij/2, respectively (see equation for q below). The article says it's easily implemented in any survey software, but I'm having trouble seeing how to do this using standard procedures in Stata. Not everything is in the draft so I'm going to try to provide some equations via images. Here are the relevant pieces:

    Click image for larger version

Name:	WEE1.png
Views:	1
Size:	17.5 KB
ID:	1352829
    Click image for larger version

Name:	WEE2.png
Views:	1
Size:	66.5 KB
ID:	1352830

    I'm not exactly sure how to implement this using the svy command in Stata. It seems that rather than a constant weight, it is a function of q which changes as the estimation iterates, Is this internal to Stata? If not, how could this be programmed?

  • #2
    Dear Bill,

    You are aware of the -firthlogit- command, right?

    Joao

    Comment


    • #3
      Yes. I'm considering this new approach as an alternative since firthlogit does not support svy. I was going to start a separate thread on this topic, but since you raised the issue, it was discussed here a while back: http://http://www.stata.com/statalis.../msg01126.html. However, regression is frequently performed on survey data without incorporating sample design information, so I'm not sure how using firthlogit would be any different from what others have done.

      The data augmentation approach of Rader, et al avoids this controversy, but it looks to me like one would have to control the iterations: initialize with an average weight and compute a log likelihood, test for convergence, if not converged, recomputed the weights and repeat the process until convergence is achieved-if I understand it properly. All of this could be done using commands that support svy. Iterations can be limited to 1 using iterate in glm and logit. I'm not sure how to write an algorithm to test for convergence, however, or if it's even necessary. The article seems to indicate that the analysis can be carried out by standard routines.

      Comment


      • #4
        I just noticed something I missed in the Rader paper: the q values are simply the leverage for observation hij. I think those values are available post-prediction on logit, but I'm not sure about glm. Maybe that's a way to simplify this process.

        Comment


        • #5
          Here's a little code that I developed to conduct the analysis:

          Code:
          gen q=1/147676
          gen wt=WTFA_SC3*(1+q/2) if rec==1
          replace wt=WTFA_SC3*(q/2) if rec==2
          
          svyset [pw=wt], strata(STRATUMRi) psu(PSUi) singleunit(centered)
          
          logit AD i.apc_age if apc_age>0, iter(1)
          predict lev, leverage
          drop q
          drop wt
          rename lev q
          summarize q
          
          if e(converged)==0{
            while e(converged)==0 {
              gen wt=WTFA_SC3*(1+q/2) if rec==1
              replace wt=WTFA_SC3*(q/2) if rec==2
           svyset [pw=wt], strata(STRATUMRi) psu(PSUi) singleunit(centered)
           logit AD i.apc_age if apc_age>0, iter(1)
           predict lev, hat
              drop q
              drop wt
              rename lev q
           }
          }
          The flaw in this is the predict command. I don't think this is allowed in svy. Can anyone give me any ideas on a work around? I'll change this to domain analysis once that's fixed.

          Comment

          Working...
          X