Bias corrected logistic regression

Bill Smith

Join Date: Sep 2014

Posts: 158
#1

Bias corrected logistic regression

11 Aug 2016, 09:46

Using Stata 14.1 MP 32/64 under Windows. I'm considering this model: http://www.people.fas.harvard.edu/~k...irth/firth.pdf. The main idea is to bias correct the estimates by splitting each of the original observations into two new observations, one taking the value y_hij and the other 1-y_hij with weights 1+q_hij/2 and q_hij/2, respectively (see equation for q below). The article says it's easily implemented in any survey software, but I'm having trouble seeing how to do this using standard procedures in Stata. Not everything is in the draft so I'm going to try to provide some equations via images. Here are the relevant pieces:

I'm not exactly sure how to implement this using the svy command in Stata. It seems that rather than a constant weight, it is a function of q which changes as the estimation iterates, Is this internal to Stata? If not, how could this be programmed?
Tags: None
Joao Santos Silva

Join Date: Apr 2014

Posts: 3010
#2

11 Aug 2016, 16:54

Dear Bill,

You are aware of the -firthlogit- command, right?

Joao
Comment
Bill Smith

Join Date: Sep 2014

Posts: 158
#3

12 Aug 2016, 05:59

Yes. I'm considering this new approach as an alternative since firthlogit does not support svy. I was going to start a separate thread on this topic, but since you raised the issue, it was discussed here a while back: http://http://www.stata.com/statalis.../msg01126.html. However, regression is frequently performed on survey data without incorporating sample design information, so I'm not sure how using firthlogit would be any different from what others have done.

The data augmentation approach of Rader, et al avoids this controversy, but it looks to me like one would have to control the iterations: initialize with an average weight and compute a log likelihood, test for convergence, if not converged, recomputed the weights and repeat the process until convergence is achieved-if I understand it properly. All of this could be done using commands that support svy. Iterations can be limited to 1 using iterate in glm and logit. I'm not sure how to write an algorithm to test for convergence, however, or if it's even necessary. The article seems to indicate that the analysis can be carried out by standard routines.
Comment
Bill Smith

Join Date: Sep 2014

Posts: 158
#4

12 Aug 2016, 08:03

I just noticed something I missed in the Rader paper: the q values are simply the leverage for observation hij. I think those values are available post-prediction on logit, but I'm not sure about glm. Maybe that's a way to simplify this process.
Comment

Bill Smith

Join Date: Sep 2014
Posts: 158

22 Aug 2016, 06:55

Here's a little code that I developed to conduct the analysis:

Code:

gen q=1/147676
gen wt=WTFA_SC3*(1+q/2) if rec==1
replace wt=WTFA_SC3*(q/2) if rec==2

svyset [pw=wt], strata(STRATUMRi) psu(PSUi) singleunit(centered)

logit AD i.apc_age if apc_age>0, iter(1)
predict lev, leverage
drop q
drop wt
rename lev q
summarize q

if e(converged)==0{
  while e(converged)==0 {
    gen wt=WTFA_SC3*(1+q/2) if rec==1
    replace wt=WTFA_SC3*(q/2) if rec==2
 svyset [pw=wt], strata(STRATUMRi) psu(PSUi) singleunit(centered)
 logit AD i.apc_age if apc_age>0, iter(1)
 predict lev, hat
    drop q
    drop wt
    rename lev q
 }
}

The flaw in this is the predict command. I don't think this is allowed in svy. Can anyone give me any ideas on a work around? I'll change this to domain analysis once that's fixed.

Announcement

Bias corrected logistic regression

Comment

Comment

Comment

Comment