Firthlogit and its extremely slow operation

Lukasz Dabros

Join Date: Dec 2019

Posts: 3
#1


28 Oct 2021, 12:28

Hi all,

I am trying to use the -firthlogit- command to estimate some models with possible separation. The command works; however, its speed is less than satisfactory (to say the least). It takes hours (sometimes six or seven) to estimate a single model on a dataset of around 50,000 observations. Is this normal? How can I avoid the issue? It is paralysing my research.

Best regards,
Leonardo Guizzetti

Join Date: Jul 2016

Posts: 2396
#2

28 Oct 2021, 13:33

It's not clear whether your model can be sped up, but the run time may not be abnormal. I have often run models that take 10+ hours to estimate (albeit with orders of magnitude more observations).

-firthlogit- is from SSC. Also from SSC, you can try -penlogit-, which estimates a different penalized logistic regression model, or you can fit a Bayesian model by using the -bayes:- prefix with -logit-.
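For readers unfamiliar with the -bayes:- prefix, here is a minimal sketch of the Bayesian route on small simulated data. It assumes Stata 15 or later. The variable names, the seed, and the normal(0, 6.25) prior (variance 6.25, so SD 2.5) on every coefficient are illustrative assumptions, not recommendations from this thread. The -asis- option is included on the assumption that -bayes: logit- accepts it as plain -logit- does, so the perfectly predicting level is kept in the model.

```stata
* Minimal sketch (assumes Stata 15+ for the bayes: prefix); simulated data only.
clear
set seed 12345
set obs 5000
generate int dex = mod(_n, 10)
generate byte out = rbinomial(1, 0.5)
* every observation with dex == 9 has out == 1, i.e. that level is perfectly separated
replace out = 1 if dex == 9

* Bayesian logit: the proper normal prior on all coefficients in the out equation
* keeps the posterior for the separated dex == 9 coefficient well defined.
* asis (assumed accepted here) retains the perfect predictor instead of dropping it.
bayes, prior({out:}, normal(0, 6.25)) rseed(17): logit out i.dex, asis
```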
Jackson Monroe

Join Date: Jul 2019

Posts: 60
#3

28 Oct 2021, 13:39

If you provided more details about your data and context, we could be of more use. It's hard to say what alternatives are possible when we know so little about the data and the exact problems you're experiencing.
Clyde Schechter

Join Date: Apr 2014

Posts: 30003
#4

28 Oct 2021, 14:38

One other thought that may or may not be applicable in your situation. If all of the variables in the model are categorical, you can -contract- your data on those variables and then run -firthlogit- on the contracted data set specifying -[fweight = _freq]-. That can speed things up a lot, but is only possible if all model variables are categorical. (See -help contract- if you are not familiar with that command.)
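A minimal sketch of the -contract- approach, on simulated data in which both model variables are categorical (the variable names and the seed are illustrative assumptions). After -contract-, each remaining row is one observed (out, dex) combination, and its _freq count lets it stand in for all the original observations in that cell.

```stata
* Minimal sketch of -contract- plus frequency weights; simulated data only.
* Requires every model variable to be categorical (here: out and dex).
clear
set seed 12345
set obs 5000
generate int dex = mod(_n, 10)
generate byte out = rbinomial(1, 0.5)
replace out = 1 if dex == 9

* collapse to one row per observed (out, dex) combination; counts go into _freq
contract out dex

* at most 20 rows now represent the original 5,000 observations
list in 1/5

* fit on the contracted data, weighting each row by its cell count
firthlogit out i.dex [fweight = _freq]
```

Because the weighted log likelihood, and the information matrix that enters Firth's penalty, both sum each row's contribution times its frequency, the contracted fit should reproduce the full-data estimates while iterating over at most 20 rows instead of 5,000.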
Joseph Coveney

Join Date: Apr 2014

Posts: 4384
#5

28 Oct 2021, 17:31

Originally posted by Lukasz Dabros

It takes long hours (even six or seven) to estimate a single model on a dataset containing around 50000 observations. Is it normal?

You can try some of the alternative estimation commands suggested above, but your problem might have more to do with your model than with the command. There's nothing inherent in -firthlogit- that would make it take an inordinate amount of time to fit a model with fifty thousand observations. The example below converged in under four minutes.

.
. clear *

.
. set seed `=strreverse("1633696")'

.
. quietly set obs 50000

.
. generate double cex = runiform(-0.5, 0.5)

. generate int dex = mod(_n, 10)

.
. generate byte out = rbinomial(1, 0.5)

. quietly replace out = 1 if dex == 9

.
. timer clear 1

. timer on 1

.
. firthlogit out i.dex##c.cex, nolog

                                                        Number of obs = 50,000
                                                        Wald chi2(19) =  85.03
Penalized log likelihood = -31134.537                   Prob > chi2   = 0.0000

------------------------------------------------------------------------------
         out | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
         dex |
          1  |   .0155561    .039997     0.39   0.697    -.0628365    .0939487
          2  |  -.0087932   .0400016    -0.22   0.826    -.0871949    .0696086
          3  |   .0056549   .0399941     0.14   0.888    -.0727322    .0840419
          4  |  -.0177383   .0399948    -0.44   0.657    -.0961267    .0606501
          5  |   .0158792   .0399945     0.40   0.691    -.0625086    .0942669
          6  |   .0038951   .0399966     0.10   0.922    -.0744968     .082287
          7  |  -.0392764   .0399978    -0.98   0.326    -.1176707    .0391178
          8  |   .0264259   .0399948     0.66   0.509    -.0519623    .1048142
          9  |   8.522399   1.000641     8.52   0.000     6.561179    10.48362
             |
         cex |  -.0343267   .0981415    -0.35   0.727    -.2266806    .1580272
             |
   dex#c.cex |
          1  |  -.0801868   .1389737    -0.58   0.564    -.3525703    .1921966
          2  |   -.140103   .1383583    -1.01   0.311    -.4112802    .1310743
          3  |  -.0419668   .1386974    -0.30   0.762    -.3138087     .229875
          4  |  -.0050961   .1384141    -0.04   0.971    -.2763828    .2661906
          5  |  -.0494952   .1381057    -0.36   0.720    -.3201775    .2211871
          6  |  -.1030694   .1385923    -0.74   0.457    -.3747052    .1685665
          7  |   .0690978   .1392863     0.50   0.620    -.2038984     .342094
          8  |  -.0094226   .1391648    -0.07   0.946    -.2821807    .2633355
          9  |  -.0322188    4.42528    -0.01   0.994    -8.705609    8.641171
             |
       _cons |    -.00474   .0282795    -0.17   0.867    -.0601668    .0506868
------------------------------------------------------------------------------

.
. timer off 1

.
. timer list 1
   1:    228.24 /        1 =     228.2350

.
. exit

end of do-file

.

You can also try simplifying your model, or taking, say, a random 10% sample from the dataset and seeing whether that still answers your research question.
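A minimal sketch of the subsampling idea, reusing the structure of the simulated example above (the seed and the 10% fraction are illustrative assumptions). -sample 10- keeps a 10% simple random sample of the observations in memory, and -set seed- makes the draw reproducible. Timing the subsample fit gives a quick read on how run time scales before committing to the full dataset.

```stata
* Minimal sketch: fit on a reproducible random 10% subsample first.
* Simulated stand-in data (50,000 obs, as in the thread), not the poster's.
clear
set seed 12345
set obs 50000
generate double cex = runiform(-0.5, 0.5)
generate int dex = mod(_n, 10)
generate byte out = rbinomial(1, 0.5)
replace out = 1 if dex == 9

* keep a 10% simple random sample (5,000 obs); this drops the other 90% from
* memory, so wrap it in preserve/restore if you need the full data afterwards
sample 10

* time the fit on the subsample
timer clear 1
timer on 1
firthlogit out i.dex##c.cex, nolog
timer off 1
timer list 1
```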