
  • Firthlogit and its extremely slow operation

    Hi all,

    I am trying to use the firthlogit command to estimate some models with possible separation. The command works; however, its speed is less than satisfactory (to say the least). It takes long hours (even six or seven) to estimate a single model on a dataset containing around 50000 observations. Is it normal? How can I avoid this issue? It is paralysing my research.

    Best regards,

  • #2
    It's not clear whether your model can be sped up, but the run-time may not be abnormal. I have often run models that can take 10+ hours to estimate (albeit with orders of magnitude more observations).

    -firthlogit- is from SSC. Also at SSC is -penlogit-, which estimates a different penalized logistic regression model. Alternatively, you can fit a Bayesian model by adding the -bayes:- prefix to -logit-.



    • #3
      If you provided some more data and context we could be of more use. It's hard to say what alternatives are possible knowing so little about the data and exact problems you're experiencing.



      • #4
        One other thought that may or may not be applicable in your situation. If all of the variables in the model are categorical, you can -contract- your data on those variables and then run -firthlogit- on the contracted data set specifying -[fweight = _freq]-. That can speed things up a lot, but is only possible if all model variables are categorical. (See -help contract- if you are not familiar with that command.)
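
        The approach described above can be sketched roughly as follows. This is only an illustration on simulated data: the variable names (out, a, b) and the seed are made up, and it assumes -firthlogit- has been installed from SSC and accepts frequency weights as described.

        ```stata
        * Sketch only: simulated, all-categorical data with hypothetical names.
        * Assumes -firthlogit- is installed (ssc install firthlogit).
        clear
        set seed 12345
        set obs 50000
        generate byte a   = mod(_n, 10)          // 10-level categorical predictor
        generate byte b   = runiformint(0, 3)    // 4-level categorical predictor
        generate byte out = rbinomial(1, 0.5)    // binary outcome
        replace out = 1 if a == 9                // induce separation in one category

        * Reduce the data to one row per distinct (out, a, b) combination;
        * -contract- stores the number of original observations in _freq.
        contract out a b

        * Fit the same model on the much smaller dataset, weighting by _freq.
        firthlogit out i.a i.b [fweight = _freq], nolog
        ```

        Here the 50,000 rows shrink to at most a few dozen covariate patterns, which is where any speed-up would come from. With even one continuous predictor, nearly every row is its own pattern and -contract- gains nothing.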



        • #5
          Originally posted by Lukasz Dabros:
          It takes long hours (even six or seven) to estimate a single model on a dataset containing around 50000 observations. Is it normal?
          You can try some of the alternative estimation commands suggested above, but your problem might have more to do with your model than with the command. There's nothing inherent in -firthlogit- that would make it take an inordinate amount of time to fit a model to fifty thousand observations. The example below, which has a ten-level factor interacted with a continuous covariate and complete separation in one level, converged in under four minutes.

          . clear *

          . set seed `=strreverse("1633696")'

          . quietly set obs 50000

          . generate double cex = runiform(-0.5, 0.5)

          . generate int dex = mod(_n, 10)

          . generate byte out = rbinomial(1, 0.5)

          . quietly replace out = 1 if dex == 9

          . timer clear 1

          . timer on 1

          . firthlogit out i.dex##c.cex, nolog

                                                                  Number of obs = 50,000
                                                                  Wald chi2(19) =  85.03
          Penalized log likelihood = -31134.537                   Prob > chi2   = 0.0000

          ------------------------------------------------------------------------------
                   out | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
          -------------+----------------------------------------------------------------
                   dex |
                    1  |   .0155561    .039997     0.39   0.697    -.0628365    .0939487
                    2  |  -.0087932   .0400016    -0.22   0.826    -.0871949    .0696086
                    3  |   .0056549   .0399941     0.14   0.888    -.0727322    .0840419
                    4  |  -.0177383   .0399948    -0.44   0.657    -.0961267    .0606501
                    5  |   .0158792   .0399945     0.40   0.691    -.0625086    .0942669
                    6  |   .0038951   .0399966     0.10   0.922    -.0744968     .082287
                    7  |  -.0392764   .0399978    -0.98   0.326    -.1176707    .0391178
                    8  |   .0264259   .0399948     0.66   0.509    -.0519623    .1048142
                    9  |   8.522399   1.000641     8.52   0.000     6.561179    10.48362
                       |
                   cex |  -.0343267   .0981415    -0.35   0.727    -.2266806    .1580272
                       |
             dex#c.cex |
                    1  |  -.0801868   .1389737    -0.58   0.564    -.3525703    .1921966
                    2  |   -.140103   .1383583    -1.01   0.311    -.4112802    .1310743
                    3  |  -.0419668   .1386974    -0.30   0.762    -.3138087     .229875
                    4  |  -.0050961   .1384141    -0.04   0.971    -.2763828    .2661906
                    5  |  -.0494952   .1381057    -0.36   0.720    -.3201775    .2211871
                    6  |  -.1030694   .1385923    -0.74   0.457    -.3747052    .1685665
                    7  |   .0690978   .1392863     0.50   0.620    -.2038984     .342094
                    8  |  -.0094226   .1391648    -0.07   0.946    -.2821807    .2633355
                    9  |  -.0322188    4.42528    -0.01   0.994    -8.705609    8.641171
                       |
                 _cons |    -.00474   .0282795    -0.17   0.867    -.0601668    .0506868
          ------------------------------------------------------------------------------

          . timer off 1

          . timer list 1
             1:    228.24 /        1 =     228.2350

          . exit

          end of do-file


          You can also try simplifying your model or taking, say, a random 10% sample from the dataset and see whether that can still give you the answer to your research question.
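
          A minimal sketch of the subsampling idea, assuming the full dataset is already in memory. It reuses the simulated variable names from the log above (out, dex, cex) purely as placeholders for your own outcome and predictors, and the seed is arbitrary.

          ```stata
          * Sketch only: time -firthlogit- on a random 10% subsample.
          * out, dex and cex are placeholders; substitute your own variables.
          preserve                      // keep the full dataset to restore later
          set seed 20240101             // arbitrary seed, so the draw is reproducible
          sample 10                     // keep a random 10% of the observations

          timer clear 1
          timer on 1
          firthlogit out i.dex##c.cex, nolog
          timer off 1
          timer list 1                  // elapsed seconds for the subsample fit

          restore                       // bring back the full dataset
          ```

          If the subsample fit is also slow, the bottleneck is more likely the model specification than the number of observations. Note that sparse categories can become sparser still in a 10% draw; -sample- has a by() option if you want to draw within groups.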
