  • Problems with Firth Logit

    Hello Stata Listers,

    I have a dataset with some collinearity and small sample bias so have opted for using firthlogit as opposed to a traditional logit model. However, when I run the same command using firthlogit instead of just logit, the model never converges. I have tried to run this on Stata 16 and Stata 18 and the model estimates are never produced (i.e., I let the machines run for several hours with no result). Is there something I'm doing wrong here? I've looked at the data and other model specifications work fine--it's just the firthlogit command that is causing an issue.

    Code:
     logit stateorder medicaid_expansion percapita_deaths ideology_diff prop_neighbors div_gov demgov, nolog
    HTML Code:
     note: div_gov != 0 predicts success perfectly
          div_gov dropped and 1107132 obs not used
    
    note: demgov != 0 predicts success perfectly
          demgov dropped and 1277460 obs not used
    
    
    Logistic regression                             Number of obs     =  1,873,608
                                                    LR chi2(4)        =  218763.82
                                                    Prob > chi2       =     0.0000
    Log likelihood = -1189304.2                     Pseudo R2         =     0.0842
    
    ------------------------------------------------------------------------------------
            stateorder |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------------+----------------------------------------------------------------
    medicaid_expansion |   .7679144   .0021245   361.46   0.000     .7637505    .7720783
      percapita_deaths |   430.3479   2.385946   180.37   0.000     425.6715    435.0242
         ideology_diff |   .0130708   .0001466    89.17   0.000     .0127835    .0133581
        prop_neighbors |  -2.810622   .0084456  -332.79   0.000    -2.827175   -2.794069
               div_gov |          0  (omitted)
                demgov |          0  (omitted)
                 _cons |   .8516809   .0064683   131.67   0.000     .8390034    .8643585
    ------------------------------------------------------------------------------------
    Code:
    firthlogit stateorder medicaid_expansion percapita_deaths ideology_diff prop_neighbors div_gov##demgov, nolog
    (I have had to break this several times because nothing happens in the Results area of Stata.)
    Last edited by Davia Downey; 01 Oct 2023, 10:49. Reason: added tags

  • #2
    Woah, why are div_gov and demgov omitted in the logit model? Are they perfectly collinear?



    • #3
      Yes, there's a small portion of the data that is categorized this way which is why I'm using Firth.
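      One way to tell perfect prediction apart from collinearity is to cross-tabulate each indicator against the outcome. The sketch below uses the variable names from the logit command in #1. If every observation with div_gov != 0 has stateorder equal to 1, the omission comes from perfect prediction (separation), which is what the logit notes report, and not from collinearity among the predictors.

      ```stata
      * Sketch only: run against the dataset from post #1
      * An empty stateorder == 0 cell in a nonzero row means perfect prediction
      tabulate div_gov stateorder, missing
      tabulate demgov stateorder, missing

      * Overlap between the two indicators themselves
      tabulate div_gov demgov, missing
      ```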



      • #4
        I am perplexed that you are concerned about small sample size when your estimation sample is nearly 2,000,000 observations. Do you mean, instead, that the stateorder outcome is very rare? If not, why do you feel the need to use penalized estimation?

        Next, the fact that something runs for a few hours and does not converge does not mean that it won't eventually. What are you seeing during those two hours? If the iterations show any progress at all, then you should let it continue to run. Only if the penalized log likelihood is not changing at all and giving "not concave" warnings, or if it is going around in circles can you conclude that all hope of convergence is lost. These are not simple models, and in large data sets they can take a long time to reach convergence. Days, even weeks may be needed.
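        To see whether the fit is making progress, one option is to rerun the command from #1 without nolog, so that the penalized log likelihood is displayed at each iteration. This is a minimal sketch. The iterate() option in the second command assumes that firthlogit passes standard maximize options through to the optimizer, as its acceptance of nolog suggests; the cap of 25 is an arbitrary illustrative value.

        ```stata
        * Same model as in #1, but with the iteration log left on
        firthlogit stateorder medicaid_expansion percapita_deaths ideology_diff ///
                prop_neighbors div_gov##demgov

        * Assumption: iterate() is accepted as a maximize option.
        * Capping iterations returns control after a fixed number of steps
        * so that the log can be inspected for "not concave" messages.
        firthlogit stateorder medicaid_expansion percapita_deaths ideology_diff ///
                prop_neighbors div_gov##demgov, iterate(25)
        ```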



        • #5
          It is not the overall sample size that concerns us. We are concerned about the very small number of observations with just Divided Government and Democratic Governors, which is a key piece of theory (i.e., these specific observations make up less than 2% of the overall obs). I do recognize that the model will take time, but I am simply re-running it to check the specification and it's taking a long time. Guess I'll keep waiting.
          Last edited by Davia Downey; 01 Oct 2023, 11:10.
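          The size of that subgroup, and how the outcome splits within it, can be checked directly. The sketch below assumes div_gov and demgov are coded 0/1, which the thread does not state explicitly.

          ```stata
          * Joint distribution of the two indicators; the cell option adds
          * each cell's share of all observations
          tabulate div_gov demgov, cell

          * Outcome split within the subgroup of interest
          * (assumes 0/1 coding for both indicators)
          tabulate stateorder if div_gov == 1 & demgov == 1
          ```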



          • #6
            Originally posted by Davia Downey
            . . . I am simply re-running to check the specification and it's taking a long time. Guess I'll keep waiting.
            Take a random sample, say, 2%, which should give you a perfectly adequate answer to your specification questions in less than one minute.

            If you're worried about representativeness (stability of the estimates), then take another random 2% sample and see whether the estimates are sufficiently close between the two samples for your purposes.

            See below. (Begin at the "Begin here" comment; the top part is just to create a dataset for illustration that mimics yours in essential features of the problem and verifies it through reproducing your logistic regression.)

            .
            . version 18.0

            .
            . clear *

            .
            . // seedem
            . set seed 2128452337

            .
            . quietly set obs `=1107132 + 1277460 + 1873608'

            . foreach var of newlist medicaid_expansion percapita_deaths ideology_diff ///
            >                 prop_neighbors {
              2.         generate double `var' = runiform()
              3. }

            . generate double xb = .8516809 + .7679144 * medicaid_expansion + ///
            >         430.3479 * percapita_deaths + .0130708 * ideology_diff + ///
            >                 -2.810622 * prop_neighbors

            . generate double prb = invlogit(xb)

            . quietly replace prb = cond(prb < 1e-8, 1e-8, cond(prb > 1-1e-8, 1-1e-8, prb))

            . generate byte stateorder = rbinomial(1, prb)

            . tabulate stateorder

             stateorder |      Freq.     Percent        Cum.
            ------------+-----------------------------------
                      0 |      8,375        0.20        0.20
                      1 |  4,249,825       99.80      100.00
            ------------+-----------------------------------
                  Total |  4,258,200      100.00

            .
            . generate double randu = runiform()

            . isid randu

            . gsort -stateorder +randu

            . generate byte div_gov = _n <= 1107132

            . gsort +div_gov -stateorder +randu

            . generate byte demgov = _n <= 1277460

            . drop randu

            .
            . logit stateorder c.(medicaid_expansion percapita_deaths ideology_diff ///
            >         prop_neighbors) i.div_gov i.demgov, nolog
            note: 0.div_gov != 1 predicts success perfectly;
                  0.div_gov omitted and 1107132 obs not used.

            note: 0.demgov != 1 predicts success perfectly;
                  0.demgov omitted and 1277460 obs not used.

            note: 1.div_gov omitted because of collinearity.
            note: 1.demgov omitted because of collinearity.

            Logistic regression                                  Number of obs = 1,873,608
                                                                 LR chi2(4)    =  82945.51
                                                                 Prob > chi2   =    0.0000
            Log likelihood = -12195.349                          Pseudo R2     =    0.7728

            ------------------------------------------------------------------------------------
                    stateorder | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
            -------------------+----------------------------------------------------------------
            medicaid_expansion |   .7675741   .0559255    13.72   0.000     .6579621    .8771861
              percapita_deaths |   434.9579   5.335517    81.52   0.000     424.5005    445.4153
                 ideology_diff |  -.0734461   .0552976    -1.33   0.184    -.1818274    .0349352
                prop_neighbors |   -2.84587   .0603477   -47.16   0.000    -2.964149    -2.72759
                     1.div_gov |          0  (empty)
                      1.demgov |          0  (empty)
                         _cons |   .0885993   .0547937     1.62   0.106    -.0187945     .195993
            ------------------------------------------------------------------------------------
            Note: 0 failures and 1787579 successes completely determined.

            .
            . *
            . * Begin here
            . *
            .
            . // First 2% sample
            . timer clear 1

            . timer on 1

            . generate double randu = runiform()

            . isid randu

            . generate byte touse = randu <= 0.02

            . firthlogit stateorder c.(medicaid_expansion percapita_deaths ideology_diff ///
            >         prop_neighbors) i.(div_gov demgov) if touse, nolog

                                                                    Number of obs = 85,112
                                                                    Wald chi2(6)  = 173.51
            Penalized log likelihood = -248.91756                   Prob > chi2   = 0.0000

            ------------------------------------------------------------------------------------
                    stateorder | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
            -------------------+----------------------------------------------------------------
            medicaid_expansion |   .7500148   .3866146     1.94   0.052    -.0077358    1.507765
              percapita_deaths |   448.0953   38.34081    11.69   0.000     372.9487    523.2419
                 ideology_diff |  -.1715026   .3773157    -0.45   0.649    -.9110278    .5680226
                prop_neighbors |  -2.755213   .4016178    -6.86   0.000    -3.542369   -1.968056
                     1.div_gov |   5.335676   1.426213     3.74   0.000      2.54035    8.131002
                      1.demgov |   5.568262   1.428543     3.90   0.000     2.768369    8.368155
                         _cons |  -.1324304   .3792277    -0.35   0.727     -.875703    .6108422
            ------------------------------------------------------------------------------------

            . timer off 1

            .
            . // Second 2% sample
            . timer clear 2

            . timer on 2

            . quietly replace randu = runiform()

            . isid randu

            . quietly replace touse = randu <= 0.02

            . firthlogit stateorder c.(medicaid_expansion percapita_deaths ideology_diff ///
            >         prop_neighbors) i.(div_gov demgov) if touse, nolog

                                                                    Number of obs = 85,425
                                                                    Wald chi2(6)  = 154.07
            Penalized log likelihood = -212.21173                   Prob > chi2   = 0.0000

            ------------------------------------------------------------------------------------
                    stateorder | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
            -------------------+----------------------------------------------------------------
            medicaid_expansion |   .3827062   .4088818     0.94   0.349    -.4186875      1.1841
              percapita_deaths |   479.6745   43.12864    11.12   0.000     395.1439    564.2051
                 ideology_diff |   .4434985   .4177734     1.06   0.288    -.3753224    1.262319
                prop_neighbors |  -2.995258   .4545999    -6.59   0.000    -3.886257   -2.104259
                     1.div_gov |   5.435063   1.429734     3.80   0.000     2.632836    8.237289
                      1.demgov |   5.373212   1.429395     3.76   0.000      2.57165    8.174775
                         _cons |  -.1177813   .4004574    -0.29   0.769    -.9026633    .6671007
            ------------------------------------------------------------------------------------

            . timer off 2

            .
            . timer list
               1:     50.51 /        1 =      50.5090
               2:     48.00 /        1 =      47.9980

            .
            . exit

            end of do-file


            .
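            To compare the two subsample fits side by side instead of scanning two separate tables, the results can be stored and tabulated together. This is a sketch under the same setup as the log above. The names u1, u2, s1 and s2 and the seed are illustrative choices, not part of the original do-file.

            ```stata
            * Two independent 2% subsamples, each flagged by its own uniform draw
            set seed 20231001
            generate double u1 = runiform()
            generate double u2 = runiform()

            quietly firthlogit stateorder c.(medicaid_expansion percapita_deaths ///
                    ideology_diff prop_neighbors) i.(div_gov demgov) if u1 <= 0.02, nolog
            estimates store s1

            quietly firthlogit stateorder c.(medicaid_expansion percapita_deaths ///
                    ideology_diff prop_neighbors) i.(div_gov demgov) if u2 <= 0.02, nolog
            estimates store s2

            * Coefficients and standard errors from both fits in one table
            estimates table s1 s2, b(%9.4f) se(%9.4f)
            ```

            Coefficients that differ between the columns by much less than their standard errors suggest that a 2% sample is stable enough for checking the specification.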



            • #7
              This has been resolved. I had a corrupted copy of the data, but found the original.
