  • Firth Logistic Regression

    I have a sample size of 19,178 variables.
    My response variable is binary and I have 155 predictors in total. After applying the Rao-Scott test of independence (since my data are from a complex survey design), 77 variables were found significant, and I used these significant variables as regressors in my Firth logistic model.

    Is it possible that none of the estimates are significant? The Prob > chi2 of my model is 0.6257, which is very high. Do you know of alternative statistical methods for modelling this kind of data?

  • #2
    Why are you using firthlogit rather than logit?

    19,178 variables or 19,178 cases?

    77 variables is an awful lot. If there is a lot of collinearity between them they could all wind up insignificant. Have you considered a simpler model?
    -------------------------------------------
    Richard Williams, Notre Dame Dept of Sociology
    StataNow Version: 19.5 MP (2 processor)

    EMAIL: [email protected]
    WWW: https://www3.nd.edu/~rwilliam
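
The collinearity point in #2 can be illustrated with a small simulation. This is a minimal sketch on simulated data, not the poster's survey data. The names `pearson_r`, `vif_two_predictors`, `x1`, `x2`, and `x3` are hypothetical, and the variance inflation factor (VIF) is strictly a linear-regression diagnostic, used here only as a rough guide to how much correlated predictors can inflate standard errors.

```python
import math
import random

def pearson_r(a, b):
    """Pearson correlation between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def vif_two_predictors(a, b):
    """With exactly two predictors, both share VIF = 1 / (1 - r^2)."""
    r = pearson_r(a, b)
    return 1.0 / (1.0 - r * r)

# Simulated predictors (assumption: purely illustrative values).
random.seed(0)
n = 2000
z = [random.gauss(0, 1) for _ in range(n)]
x1 = [v + random.gauss(0, 0.1) for v in z]   # x1 and x2 share the signal z
x2 = [v + random.gauss(0, 0.1) for v in z]
x3 = [random.gauss(0, 1) for _ in range(n)]  # unrelated to x1

vif_collinear = vif_two_predictors(x1, x2)
vif_independent = vif_two_predictors(x1, x3)
# Standard errors scale roughly with the square root of the VIF.
se_inflation = math.sqrt(vif_collinear)

print(f"VIF, collinear pair:   {vif_collinear:.1f}")
print(f"VIF, independent pair: {vif_independent:.3f}")
print(f"Approx. SE inflation:  {se_inflation:.1f}x")
```

With 77 regressors the same mechanism can act across many variables at once, which is one way every coefficient can end up individually insignificant.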


    • #3
      If you're using the user-written command firthlogit, it's not intended for use with data from a complex survey design.


      • #4
        Originally posted by Richard Williams:
        Why are you using firthlogit rather than logit?

        19,178 variables or 19,178 cases?

        77 variables is an awful lot. If there is a lot of collinearity between them they could all wind up insignificant. Have you considered a simpler model?

        I'm sorry, my mistake. I have 19,178 observations. As for the number of cases, there are 525. Is it okay to use logit if there are only 525 cases? I was advised to use a regression method based on penalized likelihood estimation because of the small percentage of cases in my data. What do you think is the best regression method for this kind of data?


        • #5
          Originally posted by Joseph Coveney:
          If you're using the user-written command firthlogit, it's not intended for use with data from a complex survey design.

          Hello, thank you so much! Yes, I have been using the user-written command. Thank you for pointing that out. Do you have an idea of which command I should use in my case? Thank you very much!


          • #6
            Do you mean 525 cases out of 19,178 experience the event? In general, that sounds like enough to run regular logit. See

            https://statisticalhorizons.com/logi...or-rare-events

            Or, do you mean that only 525 of the 19,178 are included in your analysis?

            Either way, 77 variables is a huge number of variables. Do you really need that many??? Consider a more parsimonious model.
            -------------------------------------------
            Richard Williams, Notre Dame Dept of Sociology
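
The figures in this exchange can be put into a quick arithmetic check. This is only a sketch: the variable names are hypothetical, and the threshold of 10 events per variable is a commonly cited rule of thumb that is not mentioned in the thread and is itself debated, so treat it as an assumption rather than a hard limit.

```python
# Figures quoted in the thread.
n_obs = 19_178        # observations
n_events = 525        # observations that experience the event
n_predictors = 77     # regressors kept after bivariate screening

event_rate = n_events / n_obs
events_per_variable = n_events / n_predictors

# Assumption: the often-quoted heuristic of about 10 events per variable.
EPV_RULE_OF_THUMB = 10
max_predictors_under_rule = n_events // EPV_RULE_OF_THUMB

print(f"Event rate:          {event_rate:.2%}")
print(f"Events per variable: {events_per_variable:.2f}")
print(f"Predictors at EPV = {EPV_RULE_OF_THUMB}: {max_predictors_under_rule}")
```

The event count of 525 is large in absolute terms, which fits the advice that regular logit is usable, while the events-per-variable figure shows why 77 regressors is a lot for that many events.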


            • #7
              Yes, I meant that 525 cases out of 19,178 experience the event. I have read the paper and it does make sense. I was also hesitant to use Firth because I have a very large sample size. Nonetheless, your replies shed light on my problem. Thank you very much!

              The 77 variables are based on the Rao-Scott chi-squared test of independence. I included all the variables that were significant, but I will try removing those with a very low Cramer's V association with the response variable. Thank you!


              • #8
                By significant, do you mean that their bivariate relationship with your dependent variable was significant? If so, that certainly doesn’t mean or imply that all 77 would be significant if all were included at once.

                Also, it sounds like you are letting raw empiricism guide your variable choice. Is there any theory about what variables should be in the model?
                -------------------------------------------
                Richard Williams, Notre Dame Dept of Sociology
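
The gap between bivariate and joint significance can be sketched with simulated data. This is a linear analogue with a continuous outcome, not a logistic model and not the poster's data, and all names (`pearson_r`, `x1`, `x2`, `y`) are hypothetical. Here `x2` is related to `y` only because both depend on `x1`, so its bivariate correlation is clearly nonzero while its partial correlation, after controlling for `x1`, is close to zero.

```python
import math
import random

def pearson_r(a, b):
    """Pearson correlation between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

# Simulated data (assumption: purely illustrative).
random.seed(1)
n = 5000
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [v + random.gauss(0, 1) for v in x1]  # x2 is a noisy copy of x1
y = [v + random.gauss(0, 1) for v in x1]   # y depends on x1 only

r_y2 = pearson_r(y, x2)
r_y1 = pearson_r(y, x1)
r_12 = pearson_r(x1, x2)

# First-order partial correlation of y and x2, controlling for x1.
partial_y2_given_1 = (r_y2 - r_y1 * r_12) / math.sqrt(
    (1 - r_y1 ** 2) * (1 - r_12 ** 2)
)

print(f"Bivariate corr(y, x2):    {r_y2:.3f}")
print(f"Partial corr(y, x2 | x1): {partial_y2_given_1:.3f}")
```

A variable that passes a bivariate screen in this way would contribute little once the variable it proxies for is in the model.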
