  • Rare case regression analysis with a binary dependent variable

    Hello,
    I'm working on the question of whether 11- to 12-year-old pupils feel disadvantaged by their teachers depending on their social class. I'm using Stata with G-SOEP data to find an empirical answer with a multivariate regression model. My dependent variable is binary, coded 0=no, 1=yes. Checking its univariate distribution, I found that only 5% of the pupils in my sample (N=833) feel disadvantaged. So a simple "logit" command won't help, because the distribution of my dependent variable is strongly skewed. Now I wonder which regression method is recommended in that case. After some research I learned about rare case regressions (especially "firthlogit"), but my superior was not convinced by that. He told me to look at regressions based on skewed link functions. I did some further research but found no suitable regression method for my question about the teacher-pupil relation. May I ask for your opinion: which regression method (Stata command) would you use in my case?

    best wishes,
    Harald
    Last edited by Harald UB; 12 Feb 2018, 08:52.

  • #2
    I don't think you have anything to worry about, although I can appreciate that you need to convince your superior. If you search through some earlier discussions on this list you will find, if I recall correctly, that the issue is the absolute number of observations with event = 1, not whether it is rare in percentage terms. N = 833 is by no means rare. That being said, note that besides logit, you could also look at -binreg- or -scobit- for alternative links. The community-contributed program -oglm- also allows some link functions I don't think are available in standard Stata.
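To make "alternative links" concrete, here is a minimal Python sketch (not Stata, and not taken from the thread) that compares the symmetric logit inverse link with two asymmetric ones: the complementary log-log link and the skewed logit form that -scobit- estimates, Pr(y=1) = 1 - 1/(1 + exp(xb))^alpha. The function names and the value alpha = 0.5 are illustrative assumptions only.

```python
import math

def inv_logit(xb):
    """Symmetric logistic inverse link."""
    return 1.0 / (1.0 + math.exp(-xb))

def inv_cloglog(xb):
    """Complementary log-log inverse link (asymmetric)."""
    return 1.0 - math.exp(-math.exp(xb))

def inv_scobit(xb, alpha):
    """Skewed logit inverse link; alpha = 1 reduces to the ordinary logit."""
    return 1.0 - 1.0 / (1.0 + math.exp(xb)) ** alpha

if __name__ == "__main__":
    # alpha = 0.5 is an arbitrary illustrative value, not an estimate.
    for xb in (-3.0, -1.0, 0.0, 1.0, 3.0):
        print(f"xb={xb:+.1f}  logit={inv_logit(xb):.4f}  "
              f"cloglog={inv_cloglog(xb):.4f}  scobit={inv_scobit(xb, 0.5):.4f}")
```

For the logit, the probabilities at xb and -xb always sum to 1; for the other two links they do not, which is what "skewed" refers to. Whether such a link helps with a small number of events is a separate question that this sketch does not address.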

    • #3
      Originally posted by Mike Lacy:
      N = 833 is by no means rare.
      Thanks a lot for your reply. I was probably a bit unclear in my wording, sorry for that. The 5% refers to my sample of N=833, so only about 45 cases have the value 1; the remaining cases are 0. I tried scobit in Stata before, but Stata never finished estimating the model, and I aborted it after 30 minutes. oglm and binreg are new to me. I'm curious to test them.
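As a quick check of the numbers in this post, here is a small Python sketch (illustrative, not from the thread). It computes the share of events and, under the common rule of thumb of roughly 10 events per predictor, how many predictors 45 events would support. That rule of thumb is an assumption brought in for illustration; nobody in the thread proposes it as a threshold.

```python
def event_share(events, n):
    """Fraction of observations with outcome = 1."""
    return events / n

def max_predictors(events, n, events_per_variable=10):
    """Predictors supported by the rarer outcome under an
    events-per-variable rule of thumb (assumed default: 10)."""
    limiting = min(events, n - events)  # the rarer category is what limits the model
    return limiting // events_per_variable

n, events = 833, 45  # figures quoted in the post
print(f"share of events: {event_share(events, n):.3f}")
print(f"predictors under the 10-per-variable heuristic: {max_predictors(events, n)}")
```

The point matches the earlier reply: what constrains the model is the absolute count of events, not the total sample size.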


      • #4
        Ah, ok; yes, 45 events could be an issue. I have not previously read that a different link function will help with a low number of events, but perhaps someone else here does have some information about this. Perhaps your superior can point you to some relevant literature supporting that view, in which case, you might post it here.

        • #5
          Firth has been suggested, as well as Gary King's rare event logit (https://gking.harvard.edu/scholar_so...sion/1-1-stata).

          There is a good discussion here: https://www3.nd.edu/~rwilliam/stats3/RareEvents.pdf (by prolific Statalister Richard Williams).

          A more fundamental issue might be lack of power with a small number of events.
          ____________________________________________________
          Assistant Professor, Department of Biostatistics and Epidemiology
          School of Public Health and Health Sciences
          University of Massachusetts- Amherst

          • #6
            Thanks for your replies! From Allison's thoughts in the linked discussion I gather that firthlogit is the recommended way to handle rare events data.
            I was given another paper (Chapter 7.1.2) which discusses quantile regression for binary dependent variables:
            https://lib.ugent.be/fulltxt/RUG01/0...14_0001_AC.pdf

            Have you ever heard of that? As far as I understand the method, you estimate a quantile-based link function and use an asymmetric Laplace distribution for the calculations.
            I found some sources that show how to estimate quantile regressions in Stata, but not for binary variables (the case I'm interested in). Do you know of a Stata ado for that? Would you recommend the method for my problem? I'm very grateful for any suggestions.
            Last edited by Harald UB; 14 Feb 2018, 03:16.

            • #7
              I did some more research and decided to use lqreg (logistic quantile regression). It works quite well so far. However, the Stata command does not report any model fit statistics.
              I wonder if any of you knows a way to assess the goodness of fit of the model.
