  • glm w/ family(binomial) link(logit)

    I have proportion data. Each observation includes the variables "numerator" (count of successes), "denominator" (total count), and "proportion" (numerator/denominator), plus predictors (v1, v2, etc.).

    Question 1: Is glm proportion v1 v2, family(binomial) link(logit) a reasonable specification?
    Question 2: When would glm numerator v1 v2, family(binomial denominator) link(logit) be a better specification, or what is the main difference between the two?

  • #2
    Posts by Clyde Schechter (#15956) and Nick Cox (#18416) on the same thread address the underlying question.
    The first model, glm proportion v1 v2, family(binomial) link(logit), appears to be equivalent to fracreg, at least for some data at hand.
    Any more light to shed on which of the two models might be preferable?


    • #3
      I am not sure what other thread you are referring to; but yes, fracreg can be used, and can also estimate heteroskedastic probit models if that interests you.

      As for your other question, my first guess is that the two approaches would give the same results. But, maybe not. Why don't you try them both on the same problem and see what happens?
      -------------------------------------------
      Richard Williams, Notre Dame Dept of Sociology
      Stata Version: 17.0 MP (2 processor)

      WWW: https://www3.nd.edu/~rwilliam


      • #4
        Thank you. The other thread is "GLM and blogit for proportion variable: different results".

        The two approaches give quite different results.


        • #5
          Thanks. Giving links helps; I assume you mean

          https://www.statalist.org/forums/for...ferent-results

          As Clyde notes in that thread, something weird was happening in those examples and the analysis may not have been done correctly. You should be able to quickly try the two approaches yourself with your own data. If the results are identical it doesn't matter which you use. If they differ, then you can try to determine which is best. I would try it myself but I don't have appropriate data handy.

          • #6
            Thank you, again. I was too brief: I meant that I had already tried the two approaches, and the results are quite different. From what I have read since, fracreg appears to be better suited to cases where the denominator is not known. However, the reason for the large difference between the two is puzzling.


            • #7
              Retry your glm results specifying vce(robust). That is what fracreg does and it is what I think you are supposed to do. In the following I got big differences in standard errors (but not coefficients) if I did not use vce(robust) but trivial differences when I did.

              Code:
              * Load the example data and build the proportion
              webuse xmpl2, clear
              gen prop = deaths/pop
              * Proportion as the response, with robust standard errors
              glm prop agecat exposed, family(binomial) nolog vce(robust)
              * Counts as the response, with pop as the binomial denominator
              glm deaths agecat exposed, family(binomial pop) nolog vce(robust)
              * Fractional logit for comparison
              fracreg logit prop agecat exposed, nolog

              You do have me curious though. There are an infinite number of ways of getting a proportion of .25. Is there any added advantage to knowing how I got it, e.g. 10/40, 16/64, 100/400, or whatever?
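
              One way to probe that question on the same data is sketched below. It is an untested illustration, not a definitive answer. The working assumption is that family(binomial pop) lets rows with larger denominators carry more weight, while the bare-proportion model treats 10/40 and 100/400 alike. If that is right, weighting the proportion model by pop with importance weights should reproduce the coefficients of the grouped-binomial fit.

              Code:
              webuse xmpl2, clear
              gen prop = deaths/pop
              * Unweighted: every row counts equally, whatever its denominator
              glm prop agecat exposed, family(binomial) nolog
              * Grouped binomial: the likelihood uses deaths out of pop
              glm deaths agecat exposed, family(binomial pop) nolog
              * Assumption: iweight=pop should match the grouped fit's coefficients
              glm prop agecat exposed [iweight=pop], family(binomial) nolog

              Comparing the three coefficient tables side by side shows whether the denominators change the point estimates in a given dataset.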

              • #8
                Originally posted by Richard Williams

                You do have me curious though. There are an infinite number of ways of getting a proportion of .25. Is there any added advantage to knowing how I got it, e.g. 10/40, 16/64, 100/400, or whatever?

                I'm pretty sure it doesn't matter in frequentist estimation, no. However, for Bayesians, it may. Imagine you had 3 deaths in a sample of 10. That's a proportion of 0.3. However, the sample was only 10, so we're not that certain that the actual proportion is 0.3. That data is described by the PDF of a beta distribution with alpha = 3, beta = 7 (the mean is alpha / (alpha + beta)).

                [Figure not preserved: PDF of the beta distribution with alpha = 3, beta = 7.]

                If you had 300 deaths in a sample of 1,000, we're more confident that the probability is 0.3, and the corresponding beta distribution is:

                [Figure not preserved: PDF of the corresponding beta distribution.]
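
                Since the plots are not shown here, the two densities can be redrawn with a short sketch. It assumes the second distribution has alpha = 300, beta = 700, by analogy with alpha = 3, beta = 7 in the first case, so it is an illustration rather than a reproduction of the original figures. Run it from a do-file, because it uses /// line continuation.

                Code:
                * Beta densities for 3 deaths in 10 and 300 deaths in 1,000
                * alpha = 300, beta = 700 is an assumption made by analogy with the first case
                twoway (function y = betaden(3, 7, x), range(0 1)) ///
                       (function y = betaden(300, 700, x), range(0 1) n(1000)), ///
                       legend(order(1 "alpha = 3, beta = 7" 2 "alpha = 300, beta = 700")) ///
                       xtitle("Proportion") ytitle("Density")

                Both curves have the same mean of 0.3, but the second one should be far narrower, which is the extra certainty that comes from the larger sample.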



                Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

                When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.
