glm w/ family(binomial) link(logit)

Hillel Alpert

Join Date: Jul 2014

Posts: 49
#1

25 Jul 2019, 09:31

I have proportion data. Each observation includes the variables "numerator" (count of successes), "denominator" (total count), and "proportion" (numerator/denominator), plus predictors v1, v2, etc.

Question 1: Is glm proportion v1 v2, family(binomial) link(logit) a reasonable specification?
Question 2: When would glm numerator v1 v2, family(binomial denominator) link(logit) be a better specification, or what is the main difference between the two?
Hillel Alpert

Join Date: Jul 2014

Posts: 49
#2

25 Jul 2019, 12:16

Posts by Clyde Schechter (#15956) and Nick Cox (#18416) on the same thread address the underlying question.
The first model, glm proportion v1 v2, family(binomial) link(logit), appears to be equivalent to fracreg, at least for some data at hand.
Can anyone shed more light on which of the two models might be preferable?
Richard Williams

Join Date: Apr 2014

Posts: 4982
#3

25 Jul 2019, 12:28

I am not sure what other thread you are referring to; but yes, fracreg can be used, and can also estimate heteroskedastic probit models if that interests you.

As for your other question, my first guess is that the two approaches would give the same results. But, maybe not. Why don't you try them both on the same problem and see what happens?

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 19.5 MP (2 processor)
EMAIL: [email protected]
WWW: https://www3.nd.edu/~rwilliam
Hillel Alpert

Join Date: Jul 2014

Posts: 49
#4

25 Jul 2019, 12:33

Thank you. The other thread is "GLM and blogit for proportion variable: different results".

The two approaches give quite different results.
Richard Williams

Join Date: Apr 2014

Posts: 4982
#5

25 Jul 2019, 12:52

Thanks. Giving links helps; I assume you mean

https://www.statalist.org/forums/for...ferent-results

As Clyde notes in that thread, something weird was happening in those examples and the analysis may not have been done correctly. You should be able to quickly try the two approaches yourself with your own data. If the results are identical it doesn't matter which you use. If they differ, then you can try to determine which is best. I would try it myself but I don't have appropriate data handy.

Hillel Alpert

Join Date: Jul 2014

Posts: 49
#6

25 Jul 2019, 12:59

Thank you, again. To be clearer than my earlier brief reply: I had already tried the two approaches, and the results are quite different. From what I have read since, fracreg appears to be better suited to cases where the denominator is not known. However, the reason for the large difference between the two is puzzling.
Richard Williams

Join Date: Apr 2014

Posts: 4982
#7

25 Jul 2019, 13:30

Retry your glm results specifying vce(robust). That is what fracreg does and it is what I think you are supposed to do. In the following I got big differences in standard errors (but not coefficients) if I did not use vce(robust) but trivial differences when I did.

Code:

webuse xmpl2, clear
gen prop = deaths/pop
glm prop agecat exposed, family(binomial) nolog vce(robust)
glm deaths agecat exposed, family(binomial pop) nolog vce(robust)
fracreg logit prop agecat exposed, nolog

You do have me curious though. There are an infinite number of ways of getting a proportion of .25. Is there any added advantage to knowing how I got it, e.g. 10/40, 16/64, 100/400, or whatever?
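To make the comparison easier to read, the three fits can be stored and tabulated side by side. This is only a sketch built on the commands above; the stored-result names (glm_prop, glm_count, frac) are hypothetical labels chosen for illustration, and link(logit) is written out explicitly although it is the default for family(binomial).

```stata
* Sketch only: same data and models as above, with results stored for comparison
webuse xmpl2, clear
generate prop = deaths/pop

* (1) proportion as the outcome; the denominators are not used
glm prop agecat exposed, family(binomial) link(logit) vce(robust) nolog
estimates store glm_prop

* (2) count as the outcome, with pop as the binomial denominator
glm deaths agecat exposed, family(binomial pop) link(logit) vce(robust) nolog
estimates store glm_count

* (3) fractional logit on the proportion
fracreg logit prop agecat exposed, nolog
estimates store frac

* equations(1) lines up the first equation of each model even if the
* equation names differ because the dependent variables differ
estimates table glm_prop glm_count frac, equations(1) b(%9.4f) se(%9.4f) stats(N)
```

One thing to keep in mind when reading the table: the count specification weights each row by its denominator in the likelihood, so in general its coefficients can differ from the other two when the denominators vary across rows.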

Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#8

26 Jul 2019, 10:34

Originally posted by Richard Williams:

You do have me curious though. There are an infinite number of ways of getting a proportion of .25. Is there any added advantage to knowing how I got it, e.g. 10/40, 16/64, 100/400, or whatever?

I'm pretty sure it doesn't matter in frequentist estimation, no. However, for Bayesians, it may. Imagine you had 3 deaths in a sample of 10. That's a proportion of 0.3. However, the sample was 10, so we're not that certain if the actual proportion is 0.3. That data is described by the PDF of a beta distribution with alpha = 3, beta = 7 (mean is alpha / (alpha + beta)).

If you had 300 deaths in a sample of 1,000, we're more confident that the probability is 0.3, and by the same reasoning the corresponding beta distribution has alpha = 300, beta = 700, which is far more tightly concentrated around 0.3.
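The two distributions can be compared directly in Stata. A minimal sketch, assuming only the alpha and beta values from the two examples (3 and 7, then 300 and 700); betaden() is Stata's beta density and invibeta() its inverse cumulative function, and the axis title and legend labels are just illustrative.

```stata
* Sketch: both distributions have the same mean, alpha/(alpha + beta)
display "mean, alpha = 3,   beta = 7:   " 3/(3 + 7)
display "mean, alpha = 300, beta = 700: " 300/(300 + 700)

* Central 95% intervals from the inverse cumulative beta distribution
display "95% interval (3, 7):     " %5.3f invibeta(3, 7, 0.025) " to " %5.3f invibeta(3, 7, 0.975)
display "95% interval (300, 700): " %5.3f invibeta(300, 700, 0.025) " to " %5.3f invibeta(300, 700, 0.975)

* Overlay the two densities on the unit interval
twoway (function y = betaden(3, 7, x), range(0 1))        ///
       (function y = betaden(300, 700, x), range(0 1)),   ///
       legend(order(1 "alpha = 3, beta = 7" 2 "alpha = 300, beta = 700")) ///
       xtitle("Probability of death") ytitle("Density")
```

Both curves are centered on 0.3, but the second one is much narrower, which is the sense in which 300/1,000 carries more information than 3/10.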

Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.