proportion as a dependent variable

Caroline Wilson

Join Date: Jun 2014

Posts: 35
#1

19 Jun 2014, 13:00

Hi all,

I have data where the dependent variable is a proportion that was created by taking the mean of a few other proportions. There are no values of exactly 0, and 28% of the values are exactly 1.

I have read about fitting a generalized linear model (glm) (from: http://www.ats.ucla.edu/stat/stata/faq/proportion.htm), and I have also read Baum's 2008 Stata Journal article about modeling proportions.

I'm happy to run the glm model. However, are there any alternative models I should also try with this type of data (and that can be implemented in Stata)?
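For reference, a minimal sketch of the glm approach next to two alternatives, run on simulated data that mimics the description above (no exact 0s, a pile-up of exact 1s). The variable names `prop` and `x`, the seed, and the data-generating values are all hypothetical. The glm call is the fractional logit from the UCLA FAQ; `fracreg` (Stata 14 and later) fits the same quasi-likelihood model and also offers a probit link. Beta regression (`betareg`, also Stata 14 and later) is a poor match here, because the beta distribution only covers values strictly between 0 and 1.

```stata
* Simulated stand-in data (hypothetical): no exact 0s, a pile-up of exact 1s
clear
set seed 2014
set obs 1000
generate x = rnormal()
generate prop = invlogit(0.5 + 0.8*x + rnormal())
replace prop = 1 if prop > 0.95

* Fractional logit via glm (binomial family, logit link, robust SEs)
glm prop x, family(binomial) link(logit) vce(robust) nolog
scalar b_glm = _b[x]

* Stata 14 and later: the same model with fracreg, robust SEs
fracreg logit prop x, vce(robust)
scalar b_frac = _b[x]

* Alternative link function: fractional probit
fracreg probit prop x, vce(robust)

display "glm logit: " b_glm "   fracreg logit: " b_frac
```

The two logit coefficients should agree, since both commands maximize the same Bernoulli quasi-likelihood; comparing the logit and probit fits is one way to check how sensitive the conclusions are to the choice of link.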

Any thoughts would be much appreciated.

Carrie
Caroline Wilson

#2

21 Jun 2014, 18:13

Hi, I have a follow-up question. Say I have two variables: one that is a proportion and one that is nominal. Is there a significance test I can run without specifying which variable is the dependent variable? Something like ANOVA, but where one of the variables is a proportion.
Thanks in advance for any thoughts!
Richard Williams

Join Date: Apr 2014

Posts: 5025
#3

21 Jun 2014, 18:54

How many categories does the nominal variable have? If only 2, my initial impulse is to run pwcorr, or even ANOVA or ttest. I suppose there might be some violation of assumptions if you do that; you could compare the result to the p-value you get from your glm model to see if it matters much. My experience is that t tests seem to work fine even when assumptions are violated (e.g., when the dependent variable is a dichotomy), at least when the sample is large, but you should do some double-checks to see whether that holds in your case.
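A sketch of those double-checks for a two-category variable, on simulated data (`prop`, the 0/1 variable `grp2`, and the seed are hypothetical). With a 0/1 variable, the pwcorr significance test and the equal-variance t test are the same test: the correlation r converts to t = r*sqrt(N-2)/sqrt(1-r^2). The sketch confirms that and then sets the t test p-value beside the robust fractional-logit p-value.

```stata
* Simulated stand-in data (hypothetical): a proportion and a 0/1 grouping variable
clear
set seed 2014
set obs 500
generate byte grp2 = runiform() < 0.5
generate prop = invlogit(0.3 + 0.5*grp2 + rnormal())
replace prop = 1 if prop > 0.95

* Correlation with its significance level; convert r to the equivalent t statistic
pwcorr prop grp2, sig
scalar t_from_r = r(rho)*sqrt(r(N) - 2)/sqrt(1 - r(rho)^2)

* Two-sample (equal-variance) t test of the mean proportion by group
ttest prop, by(grp2)
scalar t_ttest = r(t)
scalar p_ttest = r(p)

* Fractional logit with robust SEs, for comparison
glm prop i.grp2, family(binomial) link(logit) vce(robust) nolog
test 1.grp2
scalar p_glm = r(p)

display "t from r: " t_from_r "   t test: " t_ttest
display "t test p: " p_ttest "   glm p: " p_glm
```

The two t values should match up to sign, because ttest subtracts the mean of group 1 from the mean of group 0.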

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 19.5 MP (2 processor)
EMAIL: [email protected]
WWW: https://academicweb.nd.edu/~rwilliam/
Caroline Wilson

#4

22 Jun 2014, 17:24

Thank you Richard. The nominal variable has 5 categories.
Richard Williams

#5

22 Jun 2014, 17:35

Unless somebody has a better idea, I would probably use glm with the proportion as the dependent variable and the nominal variable as the independent variable, and then report the overall chi-square (or F) statistic for the nominal variable and/or its p-value. I'd be curious to see if the results were very different from what ANOVA gave you. Maybe somebody else knows how to do exactly what you originally asked, but I don't.
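A minimal sketch of that comparison with a five-category variable, on simulated data (`prop`, `group`, and the seed are hypothetical): a one-way ANOVA F test next to the joint 4-df chi-square test of the group dummies from the fractional logit.

```stata
* Simulated stand-in data (hypothetical): a proportion and a 5-category group
clear
set seed 2014
set obs 1000
generate group = 1 + floor(5*runiform())
generate prop = invlogit(0.2*group + rnormal())
replace prop = 1 if prop > 0.95

* One-way ANOVA of the proportion across the five groups
oneway prop group
scalar p_anova = Ftail(r(df_m), r(df_r), r(F))

* Fractional logit with group as a factor, then the joint chi-square test
glm prop i.group, family(binomial) link(logit) vce(robust) nolog
testparm i.group
scalar df_glm = r(df)
scalar p_glm  = r(p)

display "ANOVA p = " p_anova "   glm chi2 p = " p_glm
```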

Then again, I am not sure I would do this at all. Why not just report p values in your full model rather than a bunch of bivariate p values?

Caroline Wilson

#6

22 Jun 2014, 19:21

Hi Richard, thanks a lot for taking the time to reply. I like the idea of reporting the overall test from glm. I don't think I explained my problem clearly at first. Essentially, I'm thinking of summarizing the data as follows:

Group A 0.55
Group B 0.62
Group C 0.79
Group D 0.84
Group E 0.79

The raw data are proportions. So for example, .55 is the MEAN proportion across everyone in group A.

I would like to see whether there are overall statistically significant differences between the 5 groups, and then compare the different categories with each other, for example see if Group A is statistically significantly lower than Group E.
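A sketch of how such a table can be produced, and how it connects to the glm approach, using simulated data (`prop`, `group`, the value labels, and the seed are hypothetical). When the group factor is the only predictor, the fractional logit's predicted proportions from margins reproduce the observed group means (up to convergence tolerance), because the logit-link estimating equations force fitted and observed means to match within each group.

```stata
* Simulated stand-in data (hypothetical): a proportion and five labeled groups
clear
set seed 2014
set obs 1000
generate group = 1 + floor(5*runiform())
label define grp 1 "Group A" 2 "Group B" 3 "Group C" 4 "Group D" 5 "Group E"
label values group grp
generate prop = invlogit(0.2*group + rnormal())
replace prop = 1 if prop > 0.95

* Mean proportion and number of observations in each group
tabstat prop, by(group) statistics(mean n)

* Fractional logit with only the group factor; margins gives predicted proportions
glm prop i.group, family(binomial) link(logit) vce(robust) nolog
margins group
matrix M = r(b)
```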

Richard Williams

#7

22 Jun 2014, 20:15

That is easy enough to do. Adapting the earlier UCLA example,

Code:

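use http://www.ats.ucla.edu/stat/stata/faq/proportion, clear
* Round parented to whole numbers so ed can enter the model as a categorical factor
gen ed = round(parented)
* Fractional logit: binomial family, logit link, robust standard errors
glm meals i.ed , link(logit) family(binomial) robust nolog
* Joint test that all ed coefficients are zero
testparm i.ed
* All pairwise comparisons of the ed categories, with p-values
pwcompare ed, pv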

Look at the help for pwcompare and contrast, as there are different ways to do contrasts (e.g., you could contrast each category with the grand mean or with adjacent categories), and you might want to do a Bonferroni adjustment or something similar, given that you are mass-producing contrasts.
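A sketch of those options on simulated data (`prop`, `group`, and the seed are hypothetical): Bonferroni-adjusted pairwise comparisons, contrasts of each group with the grand mean (the g. operator), and contrasts of each group with the previous group (the ar. operator). All contrasts here are on the logit (linear prediction) scale.

```stata
* Simulated stand-in data (hypothetical): a proportion and a 5-category group
clear
set seed 2014
set obs 1000
generate group = 1 + floor(5*runiform())
generate prop = invlogit(0.2*group + rnormal())
replace prop = 1 if prop > 0.95
glm prop i.group, family(binomial) link(logit) vce(robust) nolog

* All 10 pairwise comparisons with Bonferroni-adjusted p-values
pwcompare group, mcompare(bonferroni) pveffects

* Each group versus the grand mean (logit scale)
contrast g.group, effects

* Each group versus the previous (adjacent) group
contrast ar.group, effects
```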

Hopefully you have Stata 13, as I don't remember when the pwcompare and contrast commands were introduced. Here is the output in case your version of Stata does not support the above commands:

Code:

. use http://www.ats.ucla.edu/stat/stata/faq/proportion, clear

. 
. gen ed = round(parented)
(164 missing values generated)

. 
. glm meals i.ed , link(logit) family(binomial) robust nolog
note: meals has noninteger values

Generalized linear models                          No. of obs      =      4257
Optimization     : ML                              Residual df     =      4252
                                                   Scale parameter =         1
Deviance         =  810.5198708                    (1/df) Deviance =  .1906209
Pearson          =  801.6671472                    (1/df) Pearson  =  .1885388

Variance function: V(u) = u*(1-u/1)                [Binomial]
Link function    : g(u) = ln(u/(1-u))              [Logit]

                                                   AIC             =  .8199845
Log pseudolikelihood = -1740.336979                BIC             = -34720.55

------------------------------------------------------------------------------
             |               Robust
       meals |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          ed |
          2  |  -.9727155    .155438    -6.26   0.000    -1.277368   -.6680626
          3  |  -2.571115    .154565   -16.63   0.000    -2.874056   -2.268173
          4  |  -4.228895   .1611694   -26.24   0.000    -4.544781   -3.913009
          5  |  -4.752488   .4585684   -10.36   0.000    -5.651266   -3.853711
             |
       _cons |    2.25915   .1532424    14.74   0.000     1.958801      2.5595
------------------------------------------------------------------------------

. 
. testparm i.ed

 ( 1)  [meals]2.ed = 0
 ( 2)  [meals]3.ed = 0
 ( 3)  [meals]4.ed = 0
 ( 4)  [meals]5.ed = 0

           chi2(  4) = 4447.14
         Prob > chi2 =    0.0000

. 
. pwcompare ed, pv

Pairwise comparisons of marginal linear predictions

Margins      : asbalanced

-----------------------------------------------------
             |                            Unadjusted
             |   Contrast   Std. Err.      z    P>|z|
-------------+---------------------------------------
meals        |
          ed |
     2 vs 1  |  -.9727155    .155438    -6.26   0.000
     3 vs 1  |  -2.571115    .154565   -16.63   0.000
     4 vs 1  |  -4.228895   .1611694   -26.24   0.000
     5 vs 1  |  -4.752488   .4585684   -10.36   0.000
     3 vs 2  |  -1.598399   .0329367   -48.53   0.000
     4 vs 2  |  -3.256179   .0563035   -57.83   0.000
     5 vs 2  |  -3.779773    .432989    -8.73   0.000
     4 vs 3  |   -1.65778   .0538464   -30.79   0.000
     5 vs 3  |  -2.181374   .4326764    -5.04   0.000
     5 vs 4  |  -.5235932   .4350794    -1.20   0.229
-----------------------------------------------------
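The Bonferroni adjustment is simple enough to check by hand. A sketch that takes three rows of the unadjusted pwcompare table above, re-derives z and the two-sided p-value, and multiplies by the number of pairwise comparisons among 5 categories (10), capping at 1. Under this adjustment the 5 vs 4 difference (unadjusted p = 0.229) gets p = 1, while the other two rows stay far below 0.05.

```stata
* Three rows copied from the unadjusted pwcompare table above
clear
input str7 pair double estimate double se
"3 vs 2" -1.598399  .0329367
"5 vs 3" -2.181374  .4326764
"5 vs 4" -.5235932  .4350794
end

* z statistic and two-sided p-value from the standard normal
generate double z = estimate/se
generate double p = 2*normal(-abs(z))

* Bonferroni: multiply by the number of comparisons (5 choose 2 = 10), capped at 1
generate double p_bonf = min(1, comb(5, 2)*p)

list, noobs
```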

If you are stuck with some horribly primitive version of Stata, the simplest thing might be to just keep rerunning the glm and changing the reference category each time. You will get all the pairwise contrasts that way. You could also use test commands, e.g.

test 2.ed = 3.ed
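A sketch of both routes on simulated data (`prop`, `group`, and the seed are hypothetical): test and lincom compare groups 2 and 3 within one fit, and refitting with ib2. makes group 2 the reference so that 3.group is the 3 vs 2 contrast directly. The same logic appears in the output above: 3 vs 2 is -2.571115 - (-.9727155) = -1.5983995, matching the -1.598399 shown by pwcompare up to rounding. The ib2. factor-variable notation requires Stata 11 or later.

```stata
* Simulated stand-in data (hypothetical): a proportion and a 5-category group
clear
set seed 2014
set obs 1000
generate group = 1 + floor(5*runiform())
generate prop = invlogit(0.2*group + rnormal())
replace prop = 1 if prop > 0.95

* Default reference category (group 1): compare groups 2 and 3 directly
glm prop i.group, family(binomial) link(logit) vce(robust) nolog
test 2.group = 3.group
lincom 3.group - 2.group
scalar d32 = r(estimate)

* Refit with group 2 as the reference: 3.group is now the 3 vs 2 contrast
glm prop ib2.group, family(binomial) link(logit) vce(robust) nolog
display "lincom: " d32 "   refit with ib2.: " _b[3.group]
```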

Last edited by Richard Williams; 22 Jun 2014, 20:22. Reason: Wrong code was posted before.

Caroline Wilson

#8

22 Jun 2014, 22:04

Thank you so much, Richard! I will try this.
Richard Williams

#9

24 Jun 2014, 09:05

Hope it works. Incidentally, Statalist etiquette is to use real names. I don't think we have ever banned somebody who refused to do so, but people who do use real names are probably more likely to get help. I also think there can be some professional benefit in that people can become more aware of you and your work. If you want to change your user id you can write to the forum administrators; or, you can keep your id but attach a signature, like I do with my messages.

Caroline Wilson

#10

24 Jun 2014, 21:30

Thanks a lot, Richard. I've emailed the administrators to change my ID to my full name. Apologies for not knowing about that initially.