Comparing proportions across multiple groups after -svy- command

Daniel Sun

Join Date: Sep 2022
Posts: 4

18 Sep 2022, 00:47

Hello all.

I'm hoping to get some help with making sense of the differing p-values in the following code:

Code:

. svy, subpop (high_risk if yearcat >2): proportion statin, over (yearcat)
(running proportion on estimation sample)

Survey: Proportion estimation

Number of strata =      90      Number of obs   =       59,842
Number of PSUs   =     184      Population size =  185,419,848
                                Subpop. no. obs =        1,849
                                Subpop. size    = 5,041,315.44
                                Design df       =           94

           No: statin = No
          Yes: statin = Yes

    _subpop_1: yearcat = 2007-2010
    _subpop_2: yearcat = 2011-2014
    _subpop_3: yearcat = 2015-2018

--------------------------------------------------------------
             |             Linearized            Logit
        Over | Proportion   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
No           |
   _subpop_1 |   .6055593   .0260497      .5528378    .6559334
   _subpop_2 |   .5845526   .0340879      .5156929    .6502617
   _subpop_3 |   .5350799    .024259      .4867357    .5827739
-------------+------------------------------------------------
Yes          |
   _subpop_1 |   .3944407   .0260497      .3440666    .4471622
   _subpop_2 |   .4154474   .0340879      .3497383    .4843071
   _subpop_3 |   .4649201    .024259      .4172261    .5132643
--------------------------------------------------------------
Note: 58 strata omitted because they contain no subpopulation
      members.

. test [Yes]_subpop_1 = [Yes]_subpop_2 = [Yes]_subpop_3

Adjusted Wald test

 ( 1)  [Yes]_subpop_1 - [Yes]_subpop_2 = 0
 ( 2)  [Yes]_subpop_1 - [Yes]_subpop_3 = 0

       F(  2,    93) =    2.03
            Prob > F =    0.1367

. svy, subpop(high_risk if yearcat > 2): logistic statin  yearcat
(running logistic on estimation sample)

Survey: Logistic regression

Number of strata   =        90                Number of obs     =       59,842
Number of PSUs     =       184                Population size   =  185,419,848
                                              Subpop. no. obs   =        1,849
                                              Subpop. size      = 5,041,315.44
                                              Design df         =           94
                                              F(   1,     94)   =         3.97
                                              Prob > F          =       0.0492

------------------------------------------------------------------------------
             |             Linearized
      statin | Odds Ratio   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     yearcat |   1.157566   .0850123     1.99   0.049     1.000501    1.339287
       _cons |   .4115848    .126835    -2.88   0.005     .2232185    .7589069
------------------------------------------------------------------------------
Note: _cons estimates baseline odds.
Note: 58 strata omitted because they contain no subpopulation members.

Specifically, I don't understand why the p-value from -test- is 0.1367, whereas the p-value from -logistic- is 0.049.

Is this discrepancy inherent to proportions/logistic regression?

The p-values do match when comparing means with linear regression; see the Statalist thread "Comparing means across multiple groups using svy commands".

Last edited by Daniel Sun; 18 Sep 2022, 00:59.

Leonardo Guizzetti

Join Date: Jul 2016

Posts: 2457
#2

18 Sep 2022, 06:58

Here's a reproducible example.

Code:

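use https://www.stata-press.com/data/r17/nhanes2f
svyset psuid [pweight=finalwgt], strata(stratid)
svy: prop female, over(race)
* coefficient names follow the level.variable@level.overvariable pattern;
* confirm them with -matrix list e(b)- if your Stata version names them differently
test _b[1.female@1.race]=_b[1.female@2.race]=_b[1.female@3.race]
svy: logit female i.race
testparm i.race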
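
If you are unsure what the coefficients are called when writing the -test- line, one option is to list the coefficient vector after -proportion- and copy the names from its column headers. A minimal sketch on the same nhanes2f data (the scalar name n_coefs is only illustrative):

```stata
* Load the example data and declare the survey design, as above
use https://www.stata-press.com/data/r17/nhanes2f, clear
svyset psuid [pweight=finalwgt], strata(stratid)

* Estimate the proportion of each level of female within each race group
svy: proportion female, over(race)

* The column names of e(b) are the names to use inside _b[] with -test-
matrix list e(b)

* Two levels of female times three levels of race gives six coefficients
scalar n_coefs = colsof(e(b))
display n_coefs
```

The exact naming scheme depends on the Stata version, which is why reading the names off e(b) is safer than typing them from memory.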

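In essence, your survey-weighted logistic regression is not testing the same hypothesis as your -test- after -proportion-. The regression enters yearcat as a continuous predictor rather than as the categorical variable you intended, so its single coefficient tests whether the linear trend in the log odds across yearcat is flat (1 numerator df). The -test- after -proportion- is instead a joint 2-df test that all three proportions are equal. To get the comparable joint test from the regression, enter yearcat as a factor variable (i.yearcat) and follow it with -testparm-: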

Code:

svy, subpop(high_risk if yearcat > 2): logistic statin i.yearcat
testparm i.yearcat
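
To see the difference in what is being tested, one could fit both specifications side by side on the nhanes2f example data and compare the numerator degrees of freedom. This is only a sketch: race is nominal, so the "trend" has no substantive meaning here, and the scalar names df_trend and df_joint are illustrative.

```stata
use https://www.stata-press.com/data/r17/nhanes2f, clear
svyset psuid [pweight=finalwgt], strata(stratid)

* race entered as a continuous predictor: a single slope, so the model
* F test is a 1-df test of a linear trend in the log odds
svy: logistic female race
scalar df_trend = e(df_m)

* race entered as a factor variable: one indicator per non-base level,
* and -testparm- tests those indicators jointly (2 df for three groups)
svy: logistic female i.race
testparm i.race
scalar df_joint = r(df)

display "trend df = " df_trend "   joint df = " df_joint
```

The joint test from -testparm- addresses the same null hypothesis as the -test- after -proportion- (equal proportions in all groups). The two p-values should be close but need not agree exactly, because the Wald statistics are computed on different scales (log odds versus proportions).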
Daniel Sun

Join Date: Sep 2022

Posts: 4
#3

18 Sep 2022, 10:08

Ahhh makes sense. Appreciate your explanation and help Leonardo!