  • Comparing proportions across multiple groups after -svy- command

    Hello all.

    I'm hoping to get some help with making sense of the differing p-values in the following code:

    Code:
    . svy, subpop (high_risk if yearcat >2): proportion statin, over (yearcat)
    (running proportion on estimation sample)
    
    Survey: Proportion estimation
    
    Number of strata =      90      Number of obs   =       59,842
    Number of PSUs   =     184      Population size =  185,419,848
                                    Subpop. no. obs =        1,849
                                    Subpop. size    = 5,041,315.44
                                    Design df       =           94
    
               No: statin = No
              Yes: statin = Yes
    
        _subpop_1: yearcat = 2007-2010
        _subpop_2: yearcat = 2011-2014
        _subpop_3: yearcat = 2015-2018
    
    --------------------------------------------------------------
                 |             Linearized            Logit
            Over | Proportion   Std. Err.     [95% Conf. Interval]
    -------------+------------------------------------------------
    No           |
       _subpop_1 |   .6055593   .0260497      .5528378    .6559334
       _subpop_2 |   .5845526   .0340879      .5156929    .6502617
       _subpop_3 |   .5350799    .024259      .4867357    .5827739
    -------------+------------------------------------------------
    Yes          |
       _subpop_1 |   .3944407   .0260497      .3440666    .4471622
       _subpop_2 |   .4154474   .0340879      .3497383    .4843071
       _subpop_3 |   .4649201    .024259      .4172261    .5132643
    --------------------------------------------------------------
    Note: 58 strata omitted because they contain no subpopulation
          members.
    
    . test [Yes]_subpop_1 = [Yes]_subpop_2 = [Yes]_subpop_3
    
    Adjusted Wald test
    
     ( 1)  [Yes]_subpop_1 - [Yes]_subpop_2 = 0
     ( 2)  [Yes]_subpop_1 - [Yes]_subpop_3 = 0
    
           F(  2,    93) =    2.03
                Prob > F =    0.1367
    
    . svy, subpop(high_risk if yearcat > 2): logistic statin  yearcat
    (running logistic on estimation sample)
    
    Survey: Logistic regression
    
    Number of strata   =        90                Number of obs     =       59,842
    Number of PSUs     =       184                Population size   =  185,419,848
                                                  Subpop. no. obs   =        1,849
                                                  Subpop. size      = 5,041,315.44
                                                  Design df         =           94
                                                  F(   1,     94)   =         3.97
                                                  Prob > F          =       0.0492
    
    ------------------------------------------------------------------------------
                 |             Linearized
          statin | Odds Ratio   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
         yearcat |   1.157566   .0850123     1.99   0.049     1.000501    1.339287
           _cons |   .4115848    .126835    -2.88   0.005     .2232185    .7589069
    ------------------------------------------------------------------------------
    Note: _cons estimates baseline odds.
    Note: 58 strata omitted because they contain no subpopulation members.

    Specifically, I don't understand why the p-value from -test- is 0.1367, whereas the p-value from -logistic- is 0.049.

    Is this discrepancy inherent to proportions/logistic regression?

    The p-values do match when comparing means and using linear regression, as shown in the Statalist thread "Comparing means across multiple groups using svy commands".
    Last edited by Daniel Sun; 18 Sep 2022, 00:59.

  • #2
    Here's a reproducible example.

    Code:
    use https://www.stata-press.com/data/r17/nhanes2f
    svyset psuid [pweight=finalwgt], strata(stratid)
    
    svy: prop female, over(race)
    test _b[1.female@1.race]=_b[1.female@2.race]=_b[1.female@3.race]
    
    svy: logit female i.race
    testparm i.race
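
    In essence, your survey-weighted logistic regression is not testing the same hypothesis as the -test- after -proportion-. The regression enters yearcat as a continuous predictor rather than as the intended categorical variable. As a consequence, the regression p-value tests whether the linear trend in yearcat (on the log-odds scale) is flat, a test with 1 numerator degree of freedom, whereas -test- checks whether all three proportions are equal, with 2 numerator degrees of freedom. You can make the following change to your code: prefix yearcat with i. so that it is treated as categorical, then test the indicators jointly.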

    Code:
    svy, subpop(high_risk if yearcat > 2): logistic statin  i.yearcat
    testparm i.yearcat
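
    To spell out the two hypotheses being compared, here is a sketch in LaTeX. The indicator names D_2 and D_3 and the choice of the first in-subpopulation category as the reference are notation assumed for illustration only; they do not appear in the output above, and Stata's actual base level may differ.

    ```latex
    \begin{align*}
    % As posted: yearcat entered as a continuous score
    \operatorname{logit}\Pr(\text{statin}=\text{Yes}) &= \beta_0 + \beta_1\,\text{yearcat},
      & H_0 &: \beta_1 = 0 \\
    % With i.yearcat: one indicator per non-reference category
    \operatorname{logit}\Pr(\text{statin}=\text{Yes}) &= \gamma_0 + \gamma_2 D_2 + \gamma_3 D_3,
      & H_0 &: \gamma_2 = \gamma_3 = 0 \\
    % The second null hypothesis restates equality of the three proportions
    \gamma_2 = \gamma_3 = 0 &\iff p_1 = p_2 = p_3
    \end{align*}
    ```

    The first null hypothesis has 1 numerator degree of freedom, matching the F(1, 94) reported by -logistic-. The second has 2, matching the F(2, 93) reported by -test- after -proportion-. Even with i.yearcat, the two p-values should be close but need not be identical, because a Wald test computed on the log-odds scale is not numerically invariant to the transformation back to the proportion scale.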


    • #3
      Ahhh makes sense. Appreciate your explanation and help Leonardo!
