
  • New on SSC: -wgtdistrim-

    Thanks to Kit Baum, a new command, wgtdistrim, by Sebastian Lang and myself, is now available from the SSC.

    wgtdistrim implements Potter's (1990) weight distribution approach to trim extreme sampling weights. The basic idea is that the sampling weights are assumed to follow a beta distribution. The parameters of the distribution are estimated from the moments of the observed sampling weights and the resulting quantiles are used as cut-off points for extreme sampling weights. The process is repeated a specified number of times (10 by default) or until no sampling weights are more extreme than the specified quantiles.

    Here is an example that trims the top 1 percent of the sampling weights in nhanes2f:

    Code:
    . webuse nhanes2f
    
    . wgtdistrim finalwgt , generate(double pw_t) upper(.01)
    Iteration 0:       min =      2000     max =     79634     rel. diff =         .
    Iteration 1:       min =  2011.591     max =  38739.22     rel. diff =  .5161157
    Iteration 2:       min =  2012.945     max =  37414.94     rel. diff =  .0341834
    Iteration 3:       min =  2013.067     max =  37316.73     rel. diff =  .0026248
    Iteration 4:       min =  2013.078     max =  37308.04     rel. diff =  .0002329
    Iteration 5:       min =  2013.079     max =  37307.28     rel. diff =  .0000206
    Iteration 6:       min =  2013.079     max =  37307.21     rel. diff =  1.82e-06
    Iteration 7:       min =  2013.079     max =   37307.2     rel. diff =  1.60e-07
    Iteration 8:       min =  2013.079     max =   37307.2     rel. diff =  1.41e-08
    Iteration 9:       min =  2013.079     max =   37307.2     rel. diff =  1.25e-09
    Iteration 10:      min =  2013.079     max =   37307.2     rel. diff =  1.10e-10
    
    . summarize finalwgt pw_t
    
        Variable |        Obs        Mean    Std. dev.       Min        Max
    -------------+---------------------------------------------------------
        finalwgt |     10,337    11320.85    7304.457       2000      79634
            pw_t |     10,337    11320.85    6999.602   2013.079    37307.2
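
    To see what the trimming actually did to individual weights, the old and new weights can be compared directly. This is only a minimal sketch, assuming finalwgt and pw_t from the example above are still in memory; the variable name wratio is made up for illustration. The output above (the minimum rises from 2000 to 2013.079 while the mean is unchanged) suggests that weights below the cut-off are adjusted as well, and the ratio makes that visible.

    Code:
    * how many weights differ from the original after trimming?
    count if pw_t != finalwgt

    * ratio of trimmed to original weight (wratio is a hypothetical name)
    generate double wratio = pw_t / finalwgt
    summarize wratio, detail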

    Stata 16.1 or newer is required.


    References:

    Potter, F. J. 1990. A study of procedures to identify and trim extreme sampling weights. Proceedings of the Survey Research Methods Section of the American Statistical Association, 225--230.
    http://www.asasrms.org/Proceedings/papers/1990_034.pdf

  • #2
    Dear Daniel,
    The second reference mentioned in the help file, Chen, Q., Elliott, M. R., Haziza, D., Yang, Y., Ghosh, M., Little, R. J. A., Sedransk, J. & Thompson, M. (2017), Approaches to improving survey-weighted estimates, https://doi.org/10.1214/17-STS609, is freely available.
    http://publicationslist.org/eric.melse



    • #3
      Dear Daniel,

      Just jumping into these new waters, using the example from Stata's help file:
      Code:
      svyset psuid [pw=finalwgt], strata(stratid)
      svy: ologit health female black age c.age#c.age
      est store Y1
      which results in:
      Code:
      Number of strata = 31                            Number of obs   =      10,335
      Number of PSUs   = 62                            Population size = 116,997,257
                                                       Design df       =          31
                                                       F(4, 28)        =      223.27
                                                       Prob > F        =      0.0000
      
      ------------------------------------------------------------------------------
                   |             Linearized
            health | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
      -------------+----------------------------------------------------------------
            female |  -.1615219   .0523678    -3.08   0.004    -.2683267    -.054717
             black |   -.986568   .0790277   -12.48   0.000    -1.147746   -.8253899
               age |  -.0119491   .0082974    -1.44   0.160    -.0288717    .0049736
                   |
       c.age#c.age |  -.0003234    .000091    -3.55   0.001     -.000509   -.0001377
      -------------+----------------------------------------------------------------
             /cut1 |  -4.566229   .1632561                     -4.899192   -4.233266
             /cut2 |  -3.057415   .1699944                     -3.404121   -2.710709
             /cut3 |  -1.520596   .1714342                     -1.870239   -1.170954
             /cut4 |   -.242785   .1703965                      -.590311     .104741
      ------------------------------------------------------------------------------
      and next:
      Code:
      svyset psuid [pw=pw_t], strata(stratid)
      svy: ologit health female black age c.age#c.age
      est store Y2
      which results in:
      Code:
      Number of strata = 31                            Number of obs   =      10,335
      Number of PSUs   = 62                            Population size = 116,997,084
                                                       Design df       =          31
                                                       F(4, 28)        =      223.29
                                                       Prob > F        =      0.0000
      
      ------------------------------------------------------------------------------
                   |             Linearized
            health | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
      -------------+----------------------------------------------------------------
            female |  -.1640712   .0510219    -3.22   0.003    -.2681311   -.0600114
             black |  -.9844041   .0792971   -12.41   0.000    -1.146132   -.8226766
               age |   -.012558   .0083768    -1.50   0.144    -.0296427    .0045267
                   |
       c.age#c.age |  -.0003162   .0000918    -3.44   0.002    -.0005035   -.0001289
      -------------+----------------------------------------------------------------
             /cut1 |  -4.573808   .1618973                        -4.904   -4.243616
             /cut2 |  -3.065884   .1698778                     -3.412352   -2.719416
             /cut3 |   -1.53332   .1721877                     -1.884499   -1.182141
             /cut4 |  -.2533584   .1709936                     -.6021023    .0953854
      ------------------------------------------------------------------------------
      However, using suest to compare these two models:
      Code:
      suest Y1 Y2
      results in the error message: weighting expression differs between models
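
      For a purely descriptive comparison that does not need suest, the two stored results can be tabulated side by side. This is a minimal sketch, assuming Y1 and Y2 are still stored as above; it puts coefficients and standard errors next to each other but provides no formal test of the differences.

      Code:
      * coefficients and standard errors of both stored models, side by side
      estimates table Y1 Y2, b(%9.4f) se(%9.4f)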

      Which brings me to my question: how would you recommend deciding when trimming extreme sampling weights is a sensible method to apply (on either statistical or substantive grounds)?
      I do notice some improvement in the p-values with the trimmed weights, but should we care?
      http://publicationslist.org/eric.melse



      • #4
        I am not at all an expert on (sampling) weights. Here is how I understand the problem: The concern with extreme sampling weights (be that "large" weights or large weighted values) is that they jeopardize precision and lead to unstable estimators. The goal of trimming extreme weights is then to balance the introduced bias against the gained precision. The references in the help file discuss the topic in some detail but do not seem to derive more general recommendations; I can't do that, either.*

        * Edit/added: By more general recommendations I mean a "one size fits all" type of advice.
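
        One rough way to put a number on the precision side of this trade-off is Kish's approximate design effect due to unequal weighting, 1 + CV^2 of the weights. The sketch below is only an illustration, assuming nhanes2f is in memory and pw_t was created as in #1; it says nothing about the bias introduced by trimming. Going by the means and standard deviations reported in #1, it should come out at about 1.42 for finalwgt and about 1.38 for pw_t.

        Code:
        * Kish's approximation: 1 + (sd/mean)^2 of the weights
        foreach w of varlist finalwgt pw_t {
            quietly summarize `w'
            display "`w': 1 + CV^2 = " %6.4f (1 + (r(sd)/r(mean))^2)
        }
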
        Last edited by daniel klein; 17 Nov 2023, 04:59.



        • #5
          I totally agree with daniel klein; it's hard to give any general advice on this topic.
          Regarding your error (although I am not that familiar with suest): suest seems to check for any change in the weighting expression, and in this case the error seems to be caused by the different weight variables. Although I'm not recommending this, as I am not completely sure about the consequences, you can compare the results using suest, for example like this:
          Code:
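          * use one weight variable name for both models so that suest
          * sees the same weighting expression
          * note: -gen- stores wgt as float by default; -gen double- would
          * keep the full precision of pw_t
          gen wgt = finalwgt
          svyset psuid [pw=wgt], strata(stratid)
          svy: ologit health female black age c.age#c.age
          est store Y1
          
          * create the trimmed weights (assumes pw_t does not exist yet)
          * and swap them into wgt
          wgtdistrim finalwgt , generate(double pw_t) upper(.01)
          replace wgt = pw_t
          
          svy: ologit health female black age c.age#c.age
          est store Y2
          suest Y1 Y2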
          This results in the following output (as expected, there are some changes in the point estimates and mostly slightly smaller standard errors using the trimmed weights):
          Code:
          Simultaneous survey results for Y1, Y2
          
          Number of strata = 31                            Number of obs   =      10,335
          Number of PSUs   = 62                            Population size = 116,997,084
                                                           Design df       =          31
          
          ------------------------------------------------------------------------------
                       |             Linearized
                       | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
          -------------+----------------------------------------------------------------
          Y1_health    |
                female |  -.1615219   .0510451    -3.16   0.003     -.265629   -.0574148
                 black |   -.986568   .0793733   -12.43   0.000    -1.148451    -.824685
                   age |  -.0119491   .0083755    -1.43   0.164     -.029031    .0051329
                       |
           c.age#c.age |  -.0003234   .0000919    -3.52   0.001    -.0005107    -.000136
          -------------+----------------------------------------------------------------
          /Y1          |
                  cut1 |  -4.566229   .1620661                     -4.896765   -4.235693
                  cut2 |  -3.057415   .1700066                     -3.404146   -2.710684
                  cut3 |  -1.520596   .1722947                     -1.871994   -1.169199
                  cut4 |   -.242785   .1711091                     -.5917643    .1061943
          -------------+----------------------------------------------------------------
          Y2_health    |
                female |  -.1640712   .0510219    -3.22   0.003    -.2681311   -.0600114
                 black |  -.9844041   .0792971   -12.41   0.000    -1.146132   -.8226766
                   age |   -.012558   .0083768    -1.50   0.144    -.0296427    .0045267
                       |
           c.age#c.age |  -.0003162   .0000918    -3.44   0.002    -.0005035   -.0001289
          -------------+----------------------------------------------------------------
          /Y2          |
                  cut1 |  -4.573808   .1618973                        -4.904   -4.243616
                  cut2 |  -3.065884   .1698778                     -3.412352   -2.719416
                  cut3 |   -1.53332   .1721877                     -1.884499   -1.182141
                  cut4 |  -.2533585   .1709936                     -.6021023    .0953854
          ------------------------------------------------------------------------------
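
          With the combined results in memory, a single coefficient can be compared across the two weightings. This is a sketch only, assuming the suest results above are the active estimates and using the equation names shown in the output (Y1_health, Y2_health); the same caveat about the workaround applies, so treat the result with care.

          Code:
          * adjusted Wald test that the coefficient on female is equal in both models
          test [Y1_health]female = [Y2_health]female

          * the difference itself, with its standard error and confidence interval
          lincom [Y1_health]female - [Y2_health]female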
