New on SSC: -wgtdistrim-

daniel klein

Join Date: Mar 2014
Posts: 3876

17 Nov 2023, 02:38

Thanks to Kit Baum, a new command, wgtdistrim, by Sebastian Lang and me, is now available from the SSC.

wgtdistrim implements Potter's (1990) weight distribution approach to trim extreme sampling weights. The basic idea is that the sampling weights are assumed to follow a beta distribution. The parameters of the distribution are estimated from the moments of the observed sampling weights and the resulting quantiles are used as cut-off points for extreme sampling weights. The process is repeated a specified number of times (10 by default) or until no sampling weights are more extreme than the specified quantiles.
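
To make the moment-matching step concrete, here is a rough single-pass sketch. It is not the code inside wgtdistrim. It assumes, purely for illustration, that the reciprocals of the weights are the quantity treated as beta-distributed; how the command actually scales the weights is not stated in this post. The variable recip and the scalars prefixed bt_ are hypothetical names. The shape parameters come from the usual method-of-moments formulas for a beta distribution, and invibeta() is Stata's inverse cumulative beta function.

Code:

webuse nhanes2f, clear

* ASSUMPTION of this sketch: the reciprocal weights are the beta-distributed quantity
generate double recip = 1/finalwgt
quietly summarize recip
scalar bt_m = r(mean)
scalar bt_v = r(Var)

* method-of-moments estimates of the two beta shape parameters
scalar bt_k = scalar(bt_m)*(1 - scalar(bt_m))/scalar(bt_v) - 1
scalar bt_a = scalar(bt_m)*scalar(bt_k)
scalar bt_b = (1 - scalar(bt_m))*scalar(bt_k)

* small reciprocals are large weights, so the .01 quantile gives the upper cut-off
scalar bt_cut = 1/invibeta(scalar(bt_a), scalar(bt_b), .01)
display "upper cut-off: " scalar(bt_cut)
count if finalwgt > scalar(bt_cut) & !missing(finalwgt)

A single pass like this only flags candidate weights. As described above, the command re-estimates the parameters and repeats until no weight is more extreme than the cut-off or the iteration limit is reached.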

Here is an example that trims the top 1 percent of the sampling weights in the nhanes2f dataset:

Code:

. webuse nhanes2f

. wgtdistrim finalwgt , generate(double pw_t) upper(.01)
Iteration 0:       min =      2000     max =     79634     rel. diff =         .
Iteration 1:       min =  2011.591     max =  38739.22     rel. diff =  .5161157
Iteration 2:       min =  2012.945     max =  37414.94     rel. diff =  .0341834
Iteration 3:       min =  2013.067     max =  37316.73     rel. diff =  .0026248
Iteration 4:       min =  2013.078     max =  37308.04     rel. diff =  .0002329
Iteration 5:       min =  2013.079     max =  37307.28     rel. diff =  .0000206
Iteration 6:       min =  2013.079     max =  37307.21     rel. diff =  1.82e-06
Iteration 7:       min =  2013.079     max =   37307.2     rel. diff =  1.60e-07
Iteration 8:       min =  2013.079     max =   37307.2     rel. diff =  1.41e-08
Iteration 9:       min =  2013.079     max =   37307.2     rel. diff =  1.25e-09
Iteration 10:      min =  2013.079     max =   37307.2     rel. diff =  1.10e-10

. summarize finalwgt pw_t

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
    finalwgt |     10,337    11320.85    7304.457       2000      79634
        pw_t |     10,337    11320.85    6999.602   2013.079    37307.2

Stata 16.1 or newer is required.

References:

Potter, F. J. 1990. A study of procedures to identify and trim extreme sampling weights. Proceedings of the Survey Research Methods Section of the American Statistical Association, 225-230.
http://www.asasrms.org/Proceedings/papers/1990_034.pdf

ericmelse

Join Date: May 2014

Posts: 436
#2

17 Nov 2023, 03:12

Dear Daniel,
Your second reference, mentioned in the help file, is freely available: Chen, Q., Elliott, M. R., Haziza, D., Yang, Y., Ghosh, M., Little, R. J. A., Sedransk, J. & Thompson, M. (2017). Approaches to improving survey-weighted estimates. https://doi.org/10.1214/17-STS609

http://publicationslist.org/eric.melse

ericmelse

Join Date: May 2014
Posts: 436

17 Nov 2023, 03:30

Dear Daniel,

Just jumping into these new waters, I started from the example in Stata's help file:

Code:

svyset psuid [pw=finalwgt], strata(stratid)
svy: ologit health female black age c.age#c.age
est store Y1

which results in:

Code:

Number of strata = 31                            Number of obs   =      10,335
Number of PSUs   = 62                            Population size = 116,997,257
                                                 Design df       =          31
                                                 F(4, 28)        =      223.27
                                                 Prob > F        =      0.0000

------------------------------------------------------------------------------
             |             Linearized
      health | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
      female |  -.1615219   .0523678    -3.08   0.004    -.2683267    -.054717
       black |   -.986568   .0790277   -12.48   0.000    -1.147746   -.8253899
         age |  -.0119491   .0082974    -1.44   0.160    -.0288717    .0049736
             |
 c.age#c.age |  -.0003234    .000091    -3.55   0.001     -.000509   -.0001377
-------------+----------------------------------------------------------------
       /cut1 |  -4.566229   .1632561                     -4.899192   -4.233266
       /cut2 |  -3.057415   .1699944                     -3.404121   -2.710709
       /cut3 |  -1.520596   .1714342                     -1.870239   -1.170954
       /cut4 |   -.242785   .1703965                      -.590311     .104741
------------------------------------------------------------------------------

and next:

Code:

svyset psuid [pw=pw_t], strata(stratid)
svy: ologit health female black age c.age#c.age
est store Y2

which results in:

Code:

Number of strata = 31                            Number of obs   =      10,335
Number of PSUs   = 62                            Population size = 116,997,084
                                                 Design df       =          31
                                                 F(4, 28)        =      223.29
                                                 Prob > F        =      0.0000

------------------------------------------------------------------------------
             |             Linearized
      health | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
      female |  -.1640712   .0510219    -3.22   0.003    -.2681311   -.0600114
       black |  -.9844041   .0792971   -12.41   0.000    -1.146132   -.8226766
         age |   -.012558   .0083768    -1.50   0.144    -.0296427    .0045267
             |
 c.age#c.age |  -.0003162   .0000918    -3.44   0.002    -.0005035   -.0001289
-------------+----------------------------------------------------------------
       /cut1 |  -4.573808   .1618973                        -4.904   -4.243616
       /cut2 |  -3.065884   .1698778                     -3.412352   -2.719416
       /cut3 |   -1.53332   .1721877                     -1.884499   -1.182141
       /cut4 |  -.2533584   .1709936                     -.6021023    .0953854
------------------------------------------------------------------------------

However, using suest to compare these two models:

Code:

suest Y1 Y2

results in the error message: weighting expression differs between models

Which brings me to my question: how would you recommend deciding when trimming extreme sampling weights is a sensible method to apply (on either statistical or substantive grounds)?
I do notice some improvement in the p-values with the trimmed weights, but should we care?

http://publicationslist.org/eric.melse

daniel klein

Join Date: Mar 2014

Posts: 3876
#4

17 Nov 2023, 04:43

I am not at all an expert on (sampling) weights. Here is how I understand the problem: The concern with extreme sampling weights (be that "large" weights or large weighted values) is that they jeopardize precision and lead to unstable estimators. The goal of trimming extreme weights is then to balance the introduced bias against the gained precision. The references in the help file discuss the topic in some detail but do not seem to derive more general recommendations; I can't do that, either.*

* Edit/added: By more general recommendations I mean a "one size fits all" type of advice.
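
One rough way to put a number on the precision side of this trade-off is Kish's approximate design effect due to unequal weighting, 1 + CV^2 of the weights. The sketch below is only a rule-of-thumb diagnostic that has not been discussed in this thread, and it says nothing about the bias that trimming introduces. It assumes pw_t from the first post is still in memory.

Code:

* Kish's approximate design effect from unequal weighting: 1 + CV^2
foreach w of varlist finalwgt pw_t {
    quietly summarize `w'
    * rescale the sample variance to the divide-by-n form used in Kish's formula
    display "`w': " %6.3f (1 + (r(Var)*(r(N) - 1)/r(N))/r(mean)^2)
}

Given the summarize output in the first post (same mean, smaller standard deviation for pw_t), the value for the trimmed weights will be the smaller of the two.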

Last edited by daniel klein; 17 Nov 2023, 04:59.

Sebastian Lang

Join Date: Nov 2023
Posts: 1

27 Nov 2023, 03:27

I totally agree with daniel klein; it's hard to give any general advice on this topic.
Regarding your error (although I am not that familiar with suest): suest seems to check for any change in the weighting expression, and in this case the difference causing the error seems to be the different weight variable. Although I'm not recommending this, as I am not completely sure about the consequences, you can compare the results using suest, for example like this:

Code:

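* first model: untrimmed weights, held in a working copy
gen wgt = finalwgt
svyset psuid [pw=wgt], strata(stratid)
svy: ologit health female black age c.age#c.age
est store Y1

* trim, then overwrite the working copy so the weight variable name stays the same
wgtdistrim finalwgt , generate(double pw_t) upper(.01)
replace wgt = pw_t

* second model: same svyset, now with the trimmed weights
svy: ologit health female black age c.age#c.age
est store Y2
suest Y1 Y2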

This results in the following output (as expected, there are some changes in the point estimates and smaller standard errors using the trimmed weights):

Code:

Simultaneous survey results for Y1, Y2

Number of strata = 31                            Number of obs   =      10,335
Number of PSUs   = 62                            Population size = 116,997,084
                                                 Design df       =          31

------------------------------------------------------------------------------
             |             Linearized
             | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
Y1_health    |
      female |  -.1615219   .0510451    -3.16   0.003     -.265629   -.0574148
       black |   -.986568   .0793733   -12.43   0.000    -1.148451    -.824685
         age |  -.0119491   .0083755    -1.43   0.164     -.029031    .0051329
             |
 c.age#c.age |  -.0003234   .0000919    -3.52   0.001    -.0005107    -.000136
-------------+----------------------------------------------------------------
/Y1          |
        cut1 |  -4.566229   .1620661                     -4.896765   -4.235693
        cut2 |  -3.057415   .1700066                     -3.404146   -2.710684
        cut3 |  -1.520596   .1722947                     -1.871994   -1.169199
        cut4 |   -.242785   .1711091                     -.5917643    .1061943
-------------+----------------------------------------------------------------
Y2_health    |
      female |  -.1640712   .0510219    -3.22   0.003    -.2681311   -.0600114
       black |  -.9844041   .0792971   -12.41   0.000    -1.146132   -.8226766
         age |   -.012558   .0083768    -1.50   0.144    -.0296427    .0045267
             |
 c.age#c.age |  -.0003162   .0000918    -3.44   0.002    -.0005035   -.0001289
-------------+----------------------------------------------------------------
/Y2          |
        cut1 |  -4.573808   .1618973                        -4.904   -4.243616
        cut2 |  -3.065884   .1698778                     -3.412352   -2.719416
        cut3 |   -1.53332   .1721877                     -1.884499   -1.182141
        cut4 |  -.2533585   .1709936                     -.6021023    .0953854
------------------------------------------------------------------------------
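
With the combined results in memory, the coefficients of the two models could be compared with Wald tests. This is a minimal sketch that uses the equation names shown in the output above (Y1_health and Y2_health), and it inherits whatever caveats apply to the workaround itself. In particular, the Y1 standard errors in the combined output (for example, .0510451 for female) differ from those of the separate fit with finalwgt (.0523678), and the reported population size matches the trimmed weights.

Code:

* test equality of a single coefficient across the two models
test [Y1_health]female = [Y2_health]female

* joint test that the coefficients common to both equations are equal
test [Y1_health=Y2_health]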
