Converting from pweights to other weights

Mitchell Linegar

Join Date: Jun 2015

Posts: 33
#1


14 Jun 2015, 12:41

Hi All,

I have a dataset that has integer weights for individuals (so they are meant to be used as pweights). However, I need to run an lpoly regression, which does not allow pweights (it allows only aweights and fweights). How do I convert from pweights to fweights or aweights? Which should I use?
Joshua D Merfeld

Join Date: Jun 2015

Posts: 86
#2

14 Jun 2015, 15:58

Hi Mitchell,

Here is a (very short) primer on weights: http://www.cpc.unc.edu/research/tool.../weight_syntax

If you still have questions after reading that, let me know.

Josh
Mitchell Linegar

Join Date: Jun 2015

Posts: 33
#3

14 Jun 2015, 16:18

Thanks Josh! This actually brings up a few more questions... the data that I'm using are DHS survey data, like the example on the page that you posted. Ideally I would use pweights, but I can't, because lpoly does not accept them. Any suggestions for what I might do instead?

The lpoly regressions look like this (ideally I would add the weights):

twoway (lpolyci yvar xvar if region==1) (lpolyci yvar xvar if region==2)
Nick Cox

Join Date: Mar 2014

Posts: 35708
#4

14 Jun 2015, 16:34

The advice here http://www.statalist.org/forums/foru...ghting-problem still seems pertinent.
Joshua D Merfeld

Join Date: Jun 2015

Posts: 86
#5

14 Jun 2015, 17:11

If you are trying to get confidence intervals then you cannot use aweights instead of pweights in lpoly. That said, if you are interested in just eye-balling the relationship, and not performing t-tests, then you can use aweights, as I believe you will get the same coefficients, but the standard errors will be way off.

If someone more familiar with weights can confirm / correct this statement, I'd greatly appreciate it.
Mitchell Linegar

Join Date: Jun 2015

Posts: 33
#6

14 Jun 2015, 17:27

Hi again Nick! My apologies about not replying to your last response - this post was (in part) hoping to answer some of the questions that I had about your last post, starting with the question of weights! I've begun looking into other options rather than just using the defaults for the bandwidth and the kernel, but this felt like the most important place to start.

To both you and Joshua - I was (and still am) unsure about the relationship between pweights and aweights - what do I need to do to the DHS weights that I am initially given to prepare them to be used for aweights? (Really this is the crux of my question - the conversion process.)

Oh! And one more thing that's relevant - as Joshua said, aweights should be fine if I am just eye-balling the relationship and not looking to perform t-tests. This is indeed what I am hoping to do.

Last edited by Mitchell Linegar; 14 Jun 2015, 17:33.
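
For readers following along, the aweight version of the graph from post #3 might look like the sketch below. It assumes a weight variable called wt, which is a hypothetical name; yvar, xvar and region are the placeholders already used above. For aweights only the relative sizes of the weights should matter for the fitted curve, so no rescaling step is shown. The confidence bands drawn by lpolyci should not be trusted here, for the reasons discussed in this thread.

Code:

* Sketch only: wt is a hypothetical name for the survey weight variable.
* yvar, xvar and region are the placeholders from post #3.
* The /// line continuation works in a do-file, not at the command line.
twoway (lpolyci yvar xvar if region==1 [aw = wt]) ///
       (lpolyci yvar xvar if region==2 [aw = wt])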
Joshua D Merfeld

Join Date: Jun 2015

Posts: 86
#7

14 Jun 2015, 17:42

Probability weights are sampling weights. Each one is the inverse of the probability that the individual was selected, so it measures the size of the population being "represented" by that one individual. Analytic weights (aweights), on the other hand, indicate that each observation is a MEAN of many observations. Both aweights and pweights therefore stand for a larger number of people. However, an aweight means that the observed value is the mean over all of the people being represented, while a pweight means that the observed value is not a mean, but simply one value that stands in for all of those people. As such, an aweight carries much more information than a pweight, and if you run regressions with aweights instead of pweights, the standard errors will be MUCH too small.
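
A quick way to check the point about coefficients is to fit the same regression twice and compare. This is a minimal sketch using the auto data, with turn standing in for a weight purely for illustration (it is not a real sampling weight). The names b_aw and b_pw are made up for this example. The coefficient vectors should agree up to rounding, while the reported standard errors differ, because pweights make regress use the robust variance estimator.

Code:

sysuse auto, clear

* Same model, analytic weights: conventional standard errors
regress price mpg headroom [aw = turn]
matrix b_aw = e(b)

* Same model, probability weights: robust standard errors
regress price mpg headroom [pw = turn]
matrix b_pw = e(b)

* Largest relative difference between the two coefficient vectors
display mreldif(b_aw, b_pw)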
Nick Cox

Join Date: Mar 2014

Posts: 35708
#8

14 Jun 2015, 17:46

There are people here who are expert on pweights; I never use them but wait long enough and they should notice the thread.

If your concern is to eyeball the relationship, a good graph will show you the scatter around a fitted smooth(er) curve, except that much of the point of lpoly is that there are lots of possible curves, depending on your other choices.

Steve Samuels

Join Date: Mar 2014

Posts: 1786
#9

14 Jun 2015, 19:07

A nearly identical question was asked on the same day as the original post: http://www.statalist.org/forums/foru...ghting-problem

For regress with clustered data, pweights, aweights, and iweights give identical results (see below). If the data are svyset without strata, the standard errors differ slightly. Unfortunately lpoly does not accept vce(cluster), so standard errors will be wrong for any set of weights.

I suggest that you choose a model with fractional polynomial regression (fp), which also fits flexible models. Take the generated predictors and use them in svy: regress.

Code:

sysuse auto, clear
gen mkr = substr(make,1,2)
svyset mkr [pw = turn]

svy: regress price mpg head

regress price mpg  head [pw = turn], cluster(mkr)
regress price mpg  head [aw=turn],   cluster(mkr)
regress price mpg  head [iw=turn],   cluster(mkr)

yields:

Code:

. svy: regress price mpg head
(running regress on estimation sample)

Survey: Linear regression

Number of strata   =         1                  Number of obs     =         74
Number of PSUs     =        23                  Population size   =      2,934
                                                Design df         =         22
                                                F(   2,     21)   =       4.44
                                                Prob > F          =     0.0247
                                                R-squared         =     0.2278

------------------------------------------------------------------------------
             |             Linearized
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -273.3843   90.67878    -3.01   0.006    -461.4406   -85.32801
    headroom |  -370.9921   314.0619    -1.18   0.250    -1022.317    280.3325
       _cons |    13088.1   2834.715     4.62   0.000     7209.264    18966.94
------------------------------------------------------------------------------

.
. regress price mpg  head [pw = turn], cluster(mkr)
(sum of wgt is   2.9340e+03)

Linear regression                               Number of obs     =         74
                                                F(2, 22)          =       4.52
                                                Prob > F          =     0.0226
                                                R-squared         =     0.2278
                                                Root MSE          =     2699.1

                                   (Std. Err. adjusted for 23 clusters in mkr)
------------------------------------------------------------------------------
             |               Robust
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -273.3843   91.94708    -2.97   0.007    -464.0709   -82.69772
    headroom |  -370.9921   318.4546    -1.16   0.257    -1031.427    289.4424
       _cons |    13088.1   2874.363     4.55   0.000     7127.039    19049.17
------------------------------------------------------------------------------

. regress price mpg  head [aw=turn],   cluster(mkr)
(sum of wgt is   2.9340e+03)

Linear regression                               Number of obs     =         74
                                                F(2, 22)          =       4.52
                                                Prob > F          =     0.0226
                                                R-squared         =     0.2278
                                                Root MSE          =     2699.1

                                   (Std. Err. adjusted for 23 clusters in mkr)
------------------------------------------------------------------------------
             |               Robust
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -273.3843   91.94708    -2.97   0.007    -464.0709   -82.69772
    headroom |  -370.9921   318.4546    -1.16   0.257    -1031.427    289.4424
       _cons |    13088.1   2874.363     4.55   0.000     7127.039    19049.17
------------------------------------------------------------------------------

. regress price mpg  head [iw=turn],   cluster(mkr)
(sum of wgt is   2.9340e+03)

Linear regression                               Number of obs     =         74
                                                F(2, 22)          =       4.52
                                                Prob > F          =     0.0226
                                                R-squared         =     0.2278
                                                Root MSE          =     2699.1

                                   (Std. Err. adjusted for 23 clusters in mkr)
------------------------------------------------------------------------------
             |               Robust
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -273.3843   91.94708    -2.97   0.007    -464.0709   -82.69772
    headroom |  -370.9921   318.4546    -1.16   0.257    -1031.427    289.4424
       _cons |    13088.1   2874.363     4.55   0.000     7127.039    19049.17
------------------------------------------------------------------------------

Steve Samuels
Statistical Consulting

Stata 14.2


Steve Samuels

Join Date: Mar 2014

Posts: 1786
#10

15 Jun 2015, 04:53

I mistakenly assumed that fp would take weights; it does not. In that case, I'd do an unweighted analysis that includes the design weights as predictors; or, if the weights are related to known variables, include those variables as predictors. For example, in Demographic and Health Surveys, probabilities of selection are related to geographic stratum and, within households, to the number of eligible females. See also Skinner and Mason, 2012.

Reference:
Skinner, C., and B. Mason. 2012. Weighting in the regression analysis of survey data with a cross-national application. Canadian Journal of Statistics 40, 697-711.

Steve Samuels
Statistical Consulting

Stata 14.2
Mitchell Linegar

Join Date: Jun 2015

Posts: 33
#11

05 Jul 2015, 12:59

Thank you very much Steve! This is incredibly helpful!
Steve Samuels

Join Date: Mar 2014

Posts: 1786
#12

06 Jul 2015, 20:25

You are welcome, Mitchell. It turns out I was wrong about fp. It isn't compatible with svy, but it can accept pweights if the analysis command accepts them. If the analysis command also accepts clusters, you can mimic the survey analysis. You lose only the stratum option, which might increase standard errors somewhat. However, you can probably get back the benefit, if any, of stratification by adding stratum information to the list of predictors.

Code:

sysuse auto, clear
gen mkr = substr(make,1,2)
fp <headroom>: regress price <headroom> [pw = turn], vce(cluster mkr)

Steve Samuels
Statistical Consulting

Stata 14.2
