
  • Converting from pweights to other weights

    Hi All,

    I have a dataset with integer sampling weights for individuals (that is, they are meant to be used as pweights). However, I need to run an lpoly regression, which accepts only aweights and fweights, not pweights. How do I convert from pweights to fweights? Which of the two should I use?

  • #2
    Hi Mitchell,

    Here is a (very short) primer on weights: http://www.cpc.unc.edu/research/tool.../weight_syntax

    If you still have questions after reading that, let me know.

    Josh


    • #3
      Thanks Josh! This actually brings up a few more questions. Like the page you posted, I am using DHS survey data. Ideally I would use pweights, but lpoly does not accept them. Any suggestions for what I might do instead?

      The lpoly regressions look like this (ideally I would add the weights):

      twoway (lpolyci yvar xvar if region==1) (lpolyci yvar xvar if region==2)


      • #4
        The advice here http://www.statalist.org/forums/foru...ghting-problem still seems pertinent.


        • #5
          If you want confidence intervals, you cannot substitute aweights for pweights in lpoly. That said, if you only want to eyeball the relationship and not perform t-tests, you can use aweights: I believe you will get the same point estimates, but the standard errors will be way off.

          If someone more familiar with weights can confirm / correct this statement, I'd greatly appreciate it.
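
          A minimal sketch of the comparison described above, using Stata's shipped auto dataset rather than DHS data. Treating turn as if it were a sampling weight is purely an assumption for illustration. For regress, the two runs should report the same coefficients but different standard errors, because pweights trigger robust variance estimation while aweights do not.

          Code:
          sysuse auto, clear
          * pweights: weighted point estimates with robust (sandwich) standard errors
          regress price mpg headroom [pw = turn]
          * aweights: the same point estimates with conventional standard errors
          regress price mpg headroom [aw = turn]
          * lpoly accepts aweights, so the smoothed curve itself can still be weighted
          lpoly price mpg [aw = turn], degree(1)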


          • #6
            Hi again Nick! My apologies about not replying to your last response - this post was (in part) hoping to answer some of the questions that I had about your last post, starting with the question of weights! I've begun looking into other options rather than just using the defaults for the bandwidth and the kernel, but this felt like the most important place to start.

            To both you and Joshua - I was (and still am) unsure about the relationship between pweights and aweights - what do I need to do to the DHS weights that I am initially given to prepare them to be used for aweights? (Really this is the crux of my question - the conversion process.)

            Oh! And one more thing that's relevant - as Joshua said, aweights should be fine if I am just eye-balling the relationship and not looking to perform t-tests. This is indeed what I am hoping to do.
            Last edited by Mitchell Linegar; 14 Jun 2015, 17:33.
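
            One way the conversion asked about here might look, as a sketch only. It assumes the weight is stored in a variable named v005 with six implied decimal places, as in standard DHS women's recode files; the thread does not name the weight variable, and yvar, xvar, and region are the placeholders from the post above. Stata rescales aweights internally to sum to the number of observations, so the division does not change the fitted curve and only puts the weight on its documented scale.

            Code:
            * v005 is an assumed variable name (DHS women's sample weight)
            generate double wt = v005 / 1000000
            * weighted local polynomial fits by region; the shaded bands ignore
            * the survey design and should not be read as design-based intervals
            twoway (lpolyci yvar xvar if region == 1 [aw = wt]) ///
                   (lpolyci yvar xvar if region == 2 [aw = wt])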


            • #7
              Probability weights are sampling weights. Each one is the inverse of the probability that the individual was selected into the sample, so it measures the size of the population "represented" by that one individual. Aweights, on the other hand, are for observations that are each a MEAN of many underlying observations. Both kinds of weight therefore stand for a larger number of people. However, an aweight says that the observed value is the mean over all of the people being represented, while a pweight says that the observed value is not their mean but simply one value that represents them. As such, an aweight carries much more information than a pweight, and if you run regressions with aweights instead of pweights, the standard errors will be MUCH too small.
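
              To make the "mean of many observations" reading of aweights concrete, here is a small sketch on the auto dataset (chosen only for illustration). After collapsing to one row per group, each row is a group mean and n is the number of cars behind it; weighting the group means by n with aweights should reproduce the mean of the individual-level data.

              Code:
              sysuse auto, clear
              * mean price over the individual cars
              summarize price
              * one row per group: group mean of price and group size n
              collapse (mean) price (count) n = price, by(foreign)
              * group means weighted by group size give back the overall mean
              summarize price [aw = n]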


              • #8
                There are people here who are expert on pweights; I never use them but wait long enough and they should notice the thread.

                If your concern is to eyeball the relationship, a good graph will show you the scatter around a fitted smooth curve. Bear in mind, though, that lpoly can produce many different smooth curves from the same data, depending on your other choices.
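
                A sketch of such a graph on the auto dataset, overlaying the scatter with two local polynomial smooths that differ in degree and bandwidth. The specific degree and bandwidth values are arbitrary choices made for illustration, not recommendations.

                Code:
                sysuse auto, clear
                * scatter plus two smooths that differ only in lpoly settings
                twoway (scatter price mpg, msymbol(oh))          ///
                       (lpoly price mpg, degree(0) bwidth(3))    ///
                       (lpoly price mpg, degree(1) bwidth(6)),   ///
                       legend(order(2 "degree 0, bandwidth 3" 3 "degree 1, bandwidth 6"))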


                • #9
                  A nearly identical question was asked on the same day as the original post: http://www.statalist.org/forums/foru...ghting-problem

                  For regress with clustered data, pweights, aweights, and iweights give identical results (see below). If the data are svyset without strata, the standard errors differ slightly. Unfortunately lpoly does not accept vce(cluster), so standard errors will be wrong for any set of weights.

                  I suggest that you choose a model with fractional polynomial regression (fp), which also fits flexible models. Take the generated predictors and use them in svy: regress.



                  Code:
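                  sysuse auto, clear
                  * first two letters of make serve as a stand-in cluster (PSU) identifier
                  gen mkr = substr(make,1,2)
                  * declare the design: clusters in mkr, turn used as the sampling weight
                  svyset mkr [pw = turn]
                  
                  * design-based fit (head abbreviates headroom)
                  svy: regress price mpg head
                  
                  * the same model with each weight type and cluster-robust standard errors
                  regress price mpg  head [pw = turn], cluster(mkr)
                  regress price mpg  head [aw=turn],   cluster(mkr)
                  regress price mpg  head [iw=turn],   cluster(mkr)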
                  yields:

                  Code:
                  . svy: regress price mpg head
                  (running regress on estimation sample)
                  
                  Survey: Linear regression
                  
                  Number of strata   =         1                  Number of obs     =         74
                  Number of PSUs     =        23                  Population size   =      2,934
                                                                  Design df         =         22
                                                                  F(   2,     21)   =       4.44
                                                                  Prob > F          =     0.0247
                                                                  R-squared         =     0.2278
                  
                  ------------------------------------------------------------------------------
                               |             Linearized
                         price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                  -------------+----------------------------------------------------------------
                           mpg |  -273.3843   90.67878    -3.01   0.006    -461.4406   -85.32801
                      headroom |  -370.9921   314.0619    -1.18   0.250    -1022.317    280.3325
                         _cons |    13088.1   2834.715     4.62   0.000     7209.264    18966.94
                  ------------------------------------------------------------------------------
                  
                  .
                  . regress price mpg  head [pw = turn], cluster(mkr)
                  (sum of wgt is   2.9340e+03)
                  
                  Linear regression                               Number of obs     =         74
                                                                  F(2, 22)          =       4.52
                                                                  Prob > F          =     0.0226
                                                                  R-squared         =     0.2278
                                                                  Root MSE          =     2699.1
                  
                                                     (Std. Err. adjusted for 23 clusters in mkr)
                  ------------------------------------------------------------------------------
                               |               Robust
                         price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                  -------------+----------------------------------------------------------------
                           mpg |  -273.3843   91.94708    -2.97   0.007    -464.0709   -82.69772
                      headroom |  -370.9921   318.4546    -1.16   0.257    -1031.427    289.4424
                         _cons |    13088.1   2874.363     4.55   0.000     7127.039    19049.17
                  ------------------------------------------------------------------------------
                  
                  . regress price mpg  head [aw=turn],   cluster(mkr)
                  (sum of wgt is   2.9340e+03)
                  
                  Linear regression                               Number of obs     =         74
                                                                  F(2, 22)          =       4.52
                                                                  Prob > F          =     0.0226
                                                                  R-squared         =     0.2278
                                                                  Root MSE          =     2699.1
                  
                                                     (Std. Err. adjusted for 23 clusters in mkr)
                  ------------------------------------------------------------------------------
                               |               Robust
                         price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                  -------------+----------------------------------------------------------------
                           mpg |  -273.3843   91.94708    -2.97   0.007    -464.0709   -82.69772
                      headroom |  -370.9921   318.4546    -1.16   0.257    -1031.427    289.4424
                         _cons |    13088.1   2874.363     4.55   0.000     7127.039    19049.17
                  ------------------------------------------------------------------------------
                  
                  . regress price mpg  head [iw=turn],   cluster(mkr)
                  (sum of wgt is   2.9340e+03)
                  
                  Linear regression                               Number of obs     =         74
                                                                  F(2, 22)          =       4.52
                                                                  Prob > F          =     0.0226
                                                                  R-squared         =     0.2278
                                                                  Root MSE          =     2699.1
                  
                                                     (Std. Err. adjusted for 23 clusters in mkr)
                  ------------------------------------------------------------------------------
                               |               Robust
                         price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                  -------------+----------------------------------------------------------------
                           mpg |  -273.3843   91.94708    -2.97   0.007    -464.0709   -82.69772
                      headroom |  -370.9921   318.4546    -1.16   0.257    -1031.427    289.4424
                         _cons |    13088.1   2874.363     4.55   0.000     7127.039    19049.17
                  ------------------------------------------------------------------------------
                  Steve Samuels
                  Statistical Consulting
                  [email protected]

                  Stata 14.2


                  • #10
                    I mistakenly assumed that fp would take weights; it does not. In that case, I'd do an unweighted analysis that includes the design weights as predictors; or, if the weights are related to known variables, include those variables as predictors. For example, in Demographic and Health Surveys, probabilities of selection are related to geographic stratum and, within households, to the number of eligible females. See also Skinner and Mason, 2012.

                    Reference:
                    Skinner, C., and B. Mason. 2012. Weighting in the regression analysis of survey data with a cross-national application. Canadian Journal of Statistics 40, 697-711.
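
                    A rough sketch of the unweighted approach described above, again on the auto dataset. Here turn stands in for the design weight and mkr for the cluster identifier; both roles are assumptions carried over from the earlier example. With real DHS data the added predictors would instead be the stratum and eligibility variables that drive the selection probabilities.

                    Code:
                    sysuse auto, clear
                    gen mkr = substr(make,1,2)
                    * no weights; the stand-in design weight enters as a predictor
                    regress price mpg headroom turn, vce(cluster mkr)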

                    Steve Samuels
                    Statistical Consulting
                    [email protected]

                    Stata 14.2


                    • #11
                      Thank you very much Steve! This is incredibly helpful!


                      • #12
                        You are welcome, Mitchell. It turns out I was wrong about fp. It isn't compatible with svy, but it can accept pweights if the analysis command accepts them. If the analysis command also accepts clusters, you can mimic the survey analysis. You lose only the stratum option, which might increase standard errors somewhat. However, you can probably get back the benefit, if any, of stratification by adding stratum information to the list of predictors.
                        Code:
                        sysuse auto, clear
                        gen mkr = substr(make,1,2)
                        fp <headroom>: regress price <headroom> [pw = turn], vce(cluster mkr)
                        Steve Samuels
                        Statistical Consulting
                        [email protected]

                        Stata 14.2
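
                        Adding stratum information to the predictors might look like the following sketch. The auto dataset has no strata, so foreign is used as a stand-in stratum identifier purely for illustration; with survey data the survey's own stratum variable would take its place.

                        Code:
                        sysuse auto, clear
                        gen mkr = substr(make,1,2)
                        * foreign is a hypothetical stratum indicator added as a factor variable
                        fp <headroom>: regress price <headroom> i.foreign [pw = turn], vce(cluster mkr)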
