Using fweights for repeated individuals

Paul Burkander

Join Date: May 2015

Posts: 13
#1

13 Jun 2019, 14:37

Hi all,

We're working on an analysis that uses matching with replacement. Each treatment individual appears in the data once, and each comparison individual is potentially matched to multiple treatment individuals, with the variable _weight indicating the number of treatment individuals to whom they were matched.

My understanding of Stata's weights was that the fweight option is tailor-made to account for the fact that the error term is perfectly correlated among repeated matches of the same comparison student. This interpretation seems consistent with what the psmatch2 folks suggest too, according to https://www.ssc.wisc.edu/sscc/pubs/stata_psmatch.htm

I therefore suggested we estimate the effect using something like

reg y x [fweight=_weight]

My colleague, however, caught that we get different results if we also cluster on an individual ID variable, as in

reg y x [fweight=_weight], cluster(ID)

I'm now having a crisis of confidence! Could it be that fweights is merely a convenience feature that accounts for observationally identical individuals, without accounting for the correlated error among them? And thus the additional clustering option is needed when duplicate observations are duplicate people?

Curiously, the fweight with a cluster option leads to the same results as a pweight regression without clusters. Is it therefore valid to use pweights to account for duplicate individuals?

Thanks much,
Paul
Richard Williams

Join Date: Apr 2014

Posts: 5026
#2

13 Jun 2019, 18:24

It is unclear what is the same and what is different in your message.

My colleague, however, caught that we get different results if we also cluster on an individual ID variable

How do they differ? Do coefficients differ, or just standard errors? I would think the latter.

Curiously, the fweight with a cluster option leads to the same results as a pweight regression without clusters. Is it therefore valid to use pweights to account for duplicate individuals?

Are both the coefficients and standard errors the same?

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 19.5 MP (2 processor)
WWW: https://academicweb.nd.edu/~rwilliam/
Paul Burkander

Join Date: May 2015

Posts: 13
#3

13 Jun 2019, 19:07

Darn, sorry for the ambiguity.

So yes, the fweight regressions with and without clustering give different standard errors, while the coefficients are identical.

The pweight option gives coefficients and standard errors that are numerically identical to those from the fweight-with-cluster specification, which I found quite surprising. Different approaches often lead to similar results in my experience, but identical results to 7 digits seems remarkable.

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17753
#4

14 Jun 2019, 03:21

Paul:
What you experienced is documented in the Stata .pdf manual, [U] 20 Estimation and postestimation commands, pages 333-334.
The following toy-example replicates your experience with -weight-:

Code:

. sysuse auto.dta
. g id=_n

. reg price mpg [fweight=weight]

      Source |       SS           df       MS      Number of obs   =   223,440
-------------+----------------------------------   F(1, 223438)    =  65712.27
       Model |  5.2107e+11         1  5.2107e+11   Prob > F        =    0.0000
    Residual |  1.7718e+12   223,438  7929518.15   R-squared       =    0.2273
-------------+----------------------------------   Adj R-squared   =    0.2273
       Total |  2.2928e+12   223,439  10261513.6   Root MSE        =    2815.9

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -284.7126   1.110666  -256.34   0.000    -286.8895   -282.5357
       _cons |   12294.64   23.11792   531.82   0.000     12249.33    12339.95
------------------------------------------------------------------------------

. reg price mpg [fweight=weight], cluster(id)

Linear regression                               Number of obs     =    223,440
                                                F(1, 73)          =      18.95
                                                Prob > F          =     0.0000
                                                R-squared         =     0.2273
                                                Root MSE          =     2815.9

                                    (Std. Err. adjusted for 74 clusters in id)
------------------------------------------------------------------------------
             |               Robust
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -284.7126   65.39854    -4.35   0.000    -415.0517   -154.3735
       _cons |   12294.64   1502.043     8.19   0.000     9301.075    15288.21
------------------------------------------------------------------------------

. reg price mpg [pweight=weight]
(sum of wgt is 223,440)

Linear regression                               Number of obs     =         74
                                                F(1, 72)          =      18.69
                                                Prob > F          =     0.0000
                                                R-squared         =     0.2273
                                                Root MSE          =     2854.8

------------------------------------------------------------------------------
             |               Robust
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -284.7126   65.85098    -4.32   0.000    -415.9841    -153.441
       _cons |   12294.64   1512.435     8.13   0.000     9279.659    15309.63
------------------------------------------------------------------------------
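
A side note on why the last two sets of standard errors are close but not identical. As far as I can tell, both estimators use the same sandwich pieces (one weighted score per car) and differ only in the finite-sample multiplier. The sketch below assumes the multipliers documented for -regress-, namely n/(n-k) for the robust (pweight) case and ((N-1)/(N-k)) x (M/(M-1)) for the clustered case, with N the sum of the fweights, M the number of clusters and k the number of coefficients. The scalar names are made up for illustration; the numbers are taken from the output above.

```stata
* Sketch: reconcile the clustered fweight SE for mpg with the pweight SE,
* using only figures printed in the three regressions above.
scalar sum_w  = 223440     // N: sum of weights in the fweight runs
scalar n_clus = 74         // M: clusters (one per car); also n in the pweight run
scalar n_coef = 2          // k: mpg and _cons

* Assumed finite-sample multipliers on the variance
scalar c_cluster = ((scalar(sum_w) - 1)/(scalar(sum_w) - scalar(n_coef))) ///
                   * (scalar(n_clus)/(scalar(n_clus) - 1))
scalar c_pweight = scalar(n_clus)/(scalar(n_clus) - scalar(n_coef))

* Rescale the clustered SE for mpg by the square root of the ratio
display 65.39854 * sqrt(scalar(c_pweight)/scalar(c_cluster))
```

If the assumption about the multipliers holds, the displayed value should land very close to the 65.85098 reported for mpg in the pweight regression. The ratio is roughly sqrt(73/72) here, so the gap shrinks as the number of distinct individuals grows.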

Kind regards,
Carlo
(Stata 19.0)


Richard Williams

Join Date: Apr 2014

Posts: 5026
#5

14 Jun 2019, 04:17

Returning to the original question, using fweight by itself seems wrong to me because it inflates the sample size. I lean towards pweight but I can't swear that is the right approach either.

Is there some reason for not using the teffects command?

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 19.5 MP (2 processor)
WWW: https://academicweb.nd.edu/~rwilliam/
Paul Burkander

Join Date: May 2015

Posts: 13
#6

14 Jun 2019, 06:50

Originally posted by Carlo Lazzaro

Paul:
What you experienced is documented in the Stata .pdf manual, [U] 20 Estimation and postestimation commands, pages 333-334.
The following toy-example replicates your experience with -weight-:

Thanks Carlo; I presume you mean the technical note in section 20.24.1, which says that results with fweights "should be the same as running the command on the unweighted, expanded data." That makes sense to me now.
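
For anyone who wants to check that equivalence directly, here is a minimal sketch on the auto data. The weight variable w is made up purely for illustration (it is not a matching weight), and id simply numbers the cars.

```stata
* Sketch: fweights versus physically expanded data (hypothetical weights)
sysuse auto, clear
generate long id = _n
generate byte w = 1 + mod(_n, 3)      // made-up integer weights: 1, 2 or 3

* Weighted fits on the 74 original rows
regress price mpg [fweight = w]
regress price mpg [fweight = w], vce(cluster id)

* Duplicate each car w times, then rerun without weights
expand w
regress price mpg
regress price mpg, vce(cluster id)
```

I would expect the first and third regressions to agree, and likewise the second and fourth. Only the clustered versions recognise that the copies of a car are the same unit rather than independent observations.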

Is there some reason for not using the teffects command?

My understanding of and experience with the teffects command is that it combines the various matching/weighting approaches with treatment effect estimation. Following advice I've repeatedly seen in the literature, we prefer to do the matching multiple ways to assess balance, estimating treatment effects using only the method that best balances the treatment and comparison groups. Were we to observe the treatment effect as part of this decision, it could bias our choice.

Is there a way to use teffects and assess balance before calculating the treatment effect?

I do wonder if teffects does a better job handling repeated matches.

Thanks much.