
  • Using fweights for repeated individuals

    Hi all,

    We're working on an analysis that uses matching with replacement. Each treatment individual appears in the data once, and each comparison individual is potentially matched to multiple treatment individuals with the variable _weight indicating the number of treatment individuals to whom they were matched.

    My understanding of Stata's weights was that the fweight option is tailor-made to account for the fact that the error term is perfectly correlated among repeated matches of the same comparison student. This interpretation seems consistent with what the psmatch2 folks suggest too, according to https://www.ssc.wisc.edu/sscc/pubs/stata_psmatch.htm

    I therefore suggested we estimate the effect using something like

    reg y x [fweight=_weight]

    My colleague, however, caught that we get different results if we also cluster on an individual ID variable, as in

    reg y x [fweight=_weight], cluster(ID)

    I'm now having a crisis of confidence! Could it be that fweights are merely a convenience that accounts for observationally identical individuals, without accounting for the correlated error among them? And thus the additional clustering option is needed when duplicate observations are duplicate people?

    Curiously, the fweight with a cluster option leads to the same results as a pweight regression without clusters. Is it therefore valid to use pweights to account for duplicate individuals?

    Thanks much,
    Paul

  • #2
    It is unclear what is the same and what is different in your message.

    My colleague, however, caught that we get different results if we also cluster on an individual ID variable
    How do they differ? Do coefficients differ, or just standard errors? I would think the latter.

    Curiously, the fweight with a cluster option leads to the same results as a pweight regression without clusters. Is it therefore valid to use pweights to account for duplicate individuals?
    Are both the coefficients and standard errors the same?
    -------------------------------------------
    Richard Williams, Notre Dame Dept of Sociology
    StataNow Version: 19.5 MP (2 processor)

    WWW: https://www3.nd.edu/~rwilliam


    • #3
      Darn sorry for the ambiguity.

      So yes, the fweight with and without the clustering leads to different standard errors, while the coefficients are identical.

      The pweight option leads to numerically identical coefficients and standard errors compared to the fweight with cluster specification - I found that quite surprising. Different approaches often lead to similar results in my experience but identical results to 7 digits seems remarkable.


      • #4
        Paul:
        what you experienced is documented in the Stata .pdf manual, [U] 20 Estimation and postestimation commands, pages 333-334.
        The following toy-example replicates your experience with -weight-:

        Code:
        . sysuse auto.dta
        . g id=_n
        
        . reg price mpg [fweight=weight]
        
              Source |       SS           df       MS      Number of obs   =   223,440
        -------------+----------------------------------   F(1, 223438)    =  65712.27
               Model |  5.2107e+11         1  5.2107e+11   Prob > F        =    0.0000
            Residual |  1.7718e+12   223,438  7929518.15   R-squared       =    0.2273
        -------------+----------------------------------   Adj R-squared   =    0.2273
               Total |  2.2928e+12   223,439  10261513.6   Root MSE        =    2815.9
        
        ------------------------------------------------------------------------------
               price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
                 mpg |  -284.7126   1.110666  -256.34   0.000    -286.8895   -282.5357
               _cons |   12294.64   23.11792   531.82   0.000     12249.33    12339.95
        ------------------------------------------------------------------------------
        
        . reg price mpg [fweight=weight],cluster(id)
        
        Linear regression                               Number of obs     =    223,440
                                                        F(1, 73)          =      18.95
                                                        Prob > F          =     0.0000
                                                        R-squared         =     0.2273
                                                        Root MSE          =     2815.9
        
                                            (Std. Err. adjusted for 74 clusters in id)
        ------------------------------------------------------------------------------
                     |               Robust
               price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
                 mpg |  -284.7126   65.39854    -4.35   0.000    -415.0517   -154.3735
               _cons |   12294.64   1502.043     8.19   0.000     9301.075    15288.21
        ------------------------------------------------------------------------------
        
        . reg price mpg [pweight=weight]
        (sum of wgt is 223,440)
        
        Linear regression                               Number of obs     =         74
                                                        F(1, 72)          =      18.69
                                                        Prob > F          =     0.0000
                                                        R-squared         =     0.2273
                                                        Root MSE          =     2854.8
        
        ------------------------------------------------------------------------------
                     |               Robust
               price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
                 mpg |  -284.7126   65.85098    -4.32   0.000    -415.9841    -153.441
               _cons |   12294.64   1512.435     8.13   0.000     9279.659    15309.63
        ------------------------------------------------------------------------------
        Kind regards,
        Carlo
        (Stata 19.0)
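
        A note on the small gap between the last two sets of standard errors (65.39854 vs. 65.85098 for mpg): it appears to come entirely from the finite-sample multiplier that -regress- applies to the sandwich variance. Assuming the usual multipliers, G/(G-1) * (N-1)/(N-k) with clustering and N/(N-k) with plain robust errors, and noting that N is the sum of the fweights in one case and the number of rows in the other, the arithmetic can be sketched as follows. All figures are read off the output above; the scalar names are only illustrative.

        ```stata
        * Finite-sample multipliers for the two variance estimators (k = 2 coefficients)
        * fweight + cluster(id): G = 74 clusters, N = 223,440 (sum of the fweights)
        scalar q_cluster = (74/73) * (223439/223438)
        * pweight (robust, no clusters): N = 74 rows
        scalar q_robust = 74/72

        * Ratio of the two standard errors implied by the multipliers alone
        display sqrt(q_robust/q_cluster)

        * Rescale the clustered SE for mpg; this should land at about 65.85,
        * the SE reported in the pweight regression
        display 65.39854 * sqrt(q_robust/q_cluster)
        ```

        The same ratio applied to the _cons standard error (1502.043) gives roughly 1512.43, so the two estimators seem to agree apart from this multiplier.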


        • #5
          Returning to the original question, using fweight by itself seems wrong to me because it inflates the sample size. I lean towards pweight but I can't swear that is the right approach either.

          Is there some reason for not using the teffects command?
          -------------------------------------------
          Richard Williams, Notre Dame Dept of Sociology
          StataNow Version: 19.5 MP (2 processor)

          WWW: https://www3.nd.edu/~rwilliam


          • #6
            Originally posted by Carlo Lazzaro
            Paul:
            what you experienced is documented in the Stata .pdf manual, [U] 20 Estimation and postestimation commands, pages 333-334.
            The following toy-example replicates your experience with -weight-:
            Thanks Carlo; I presume you mean the technical note in section 20.24.1 that says a command run with fweights "should be the same as running the command on the unweighted, expanded data." That makes sense to me now.
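
            The equivalence in that technical note can be checked directly. The following is a minimal sketch on the auto data, where w is a made-up integer weight standing in for _weight and id plays the role of the individual ID; neither comes from the actual analysis.

            ```stata
            sysuse auto, clear
            generate id = _n
            generate w = 1 + mod(_n, 3)     // hypothetical match counts (1, 2 or 3)

            * Weighted fit: each row counts as w independent observations
            regress price mpg [fweight = w]

            preserve
            expand w                         // physically duplicate each row w times
            * Same coefficients and standard errors as the fweight fit above
            regress price mpg
            * Clustering on id treats the duplicates as one individual
            regress price mpg, vce(cluster id)
            restore

            * Should reproduce the clustered fit on the expanded data
            regress price mpg [fweight = w], vce(cluster id)
            ```

            If the first two fits agree, that suggests fweights only save the expansion step and do nothing about correlated errors among the copies; the clustering is what handles that.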
            Is there some reason for not using the teffects command?
            My understanding of and experience with the teffects command is that it combines the various matching/weighting approaches with treatment effect estimation. Following advice I've repeatedly seen in the literature, we prefer to do the matching multiple ways to assess balance, estimating treatment effects using only the methods that best balance the treatment and comparison groups. Were we to observe the treatment effect as part of this decision, it could bias our choice.

            Is there a way to use teffects and assess balance before calculating the treatment effect?

            I do wonder if teffects does a better job handling repeated matches.

            Thanks much.
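
            On checking balance without looking at the effect: one possibility, sketched here under the assumption that the tebalance postestimation commands are available in your version of Stata, is to run teffects quietly so the estimate is never displayed and then inspect only the balance diagnostics. The dataset and covariates below are a standard illustration, not our data, and nn is just a stub name for the match-index variables.

            ```stata
            webuse cattaneo2, clear

            * Fit the propensity-score matching model without showing the estimate;
            * generate() stores the indices of the matched observations
            quietly teffects psmatch (bweight) (mbsmoke mmarried mage fbaby medu), generate(nn)

            * Standardized differences and variance ratios, raw vs. matched
            tebalance summarize
            ```

            The effect is still computed behind the scenes, so this only keeps it out of sight while the matching specification is being chosen.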
