
  • Regression discontinuity with aweight vs pweight?

    Hi everyone,

    I'd like to write a simple, "sharp" RD command with a triangular kernel, but I'm wondering whether I should use aweights (aw) or pweights (pw).

    I've seen both. This previous thread uses pw, whereas the "rd" ado seems to use aw. I've tried to read up on it but I'm confused.

    I create my weights as follows:
    Code:
    gen weight=max(0,`band'-abs(forcing))
    I'm not sure whether I should run

    Code:
    reg dv above forcing forcing2 aboveXforcing aboveXforcing2  [aw=weight]
    or
    Code:
    reg dv above forcing forcing2 aboveXforcing aboveXforcing2  [pw=weight]
    where
    "band" = the bandwidth
    "forcing" = the forcing variable, centered at the cutoff
    "above" = indicates being above the cutoff, i.e. assignment to treatment

    Could you help?
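
    In case something concrete helps, here is a minimal sketch of the setup described above. It assumes that dv and forcing are already in memory and that forcing is centered at the cutoff. The bandwidth value of 10 is purely illustrative, not a recommendation.

    ```stata
    * Assumed for illustration only: a bandwidth of 10 units of the forcing variable
    local band 10

    * Treatment indicator; the -if- keeps missing forcing values from being coded as 1
    gen above = (forcing >= 0) if !missing(forcing)

    * Quadratic terms, allowed to differ on each side of the cutoff
    gen forcing2       = forcing^2
    gen aboveXforcing  = above*forcing
    gen aboveXforcing2 = above*forcing2

    * Triangular kernel: declines linearly to 0 at the edge of the bandwidth
    gen weight = max(0, `band' - abs(forcing))
    ```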

    Thank you in advance,

    Andy

  • #2
    The difference between aweights and pweights is not about what command you are using. It's about the relationship of the dataset to the sample and population.

    If you have used a sampling design in which different people (firms, or whatever entities you are analyzing) enter the sample with different probabilities, then that is dealt with using -pweights-. It is a matter of the sampling not being simple random sampling.

    -aweights- are different. They are intended for a situation where the individual observations in the data set represent composites of different numbers of units, and the values in the data are averages over those units. For example, each person may have been measured a certain number of times (different numbers for different people) and the data set contains one observation per person with the average. Or each firm has a certain number of employees, and each observation is at the firm level and reports the average compensation of all its employees. Situations like that are what -aweights- are for.
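
    As a rough sketch of the firm example, the call might look like the following. All variable names here (avg_comp, firm_age, n_employees) are hypothetical and only serve the illustration.

    ```stata
    * Hypothetical firm-level data, one observation per firm:
    *   avg_comp    = average compensation across the firm's employees
    *   n_employees = number of employees that average is taken over
    *   firm_age    = some firm-level predictor
    regress avg_comp firm_age [aweight = n_employees]
    ```

    The weight is the number of units behind each average, so averages based on more employees count for more in the fit.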



    • #3
      Thank you very much. It looks like pweights are the way to go here. Not that observations closer to the cutoff had a different probability of getting sampled, but they're considered "more important" (and they're not representing more than one observation, either). In fact, it looks like -rd- actually uses pweights - sorry about the confusion.
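
      For anyone comparing the two, a quick sketch along these lines puts the fits side by side. It assumes the variables from the first post are already in memory; the stored-estimate names aw_fit and pw_fit are arbitrary. As far as I understand, -regress- returns the same coefficients under both weight types and only the standard errors differ, because pweights imply the robust variance estimator.

      ```stata
      * Same model, fitted once with each weight type
      regress dv above forcing forcing2 aboveXforcing aboveXforcing2 [aw=weight]
      estimates store aw_fit

      regress dv above forcing forcing2 aboveXforcing aboveXforcing2 [pw=weight]
      estimates store pw_fit

      * Coefficients and standard errors side by side
      estimates table aw_fit pw_fit, se
      ```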
