  • Why do statsby regressions assume that integer weights are frequency weights rather than analytic weights?

    I recently noticed that statsby regressions assume that unspecified weights are frequency weights, whereas regress on its own assumes analytic weights. Although each command reports its assumption in its output (i.e. the output states "frequency weights assumed" or "analytic weights assumed"), it seems unusual that the two commands would treat unspecified weight types differently. Consider the following minimal working example:

    clear all
    set obs 1000
    gen x = runiform()
    gen y = runiform()
    gen g = 1
    gen wt = ceil(runiform()*10)
    reg y x [w=wt]
    statsby _b _se, by(g) clear: reg y x [w=wt]
    list

    Any thoughts? This seems like a particularly easy thing to fix, and I can't think of any justification for the way things are.

  • #2
    Welcome to Statalist, Michael.

    I note that help statsby tells us

    All weight types supported by command are allowed except pweights; see weight.
    To me this suggests that weighting is at least partially under the control of the statsby command rather than the command being run. In addition, help weights tells us

    Also, each command has its own idea of the "natural" kind of weight. ... [T]he command will tell you what kind of weight it is assuming and perform the request as if you specified that kind of weight.
    From this I infer that statsby is unable to determine the "natural" weight for the command it is running and instead substitutes its own idea of a natural weight.

    I guess the bottom line is, if you know what weight is appropriate it should be specified explicitly rather than left to Stata to determine, as best it can. In the case of your example, this would be to use [aw=wt] in each command.
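
    As a minimal sketch of that suggestion, assuming the weights in the example are meant as analytic weights, the original commands can be rerun with the weight type stated explicitly. The set seed line and its value are additions for reproducibility and are not part of the original example.

    ```stata
    clear all
    * assumption: arbitrary seed, added only so the run is reproducible
    set seed 12345
    set obs 1000
    gen x = runiform()
    gen y = runiform()
    gen g = 1
    gen wt = ceil(runiform()*10)

    * name the weight type in both commands so neither has to guess
    regress y x [aw=wt]
    statsby _b _se, by(g) clear: regress y x [aw=wt]
    list
    ```

    With the type given explicitly in both places, the coefficients and standard errors from the two commands should agree.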

    • #3
      Hi William -- Thanks for the quick reply! I'm a long-time reader, first-time writer.

      I understand your point that Stata cannot correctly determine whether an unspecified weight is a frequency weight or an analytic weight, since that depends on context that Stata can't infer from the command alone. I also agree that it is bad practice not to specify the type of weight in a regression. However, it still seems very odd that Stata's 'guess' in the context of regress would deviate from Stata's 'guess' in the context of a statsby regress. They should line up, no? I worry that a person who generalizes a single regress with unspecified weights to a statsby regress would find their standard errors changing purely due to the use of statsby.

      • #4
        I understand your concern.

        I thought that perhaps the statsby "guess" does not have any context and instead does something simple like "the weights are all whole numbers, looks a lot like frequencies to me!" So I tested that by adding .5 to each of the weights in your code. That hypothesis failed: it appears statsby told regress to use fweights nevertheless, which regress rejected.
        Code:
        . statsby _b _se, by(g) clear: reg y x [w=wt]
        (running regress on estimation sample)
        may not use noninteger frequency weights
        an error occurred when statsby executed regress
        r(401);
        So my provisional assumption is that statsby's "guess" is fweights, regardless of the context or the values of the weights, and it leaves it up to the executed command to throw an error if fweights are inappropriate.
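
        For anyone who wants to repeat this check, a sketch along the following lines should reproduce it. The replace line is an assumption about how the .5 was added; everything else is taken from the example in the first post.

        ```stata
        clear all
        set obs 1000
        gen x = runiform()
        gen y = runiform()
        gen g = 1
        gen wt = ceil(runiform()*10)

        * assumption: shift every weight by .5 so that none is a whole number
        replace wt = wt + .5

        * regress alone reports analytic weights assumed, which need not be integers
        reg y x [w=wt]

        * weight type left unspecified under statsby, as in the original example
        statsby _b _se, by(g) clear: reg y x [w=wt]
        ```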

        One thing is for sure: the handling of unspecified weights could be better documented in help statsby. Or perhaps statsby should disallow unspecified weights.

        Although I have to admit, I learn something new almost every day, and today I learned that I'd overlooked the option for unspecified weights when I first learned about weights.
