  • collapse and weights

    Hi,

    I have price and quantity data and would like to compute a quantity-weighted average price using collapse.
    My quantity data can be negative. I was wondering which weight type is the correct one to use in collapse. Currently, I am using "iweight". Is this correct?
    Is this going to compute a weighted average?
    Thx,
    Moritz

  • #2
    Well, it depends on how you use -collapse-. That is one reason you are asked to post a minimal example with code. You are also asked to register on Statalist with your real full name. Please follow the advice given in the footer of this post.

    The computation done by -collapse- is documented in -help collapse-:

    fweight, iweight, pweight: sum(w_j*x_j); w_j = user supplied weights.
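    Written out, the quoted rule for (sum) and the "usual" weighted average differ only in the denominator, which is the total weight:

    ```latex
    \text{(sum) with weights: } S = \sum_j w_j x_j
    \qquad
    \text{usual weighted average: } \bar{x}_w = \frac{\sum_j w_j x_j}{\sum_j w_j}
    ```

    So the two coincide only when the weights sum to 1 within each group.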
    Suppose all your quantities are positive:

    If your intention is to compute what I interpret to be the "usual" weighted average, then using -collapse- alone will not give you what you want, as the quote above makes clear. You can use -collapse- in the following way to get a weighted average (by year):
    Code:
    clear
    set more off
    
    webuse college
    drop gpa
    
    list, sepby(year)
    
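    * rescale the weights so that they sum to 1 within each year
    bysort year: egen totnum = total(number)
    gen myweight = number / totnum
    
    * with weights summing to 1 within each year, (sum) gives the weighted average of hour
    collapse (sum) hour [iw=myweight], by(year)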
    
    list
    Another, more "manual" approach would be:
    Code:
    clear
    set more off
    
    webuse college
    drop gpa
    
    list, sepby(year)
    
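    * numerator: total of hour*number within year; denominator: total of number
    gen hXn = hour * number
    bysort year: egen tothXn = total(hXn)
    by year: egen totn = total(number)
    
    * weighted average of hour, repeated on every observation of that year
    gen wavg = tothXn / totn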
    
    list, sepby(year)
    There are other ways, of course.
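    Both approaches above compute the same quantity. Writing n_j for -number-, h_j for -hour-, and N_t for the total of -number- in year t (-totnum- in the first approach, -totn- in the second), the (sum) of hour with the rescaled weights -myweight- equals -tothXn- / -totn-:

    ```latex
    \sum_{j \in t} \frac{n_j}{N_t}\, h_j
      = \frac{\sum_{j \in t} n_j h_j}{N_t},
    \qquad N_t = \sum_{j \in t} n_j
    ```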

    Note that -iweight- will accept negative numbers (unlike the other weight types), and the example code above will run fine with negative numbers, but the "usual" definition of a weighted average assumes non-negative weights. I'm not endorsing in any way that your computation is substantively correct; I'm only trying to clarify the coding issue.
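    A small illustration of why negative quantities are a problem, using made-up numbers (prices 10 and 20 with quantities 3 and -1): the formula still computes, but the result falls outside the range of the prices.

    ```latex
    \bar{p}_w = \frac{3 \cdot 10 + (-1) \cdot 20}{3 + (-1)} = \frac{10}{2} = 5 \notin [10, 20]
    ```

    If the quantities within a group sum to zero, the ratio is undefined altogether.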
    You should:

    1. Read the FAQ carefully.

    2. "Say exactly what you typed and exactly what Stata typed (or did) in response. N.B. exactly!"

    3. Describe your dataset. Use list to list data when you are doing so. Use input to type in your own dataset fragment that others can experiment with.

    4. Use the advanced editing options to appropriately format quotes, data, code and Stata output. The advanced options can be toggled on/off using the A button in the top right corner of the text editor.



    • #3
      Roberto,

      -collapse (mean)- normalizes the weights to sum to 1, as the following comparison with -mean- using pweights shows.

      Code:
      . sysuse auto, clear
      (1978 Automobile Data)
      
      . collapse (mean) price [iw = turn], by(foreign)
      
      . format price %10.3f
      
      . list
      
           +---------------------+
           |  foreign      price |
           |---------------------|
        1. | Domestic   6198.785 |
        2. |  Foreign   6448.990 |
           +---------------------+
      
      . sysuse auto, clear
      (1978 Automobile Data)
      
      . mean price [pw = turn], over(foreign)
      
      Mean estimation                     Number of obs    =      74
      
           Domestic: foreign = Domestic
            Foreign: foreign = Foreign
      
      --------------------------------------------------------------
              Over |       Mean   Std. Err.     [95% Conf. Interval]
      -------------+------------------------------------------------
      price        |
          Domestic |   6198.785     450.76      5300.422    7097.149
           Foreign |    6448.99   564.4797      5323.983    7573.996
      --------------------------------------------------------------
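      The reason the two results agree: the weighted mean divides by the total weight, so multiplying every weight by a constant c (c not equal to 0) leaves it unchanged, which is equivalent to normalizing the weights to sum to 1:

      ```latex
      \frac{\sum_j (c\,w_j)\, x_j}{\sum_j c\,w_j}
        = \frac{\sum_j w_j x_j}{\sum_j w_j}
        = \sum_j \tilde{w}_j x_j,
      \qquad \tilde{w}_j = \frac{w_j}{\sum_k w_k}
      ```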
      Last edited by Steve Samuels; 05 Oct 2014, 13:23.
      Steve Samuels
      Statistical Consulting

      Stata 14.2



      • #4
        Steve,

        Thank you for clarifying.

        So, -collapse- will rescale the weights for -mean-, but not for -sum-:

        Code:
        clear
        set more off
        
        *-----example data -----
        
        webuse college
        keep if year == 4
        keep hour number
        
        list
        
        *-----
        
        collapse (mean) mhour=hour (sum) shour=hour [iw=number]
        list
        
        display 32*(5/7) + 31*(2/7)
        I think it makes sense, but it was unexpected.
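        Assuming, as the -display- line suggests, that the two year-4 observations are hour 32 with number 5 and hour 31 with number 2 (values read off that line, not from the listed data), the two statistics work out as:

        ```latex
        \text{mhour} = \frac{5 \cdot 32 + 2 \cdot 31}{5 + 2} = \frac{222}{7} \approx 31.714
        \qquad
        \text{shour} = 5 \cdot 32 + 2 \cdot 31 = 222
        ```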

        -help weights- states:

        iweights

        This weight has no formal statistical definition and is a catch-all category. The weight somehow reflects the importance of the observation and any command that supports such weights will define exactly how such weights are treated.
        [emphasis is my own]

        But -help collapse- doesn't say anything about how the weights are treated for -mean-.