
  • Weighted average using Collapse command

    Dear Stata users,

    I have panel data for 290 cities belonging to 21 regions, with city-level data on cost shares and population size. I want to compute the regional average cost share, weighting each city's cost share by its population size. The data look something like this:
    city  cost_share  pop_city  region  year
    1     10          100000    NE      1
    2     15          10000     NE      2
    3     5           5000      SE      1
    4     8           6000      SE      2

    I have read the collapse manual entry (http://www.stata.com/manuals13/dcollapse.pdf), but I am not sure I understand it correctly. It says (p. 6) that "Weight normalization affects only the sum, count, sd, semean, and sebinomial statistics." Example 4 on p. 7 of the manual shows a weighted mean in a setting similar to mine:

    . collapse (mean) age income (median) medage=age medinc=income (rawsum) pop [aweight=pop], by(region)

    Is it possible to do what I want using the following code?

    collapse (mean) cost_share [aweight=pop_city], by(region year)

    It works in the sense that it gives me a region-level variable that differs from the one I get when specifying

    collapse (mean) cost_share, by(region year)

    But I would just like to check with someone that it is actually performing what I am looking for.

    Best regards,
    Hanna L

  • #2
    In cases like these, I always create a small dummy dataset with easy-to-calculate values and test whether I get the result I want.
    Code:
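    * Build a 10-observation dummy dataset with x = 1, 2, ..., 10
    clear
    set obs 10
    gen x = _n
    * Random 0/1 weights: observations with weight 0 drop out of the mean
    gen weight = runiformint(0,1)
    * Weighted mean as reported by summarize, for reference
    sum x [aweight = weight]
    * collapse should return the same weighted mean
    collapse (mean) x [aweight = weight]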
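
    The same check can be extended to the by(region year) case from the original question. The following is only a sketch: the four data rows are made-up values (two cities per region in a single year, so that the weighting actually matters), and num, den, and wmean_manual are hypothetical helper names. It computes sum(pop_city * cost_share) / sum(pop_city) by hand within each region-year and compares that with what collapse returns under aweights.

    ```stata
    * Hypothetical toy data: two regions, two cities each, one year
    clear
    input city cost_share pop_city str2 region year
    1 10 100000 "NE" 1
    2 20 300000 "NE" 1
    3  5   5000 "SE" 1
    4  8  15000 "SE" 1
    end

    * Hand calculation: sum(pop * share) / sum(pop) within each region-year
    bysort region year: egen double num = total(pop_city * cost_share)
    bysort region year: egen double den = total(pop_city)
    gen double wmean_manual = num / den

    * Same quantity from collapse with analytic weights;
    * (first) simply carries the hand-computed value along for comparison
    collapse (mean) cost_share (first) wmean_manual [aweight = pop_city], by(region year)

    * The two columns should agree up to rounding
    assert reldif(cost_share, wmean_manual) < 1e-6
    list
    ```

    With these made-up numbers the hand calculation gives (10*100000 + 20*300000)/400000 = 17.5 for NE and (5*5000 + 8*15000)/20000 = 7.25 for SE, so if the assert passes, collapse with [aweight=pop_city] is producing the population-weighted regional mean.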

  • #3
    Dear Jesse,
    Thanks for your advice! I ran both the collapse and the egen code, respectively, on a smaller data set, and I saw straight away how they work.

    Cheers,
    -Hanna