Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Average within a group for each unique combination

    I would like to take a simple average within each group but only over each unique iso and urb_cl combination.

    For example, in the table below (a sample of my dataset is provided at the end) the average I want for GEO=YRP and URB_CL=LOW is 6.3 (=(11.305+1.391)/2), rather than 8.0 (=(11.305+11.305+1.391)/3).

    I would like to have the correct average as a new variable, knowing that in this case 6.3 will be the value for all instances when GEO=YRP and URB_CL=LOW.
    geo iso urb_cl urb curr
    YRP ASA low 11.305 4502.119
    YRP ASA 11.305
    YRP ASA low 11.305 7127.425
    YRP KWW 0
    YRP REQ low 1.391 8219.178
    YRP TRS high 66.793 2925.037
    I can get the averages I don't want using:

    Code:
    table geo urb_cl if !missing(curr), c(n urb mean urb ) format(%14.1fc) cellwidth(10)
    ...and I can get the averages I do want by running this same command after collapsing the dataset, i.e.

    Code:
    preserve
    collapse urb urb_cl if !missing( curr ), by( iso geo )
    sort geo iso urb
    table geo urb_cl, c(n urb mean urb ) format(%14.1fc) cellwidth(10)
    restore
    Is there a more straightforward and elegant way to achieve what I want?

    ------------------
    My dataset:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str3(geo iso) float urb_cl double urb float curr
    "PAC" "ASA" 2 82.892 1873.3757
    "PAC" "ASA" 2 82.892 1252.2163
    "PAC" "BGD" 2 68.775 3583.354
    "PAC" "FJI" . 47.317 .
    "PAC" "KHM" 2 79.337 7045.824
    "PAC" "VNM" 2 68.92 1321.158
    "PAC" "VNM" 2 68.92 4199.222
    "YRP" "ASA" 0 11.305 4502.1187
    "YRP" "ASA" . 11.305 .
    "YRP" "ASA" 0 11.305 7127.425
    "YRP" "KWW" . 0 .
    "YRP" "REQ" 0 1.391 8219.178
    "YRP" "TRS" 2 66.793 2925.0366
    end
    label values urb_cl label_n
    label def label_n 0 "low", modify
    label def label_n 2 "high", modify
    Last edited by Ernestina delPiero; 06 Oct 2019, 18:57.

  • #2
    I am completely confused by your post.

    You say you want means of something (what, exactly? from what comes later I guess curr) for each combination of iso and urb_cl. But then you do a table disaggregating by geo and urb_cl. So it's not surprising that doesn't get you what you want. If you do -table iso urb_cl, c(mean urb)- do you get what you want?

    You say you do get what you want by running some code that includes -collapse urb urb_cl...- That command makes no sense to me. urb_cl is a 1/2 categorical variable, and taking its mean will gives you other values between 1 and 2 for urb_cl. So I really don't understand what you are trying to get at here.

    When I look at
    I would like to take a simple average within each group but only over each unique iso and urb_cl combination.

    For example, in the table below (a sample of my dataset is provided at the end) the average I want for GEO=YRP and URB_CL=LOW is 6.3 (=(11.305+1.391)/2), rather than 8.0 (=(11.305+11.305+1.391)/3).
    I think I understand what you want to do here. But there is a big problem with this. This calculation is workable only because the two observations with YRP and low have the same value of curr, 11.305. So it is easy enough to say you only want to count it once. But what if you had two observations with YRP and low and they had different values of curr. What would you do then?

    Comment


    • #3
      I did a terrible job at explaining what I want. So you're right to be confused, Clyde.I would edit my post but no longer can.

      I am actually after the mean value of 'urb' over each combination - rather than each instance - of geo (not iso) and urb_cl.

      I run the collapse command simply because doing it allows me to have a dataset on which I run the - table geo urb_cl, c(n urb mean urb ) - command that gets me the results I am after. I don't take the mean of urb_cl.

      Hopefully this explains better what I am after.

      Comment


      • #4
        OK, this is better. I think what you want is this:
        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input str3(geo iso) float urb_cl double urb float curr
        "PAC" "ASA" 2 82.892 1873.3757
        "PAC" "ASA" 2 82.892 1252.2163
        "PAC" "BGD" 2 68.775 3583.354
        "PAC" "FJI" . 47.317 .
        "PAC" "KHM" 2 79.337 7045.824
        "PAC" "VNM" 2 68.92 1321.158
        "PAC" "VNM" 2 68.92 4199.222
        "YRP" "ASA" 0 11.305 4502.1187
        "YRP" "ASA" . 11.305 .
        "YRP" "ASA" 0 11.305 7127.425
        "YRP" "KWW" . 0 .
        "YRP" "REQ" 0 1.391 8219.178
        "YRP" "TRS" 2 66.793 2925.0366
        end
        label values urb_cl label_n
        label def label_n 0 "low", modify
        label def label_n 2 "high", modify
        
        egen flag = tag(geo urb_cl urb) // SELECT ONE OCCURRENCE OF EACH VALUE OF URB
        by geo urb_cl, sort: egen wanted_new_variable = mean(cond(flag, urb, .))

        Comment


        • #5
          That is exactly what I wanted. Thank you.

          Comment

          Working...
          X