Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Calculate difference across rows by the value of a third variable

    Hi experts,

    My data are like below. It consists of one variable price, and another variable group. The group variable indicates which group a certain observation belongs to.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    price group
     4099 11
     4749 11
     3799 12
     4816 16
     7827 20
     5788 21
     4453 10
     5189 16
    10372 17
     4082 13
    11385 20
    14500 16
    15906 13
     3299  9
     5705 20
     4504 17
     5104 16
     3667  7
     3955 13
     3984  8
     4010 17
     5886 17
     6342 21
     4389  9
     4187 10
    11497 22
    13594 18
    13466 15
     3829  9
     5379 16
     6165 23
     4516 15
     6303 16
     3291 17
     8814 20
     5172 16
     4733 16
     4890 20
     4181 14
     4195 10
    10371 17
     4647 11
     4425 11
     4482 17
     6486  8
     4060 16
     5798 20
     4934  7
     5222 16
     4723 17
     4424 13
     4172  7
     9690 15
     6295 11
     9735 12
     6229  6
     4589  8
     5079  8
     8129  8
     4296 16
     5799 10
     4499  5
     3995 11
    12990 14
     3895 10
     3798 11
     5899 14
     3748  9
     5719 11
     7140 12
     5397 15
     4697 15
     6850 16
    11995 14
    end
    Question: I want to calculate the absolute difference in the value of price between each observation i and all the other observations that are not in the same group as i (group membership is shown in the second variable "group"), and then sum all the absolute differences up. Is there a quick way of doing so? My data is much larger than what is shown above, so I prefer a simple yet systematic solution.

    The data shown above simply comes from "sysuse auto,clear". I just rename the variable trunk as group to make it easier for you to understand my problem.

    Thank you!
    Last edited by shem shen; 13 Feb 2019, 18:43.

  • #2
    This won't be what you want, but (a) you were missing the input line from the dataex command, and (b) hopefully something I list here will help you move forward:

    Code:
    dataex group price if group<=12  // Data shared via -dataex-. To install: ssc install dataex
    clear
    input byte group int price
     5 4499
     6 6229
     7 3667
     7 4172
     7 4934
     8 3984
     8 4589
     8 5079
     8 6486
     8 8129
     9 3299
     9 3748
     9 3829
     9 4389
    10 3895
    10 4187
    10 4195
    10 4453
    10 5799
    11 3798
    11 3995
    11 4099
    11 4425
    11 4647
    11 4749
    11 5719
    11 6295
    12 3799
    12 7140
    12 9735
    end
    
    
    .  tabstat price if group<=12, by(group) stat(n mean median min max sd) format(%10.0fc)
    
    Summary for variables: price
         by categories of: group
    
       group |         N      mean       p50       min       max        sd
    ---------+------------------------------------------------------------
           5 |         1     4,499     4,499     4,499     4,499         .
           6 |         1     6,229     6,229     6,229     6,229         .
           7 |         3     4,258     4,172     3,667     4,934       638
           8 |         5     5,653     5,079     3,984     8,129     1,664
           9 |         4     3,816     3,789     3,299     4,389       447
          10 |         5     4,506     4,195     3,895     5,799       749
          11 |         8     4,716     4,536     3,798     6,295       873
          12 |         3     6,891     7,140     3,799     9,735     2,976
    ---------+------------------------------------------------------------
       Total |        30     4,932     4,439     3,299     9,735     1,455
    ----------------------------------------------------------------------
    Code:
    bysort group (price): gen n = _n
    bysort group (n): gen price_diff   = price - price[_n-1]  // abs diff from prior obs (within same group)
    bysort group (n): gen price_diff2 = price - price[1]  // abs diff from 1st obs (within same group)
    egen avg_price = mean(price), by(group)
    gen abs_diff = price - avg_price
    egen sum_abs_diff = total( abs_diff), by(group)  // this will obviously sum to 0, just showing as example
    format avg_price abs_diff sum_abs_diff %10.1fc
    
    . list if inrange(group, 5, 12), sepby(group) abbrev(14)
    
         +------------------------------------------------------------------------------------+
         | group   n   price   price_diff   price_diff2   avg_price   abs_diff   sum_abs_diff |
         |------------------------------------------------------------------------------------|
      1. |     5   1   4,499            .             0     4,499.0        0.0            0.0 |
         |------------------------------------------------------------------------------------|
      2. |     6   1   6,229            .             0     6,229.0        0.0            0.0 |
         |------------------------------------------------------------------------------------|
      3. |     7   1   3,667            .             0     4,257.7     -590.7            0.0 |
      4. |     7   2   4,172          505           505     4,257.7      -85.7            0.0 |
      5. |     7   3   4,934          762         1,267     4,257.7      676.3            0.0 |
         |------------------------------------------------------------------------------------|
      6. |     8   1   3,984            .             0     5,653.4   -1,669.4            0.0 |
      7. |     8   2   4,589          605           605     5,653.4   -1,064.4            0.0 |
      8. |     8   3   5,079          490         1,095     5,653.4     -574.4            0.0 |
      9. |     8   4   6,486        1,407         2,502     5,653.4      832.6            0.0 |
     10. |     8   5   8,129        1,643         4,145     5,653.4    2,475.6            0.0 |
         |------------------------------------------------------------------------------------|
     11. |     9   1   3,299            .             0     3,816.3     -517.3            0.0 |
     12. |     9   2   3,748          449           449     3,816.3      -68.3            0.0 |
     13. |     9   3   3,829           81           530     3,816.3       12.8            0.0 |
     14. |     9   4   4,389          560         1,090     3,816.3      572.8            0.0 |
         |------------------------------------------------------------------------------------|
     15. |    10   1   3,895            .             0     4,505.8     -610.8            0.0 |
     16. |    10   2   4,187          292           292     4,505.8     -318.8            0.0 |
     17. |    10   3   4,195            8           300     4,505.8     -310.8            0.0 |
     18. |    10   4   4,453          258           558     4,505.8      -52.8            0.0 |
     19. |    10   5   5,799        1,346         1,904     4,505.8    1,293.2            0.0 |
         |------------------------------------------------------------------------------------|
     20. |    11   1   3,798            .             0     4,715.9     -917.9            0.0 |
     21. |    11   2   3,995          197           197     4,715.9     -720.9            0.0 |
     22. |    11   3   4,099          104           301     4,715.9     -616.9            0.0 |
     23. |    11   4   4,425          326           627     4,715.9     -290.9            0.0 |
     24. |    11   5   4,647          222           849     4,715.9      -68.9            0.0 |
     25. |    11   6   4,749          102           951     4,715.9       33.1            0.0 |
     26. |    11   7   5,719          970         1,921     4,715.9    1,003.1            0.0 |
     27. |    11   8   6,295          576         2,497     4,715.9    1,579.1            0.0 |
         |------------------------------------------------------------------------------------|
     28. |    12   1   3,799            .             0     6,891.3   -3,092.3           -0.0 |
     29. |    12   2   7,140        3,341         3,341     6,891.3      248.7           -0.0 |
     30. |    12   3   9,735        2,595         5,936     6,891.3    2,843.7           -0.0 |
         +------------------------------------------------------------------------------------+

    Comment


    • #3
      Originally posted by David Benson View Post
      This won't be what you want, but (a) you were missing the input line from the dataex command, and (b) hopefully something I list here will help you move forward:

      Code:
      dataex group price if group<=12 // Data shared via -dataex-. To install: ssc install dataex
      clear
      input byte group int price
      5 4499
      6 6229
      7 3667
      7 4172
      7 4934
      8 3984
      8 4589
      8 5079
      8 6486
      8 8129
      9 3299
      9 3748
      9 3829
      9 4389
      10 3895
      10 4187
      10 4195
      10 4453
      10 5799
      11 3798
      11 3995
      11 4099
      11 4425
      11 4647
      11 4749
      11 5719
      11 6295
      12 3799
      12 7140
      12 9735
      end
      
      
      . tabstat price if group<=12, by(group) stat(n mean median min max sd) format(%10.0fc)
      
      Summary for variables: price
      by categories of: group
      
      group | N mean p50 min max sd
      ---------+------------------------------------------------------------
      5 | 1 4,499 4,499 4,499 4,499 .
      6 | 1 6,229 6,229 6,229 6,229 .
      7 | 3 4,258 4,172 3,667 4,934 638
      8 | 5 5,653 5,079 3,984 8,129 1,664
      9 | 4 3,816 3,789 3,299 4,389 447
      10 | 5 4,506 4,195 3,895 5,799 749
      11 | 8 4,716 4,536 3,798 6,295 873
      12 | 3 6,891 7,140 3,799 9,735 2,976
      ---------+------------------------------------------------------------
      Total | 30 4,932 4,439 3,299 9,735 1,455
      ----------------------------------------------------------------------
      Code:
      bysort group (price): gen n = _n
      bysort group (n): gen price_diff = price - price[_n-1] // abs diff from prior obs (within same group)
      bysort group (n): gen price_diff2 = price - price[1] // abs diff from 1st obs (within same group)
      egen avg_price = mean(price), by(group)
      gen abs_diff = price - avg_price
      egen sum_abs_diff = total( abs_diff), by(group) // this will obviously sum to 0, just showing as example
      format avg_price abs_diff sum_abs_diff %10.1fc
      
      . list if inrange(group, 5, 12), sepby(group) abbrev(14)
      
      +------------------------------------------------------------------------------------+
      | group n price price_diff price_diff2 avg_price abs_diff sum_abs_diff |
      |------------------------------------------------------------------------------------|
      1. | 5 1 4,499 . 0 4,499.0 0.0 0.0 |
      |------------------------------------------------------------------------------------|
      2. | 6 1 6,229 . 0 6,229.0 0.0 0.0 |
      |------------------------------------------------------------------------------------|
      3. | 7 1 3,667 . 0 4,257.7 -590.7 0.0 |
      4. | 7 2 4,172 505 505 4,257.7 -85.7 0.0 |
      5. | 7 3 4,934 762 1,267 4,257.7 676.3 0.0 |
      |------------------------------------------------------------------------------------|
      6. | 8 1 3,984 . 0 5,653.4 -1,669.4 0.0 |
      7. | 8 2 4,589 605 605 5,653.4 -1,064.4 0.0 |
      8. | 8 3 5,079 490 1,095 5,653.4 -574.4 0.0 |
      9. | 8 4 6,486 1,407 2,502 5,653.4 832.6 0.0 |
      10. | 8 5 8,129 1,643 4,145 5,653.4 2,475.6 0.0 |
      |------------------------------------------------------------------------------------|
      11. | 9 1 3,299 . 0 3,816.3 -517.3 0.0 |
      12. | 9 2 3,748 449 449 3,816.3 -68.3 0.0 |
      13. | 9 3 3,829 81 530 3,816.3 12.8 0.0 |
      14. | 9 4 4,389 560 1,090 3,816.3 572.8 0.0 |
      |------------------------------------------------------------------------------------|
      15. | 10 1 3,895 . 0 4,505.8 -610.8 0.0 |
      16. | 10 2 4,187 292 292 4,505.8 -318.8 0.0 |
      17. | 10 3 4,195 8 300 4,505.8 -310.8 0.0 |
      18. | 10 4 4,453 258 558 4,505.8 -52.8 0.0 |
      19. | 10 5 5,799 1,346 1,904 4,505.8 1,293.2 0.0 |
      |------------------------------------------------------------------------------------|
      20. | 11 1 3,798 . 0 4,715.9 -917.9 0.0 |
      21. | 11 2 3,995 197 197 4,715.9 -720.9 0.0 |
      22. | 11 3 4,099 104 301 4,715.9 -616.9 0.0 |
      23. | 11 4 4,425 326 627 4,715.9 -290.9 0.0 |
      24. | 11 5 4,647 222 849 4,715.9 -68.9 0.0 |
      25. | 11 6 4,749 102 951 4,715.9 33.1 0.0 |
      26. | 11 7 5,719 970 1,921 4,715.9 1,003.1 0.0 |
      27. | 11 8 6,295 576 2,497 4,715.9 1,579.1 0.0 |
      |------------------------------------------------------------------------------------|
      28. | 12 1 3,799 . 0 6,891.3 -3,092.3 -0.0 |
      29. | 12 2 7,140 3,341 3,341 6,891.3 248.7 -0.0 |
      30. | 12 3 9,735 2,595 5,936 6,891.3 2,843.7 -0.0 |
      +------------------------------------------------------------------------------------+
      Thank you very much David! I really appreciate your help.

      Comment

      Working...
      X