Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Subtracting two variables with missing values

    Hello all,

    I have split the data of user reviews in two halfs (chronological order). For each user, I am trying to subtract the average rating of the first half of reviews from the average of the second half. In simpler terms, this screenshot encapsulates what I am trying to accomlish.

    This the code I have tried using but I have not gotten the most desirable results:

    PHP Code:
    by useridegen first_half_rating mean(rating) if half == 1
    by userid
    egen second_half_rating mean(rating) if half == 2
    by userid
    gen difference_of_halfs cond(missing(first_half_rating), 0first_half_rating) - cond(missing(second_half_rating), 0second_half_rating
    by useridegen diff mean(difference_of_halfs
    Additionally, here is a data example. I would appreciate all insights.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str17 userid double rating float(rank first_half_rating second_half_rating difference_of_halfs diff)
    "001544VKwW"   5   1      5         .          5          5
    "007pm3BK9b"   2   1   3.75         .       3.75 -.57894737
    "007pm3BK9b" 3.5   2   3.75         .       3.75 -.57894737
    "007pm3BK9b"   4   3   3.75         .       3.75 -.57894737
    "007pm3BK9b"   4   4   3.75         .       3.75 -.57894737
    "007pm3BK9b"   5   5   3.75         .       3.75 -.57894737
    "007pm3BK9b"   4   6   3.75         .       3.75 -.57894737
    "007pm3BK9b"   4   7   3.75         .       3.75 -.57894737
    "007pm3BK9b" 3.5   8   3.75         .       3.75 -.57894737
    "007pm3BK9b"   3   9      .  3.727273  -3.727273 -.57894737
    "007pm3BK9b" 3.5  10      .  3.727273  -3.727273 -.57894737
    "007pm3BK9b"   4  11      .  3.727273  -3.727273 -.57894737
    "007pm3BK9b"   5  12      .  3.727273  -3.727273 -.57894737
    "007pm3BK9b"   5  13      .  3.727273  -3.727273 -.57894737
    "007pm3BK9b" 3.5  14      .  3.727273  -3.727273 -.57894737
    "007pm3BK9b"   3  15      .  3.727273  -3.727273 -.57894737
    "007pm3BK9b"   3  16      .  3.727273  -3.727273 -.57894737
    "007pm3BK9b"   4  17      .  3.727273  -3.727273 -.57894737
    "007pm3BK9b"   5  18      .  3.727273  -3.727273 -.57894737
    "007pm3BK9b"   2  19      .  3.727273  -3.727273 -.57894737
    "009FmWLEeE"   5   1      5         .          5          5
    "00BLGIWzcJ"   3   1 3.8125         .     3.8125  -.6842105
    "00BLGIWzcJ"   4   2 3.8125         .     3.8125  -.6842105
    "00BLGIWzcJ" 4.5   3 3.8125         .     3.8125  -.6842105
    "00BLGIWzcJ"   5   4 3.8125         .     3.8125  -.6842105
    "00BLGIWzcJ"   4   5 3.8125         .     3.8125  -.6842105
    "00BLGIWzcJ" 1.5   6 3.8125         .     3.8125  -.6842105
    "00BLGIWzcJ"   4   7 3.8125         .     3.8125  -.6842105
    "00BLGIWzcJ" 4.5   8 3.8125         .     3.8125  -.6842105
    "00BLGIWzcJ" 2.5   9      . 3.9545455 -3.9545455  -.6842105
    "00BLGIWzcJ"   4  10      . 3.9545455 -3.9545455  -.6842105
    "00BLGIWzcJ" 4.5  11      . 3.9545455 -3.9545455  -.6842105
    "00BLGIWzcJ"   5  12      . 3.9545455 -3.9545455  -.6842105
    "00BLGIWzcJ" 3.5  13      . 3.9545455 -3.9545455  -.6842105
    "00BLGIWzcJ"   5  14      . 3.9545455 -3.9545455  -.6842105
    "00BLGIWzcJ"   4  15      . 3.9545455 -3.9545455  -.6842105
    "00BLGIWzcJ"   4  16      . 3.9545455 -3.9545455  -.6842105
    "00BLGIWzcJ"   4  17      . 3.9545455 -3.9545455  -.6842105
    "00BLGIWzcJ" 3.5  18      . 3.9545455 -3.9545455  -.6842105
    "00BLGIWzcJ" 3.5  19      . 3.9545455 -3.9545455  -.6842105
    "00D0EXepFD"   4   1    4.5         .        4.5 -1.1428572
    "00D0EXepFD"   5   2    4.5         .        4.5 -1.1428572
    "00D0EXepFD"   3   3      .       3.4       -3.4 -1.1428572
    "00D0EXepFD"   4   4      .       3.4       -3.4 -1.1428572
    "00D0EXepFD"   2   5      .       3.4       -3.4 -1.1428572
    "00D0EXepFD"   4   6      .       3.4       -3.4 -1.1428572
    "00D0EXepFD"   4   7      .       3.4       -3.4 -1.1428572
    "00FqmmCeLF"   5   1      .         5         -5         -5
    "00FqmmCeLF"   5   2      .         5         -5         -5
    "00GM0zPWrU" 4.5   1   4.25         .       4.25 -1.4166666
    "00GM0zPWrU"   4   2   4.25         .       4.25 -1.4166666
    "00GM0zPWrU"   4   3      .      4.25      -4.25 -1.4166666
    "00GM0zPWrU"   5   4      .      4.25      -4.25 -1.4166666
    "00GM0zPWrU"   4   5      .      4.25      -4.25 -1.4166666
    "00GM0zPWrU"   4   6      .      4.25      -4.25 -1.4166666
    "00GRD0DXsA" 3.5   1    3.5         .        3.5     -2.125
    "00GRD0DXsA"   4   2      .         4         -4     -2.125
    "00GRD0DXsA"   4   3      .         4         -4     -2.125
    "00GRD0DXsA"   4   4      .         4         -4     -2.125
    "00I4u5rBjQ" 2.5 1.5      .      3.25      -3.25      -3.25
    "00I4u5rBjQ"   4 1.5      .      3.25      -3.25      -3.25
    "00IonREp6n"   5   1      5         .          5          5
    "00LskfiGlk"   4   1      4         .          4          4
    "00M2bF7aW1"   5   1      .         5         -5         -5
    "00M2bF7aW1"   5   2      .         5         -5         -5
    "00M2bF7aW1"   5   3      .         5         -5         -5
    "00M9ISXXrC"  .5   1    2.5         .        2.5 -1.7272726
    "00M9ISXXrC" 3.5   2    2.5         .        2.5 -1.7272726
    "00M9ISXXrC" 3.5   3    2.5         .        2.5 -1.7272726
    "00M9ISXXrC" 2.5   4    2.5         .        2.5 -1.7272726
    "00M9ISXXrC" 3.5   5      .  4.142857  -4.142857 -1.7272726
    "00M9ISXXrC"   3   6      .  4.142857  -4.142857 -1.7272726
    "00M9ISXXrC" 3.5   7      .  4.142857  -4.142857 -1.7272726
    "00M9ISXXrC"   5   8      .  4.142857  -4.142857 -1.7272726
    "00M9ISXXrC"   5   9      .  4.142857  -4.142857 -1.7272726
    "00M9ISXXrC"   5  10      .  4.142857  -4.142857 -1.7272726
    "00M9ISXXrC"   4  11      .  4.142857  -4.142857 -1.7272726
    "00OFXMYGwM"   5   1      .      4.75      -4.75      -4.75
    "00OFXMYGwM" 4.5   2      .      4.75      -4.75      -4.75
    "00OqHKXhlZ"   3   1      3         .          3          3
    "00P0wOQrpc" 3.5   1    3.5         .        3.5       -2.6
    "00P0wOQrpc"   5   2      .     4.125     -4.125       -2.6
    "00P0wOQrpc"   4   3      .     4.125     -4.125       -2.6
    "00P0wOQrpc"   3   4      .     4.125     -4.125       -2.6
    "00P0wOQrpc" 4.5   5      .     4.125     -4.125       -2.6
    "00QpTCKkvk"   4   1      4         .          4          4
    "00ROkedpRn"   4   1   4.25         .       4.25  -1.785714
    "00ROkedpRn" 4.5   2   4.25         .       4.25  -1.785714
    "00ROkedpRn"   4   3      .       4.2       -4.2  -1.785714
    "00ROkedpRn"   4   4      .       4.2       -4.2  -1.785714
    "00ROkedpRn"   4   5      .       4.2       -4.2  -1.785714
    "00ROkedpRn"   4   6      .       4.2       -4.2  -1.785714
    "00ROkedpRn"   5   7      .       4.2       -4.2  -1.785714
    "00SGfJeiLC"   5   1      5         .          5       -2.5
    "00SGfJeiLC"   5   2      .         5         -5       -2.5
    "00SGfJeiLC"   5   3      .         5         -5       -2.5
    "00SGfJeiLC"   5   4      .         5         -5       -2.5
    "00TAAjSt8W"   4   1      .  3.666667  -3.666667  -3.666667
    "00TAAjSt8W"   3   2      .  3.666667  -3.666667  -3.666667
    "00TAAjSt8W"   4   3      .  3.666667  -3.666667  -3.666667
    end

  • #2
    Hello Asteris! I'm going to make the assumption that since you've taken the average of the the rating, that we don't really need to keep it around. Also my code is in ado. The simplest method I've found is to:

    Code:
    collapse first_half_rating second_half_rating difference_of_halfs diff, by(userid)
    Now we've collapsed our observations, and can do some math. We need to take into account missing fields, so we'll use egen.

    Code:
    gen neg_second_half = -(second_half_rating)
    egen diff_of_halfs = rowtotal(first_half_rating neg_second_half)
    drop neg_second_half
    and now we have:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str17 userid float(first_half_rating second_half_rating difference_of_halfs diff diff_of_halfs)
    "001544VKwW"      5         .          5          5         5
    "007pm3BK9b"   3.75  3.727273 -.57894754 -.57894737 .02272701
    "009FmWLEeE"      5         .          5          5         5
    "00BLGIWzcJ" 3.8125 3.9545455  -.6842105  -.6842105 -.1420455
    "00D0EXepFD"    4.5       3.4 -1.1428572 -1.1428572 1.0999999
    "00FqmmCeLF"      .         5         -5         -5        -5
    "00GM0zPWrU"   4.25      4.25 -1.4166666 -1.4166666         0
    "00GRD0DXsA"    3.5         4     -2.125     -2.125       -.5
    "00I4u5rBjQ"      .      3.25      -3.25      -3.25     -3.25
    "00IonREp6n"      5         .          5          5         5
    "00LskfiGlk"      4         .          4          4         4
    "00M2bF7aW1"      .         5         -5         -5        -5
    "00M9ISXXrC"    2.5  4.142857 -1.7272726 -1.7272726 -1.642857
    "00OFXMYGwM"      .      4.75      -4.75      -4.75     -4.75
    "00OqHKXhlZ"      3         .          3          3         3
    "00P0wOQrpc"    3.5     4.125       -2.6       -2.6     -.625
    "00QpTCKkvk"      4         .          4          4         4
    "00ROkedpRn"   4.25       4.2  -1.785714  -1.785714 .05000019
    "00SGfJeiLC"      5         5       -2.5       -2.5         0
    "00TAAjSt8W"      .  3.666667  -3.666667  -3.666667 -3.666667
    end
    If you really wanted the rating and rank, you could save a tempfile with just those variables, then merge on onto this file.




    Comment


    • #3
      Try this:

      Code:
      by userid: egen first_half_rating = mean(rating/(half == 1))
      by userid: egen second_half_rating = mean(rating/(half == 2))
      gen diff = second_half_rating - first_half_rating
      Last edited by Joro Kolev; 14 Jul 2021, 06:27.

      Comment

      Working...
      X