  • Descriptive Statistics by Segment

    Hi all,

    I have ratings data (ratings run from 0 to 5) and I am trying to find what % of users never rate below 3. This is the code that I have so far. Note that user_ratingscount is the number of ratings left by a user at the time of each rating.

    Code:
    sort userid
    bysort userid: gen rating_below_3 = 1 if rating < 3
    replace rating_below_3 = 0 if rating <= 3
    egen max = max(user_ratingscount), by(userid)
    egen sum3 = total(rating_below_3), by(userid)
    bysort userid: gen never_rated_below_3 = 1 if max == sum3
    bysort userid: replace never_rated_below_3 = 0 if max != sum3
    duplicates drop userid, force
    tab never_rated_below_3

    I am running into trouble when a user has rated below a 3 but has never rated exactly a 3. The code above comes out as 1 in that case even though it should still report a 0 for the never_rated_below_3 variable. Here is a data example.


    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input double(userid rating user_ratingscount) float(rating_below_3 max sum3 never_rated_below_3)
     2 4.5  4 . 12 0 0
     2   3  1 0 12 0 0
     2   3  9 0 12 0 0
     2   3  8 0 12 0 0
     2   4 11 . 12 0 0
     2 3.5  3 . 12 0 0
     2   4 10 . 12 0 0
     2 4.5  7 . 12 0 0
     2   5 12 . 12 0 0
     2   4  2 . 12 0 0
     2   4  6 . 12 0 0
     2   4  5 . 12 0 0
     3   5  1 .  1 0 0
     4 3.5  9 . 13 0 0
     4 3.5 11 . 13 0 0
     4   5  2 . 13 0 0
     4 3.5  3 . 13 0 0
     4   5  5 . 13 0 0
     4   5 12 . 13 0 0
     4   5 10 . 13 0 0
     4 3.5  4 . 13 0 0
     4 4.5  7 . 13 0 0
     4   5 13 . 13 0 0
     4   5  8 . 13 0 0
     4   5  1 . 13 0 0
     4   5  6 . 13 0 0
     5 2.5 32 0 38 0 0
     5   3  6 0 38 0 0
     5   5 20 . 38 0 0
     5   5 37 . 38 0 0
     5   5 35 . 38 0 0
     5   5 33 . 38 0 0
     5   5 25 . 38 0 0
     5   5  8 . 38 0 0
     5 4.5 18 . 38 0 0
     5   5 28 . 38 0 0
     5   5 13 . 38 0 0
     5   5 17 . 38 0 0
     5   5 31 . 38 0 0
     5 4.5 12 . 38 0 0
     5   5 34 . 38 0 0
     5   5  2 . 38 0 0
     5   5  7 . 38 0 0
     5   5 21 . 38 0 0
     5   4 29 . 38 0 0
     5   5  5 . 38 0 0
     5   5 30 . 38 0 0
     5   5 22 . 38 0 0
     5   5  4 . 38 0 0
     5   5 15 . 38 0 0
     5 4.5  1 . 38 0 0
     5   5 10 . 38 0 0
     5   5 19 . 38 0 0
     5   5 23 . 38 0 0
     5   4 27 . 38 0 0
     5   5 36 . 38 0 0
     5   5  3 . 38 0 0
     5   5 38 . 38 0 0
     5   4 16 . 38 0 0
     5   5 11 . 38 0 0
     5   5 14 . 38 0 0
     5   5  9 . 38 0 0
     5   5 24 . 38 0 0
     5   5 26 . 38 0 0
     6   5  8 .  8 0 0
     6   5  5 .  8 0 0
     6   4  6 .  8 0 0
     6   5  2 .  8 0 0
     6   4  3 .  8 0 0
     6   5  4 .  8 0 0
     6   3  7 0  8 0 0
     6   3  1 0  8 0 0
     7   5  1 .  1 0 0
     8   3  2 0  2 0 0
     8   4  1 .  2 0 0
     9 3.5  2 .  2 0 0
     9   2  1 0  2 0 0
    10 4.5  6 .  8 0 0
    10   5  5 .  8 0 0
    10   3  2 0  8 0 0
    10   5  8 .  8 0 0
    10 2.5  7 0  8 0 0
    10   5  1 .  8 0 0
    10   3  4 0  8 0 0
    10 4.5  3 .  8 0 0
    11   5 13 . 13 0 0
    11   5 12 . 13 0 0
    11 4.5  5 . 13 0 0
    11   5  6 . 13 0 0
    11   5  2 . 13 0 0
    11   5 11 . 13 0 0
    11   4 10 . 13 0 0
    11   4  4 . 13 0 0
    11   5  8 . 13 0 0
    11   5  1 . 13 0 0
    11 4.5  3 . 13 0 0
    11   5  7 . 13 0 0
    11 4.5  9 . 13 0 0
    12   4  7 . 14 0 0
    12 3.5  6 . 14 0 0
    end
    Does anyone have a more efficient way to complete the task or a possible edit to the code above? Thank you in advance for all the help.

  • #2
    Asteris:
    are you sure that the second line of your code should not be:
    Code:
    replace rating_below_3 = 0 if rating >= 3
    Kind regards,
    Carlo
    (Stata 19.0)
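
    As a sketch only: the two-step gen/replace from #1 can also be written as one logical expression, which leaves no room for the two conditions to overlap. The qualifier is an assumption about how missing ratings should be treated (they are left missing rather than coded 0).
    Code:
    * 1 if this rating is below 3, 0 if it is 3 or above, missing if rating is missing
    gen byte rating_below_3 = rating < 3 if !missing(rating)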

    • #3
      Hi. Thanks for pointing that typo out. Even after correcting it, though, the error I described in #1 persists. Do you have any insights on a more efficient way to complete the task?

      Best,

      Asteris

      • #4
        Asteris:
        you may want to try:
        Code:
         bysort userid: gen wanted=1 if rating<3
        
        
        . * remember to save a copy of your dataset before -collapse-
        . collapse (count) wanted, by(userid)
        
        . tab wanted
        
            (count) |
             wanted |      Freq.     Percent        Cum.
        ------------+-----------------------------------
                  0 |          8       72.73       72.73
                  1 |          3       27.27      100.00
        ------------+-----------------------------------
              Total |         11      100.00
        
        .
        Kind regards,
        Carlo
        (Stata 19.0)
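
        Note that after -collapse (count)- the variable wanted holds the number of ratings below 3 per user, so in a larger dataset it can exceed 1. The row for 0 is the share of users who never rated below 3. For anyone who wants to keep the rating-level data, a possible variant without -collapse- is sketched below. The variable names n_below3, never_below3 and one_per_user are hypothetical, and a missing rating would not count as below 3 here.
        Code:
        * number of ratings below 3 for each user
        egen n_below3 = total(rating < 3), by(userid)
        * 1 if the user never rated below 3, 0 otherwise
        gen byte never_below3 = n_below3 == 0
        * flag one observation per user so that each user is counted once
        egen one_per_user = tag(userid)
        tab never_below3 if one_per_user
        On the data example in #1 this should agree with the table above: 8 of the 11 users (72.73%) never rated below 3.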

        • #5
          That worked! Thanks.
