  • Equivalent Proportions

    The following data represent two people who each reviewed 100 unique t-shirts (uniqueid) and indicated whether each shirt is among their 25 favourite shirts (rank1top25 for person 1 and rank2top25 for person 2).

    Code:
    clear
    input uniqueid    blue    rank1top25    rank2top25
    1    0    0    0
    2    0    0    0
    3    1    0    0
    4    1    0    0
    5    0    0    0
    6    0    0    1
    7    1    0    1
    8    0    1    1
    9    0    1    0
    10    0    0    0
    11    0    0    0
    12    0    0    0
    13    1    0    1
    14    1    0    1
    15    0    0    0
    16    0    0    0
    17    0    1    0
    18    1    0    0
    19    0    1    1
    20    1    0    0
    21    0    0    0
    22    0    1    0
    23    0    0    1
    24    1    0    0
    25    0    1    0
    26    1    0    0
    27    1    0    0
    28    0    0    0
    29    1    0    1
    30    0    1    0
    31    0    0    0
    32    0    0    0
    33    0    1    1
    34    0    1    0
    35    0    0    0
    36    0    1    0
    37    0    0    1
    38    1    0    1
    39    0    0    0
    40    0    0    1
    41    0    0    1
    42    0    0    0
    43    0    0    0
    44    0    0    1
    45    0    0    1
    46    0    0    0
    47    0    0    0
    48    1    0    0
    49    1    0    0
    50    0    1    1
    51    0    1    0
    52    0    0    1
    53    0    0    1
    54    1    1    0
    55    1    0    1
    56    0    0    0
    57    0    0    0
    58    0    0    1
    59    0    0    0
    60    1    1    0
    61    1    1    0
    62    0    0    0
    63    0    1    0
    64    1    0    0
    65    0    1    0
    66    0    0    0
    67    0    0    0
    68    1    0    0
    69    0    1    0
    70    1    1    0
    71    0    1    0
    72    0    0    0
    73    0    1    0
    74    0    0    1
    75    0    0    0
    76    0    0    1
    77    1    0    0
    78    0    0    0
    79    1    0    0
    80    0    0    0
    81    0    1    1
    82    0    0    0
    83    1    0    0
    84    0    0    0
    85    0    0    0
    86    1    0    0
    87    0    0    1
    88    0    0    1
    89    0    1    0
    90    1    0    0
    91    1    0    0
    92    0    1    0
    93    0    0    0
    94    1    0    0
    95    0    0    0
    96    0    0    0
    97    0    0    0
    98    1    0    0
    99    0    0    0
    100    0    1    0
    end
    I want to test whether the proportion of blue shirts (blue) is the same in the two raters' top 25 rankings (rank1top25 versus rank2top25).

    I then want to test whether the distribution of the unique shirts (uniqueid) is the same in the two top 25 rankings (rank1top25 versus rank2top25).

    I am not sure how to conduct the first analysis. I thought maybe a chi-square test would work for the second, but I am not sure about that either.

  • #2
    Actually, I am less sure what to do with regard to your second question. In fact, I don't think I even understand what it means.

    But the first question, if I understand it, asks whether or not the two raters have different strengths of association between blueness and top 25 ranking. The ratings are given in matched pairs. So I would do this as:
    Code:
    reshape long rank@top25, i(uniqueid) j(rater)
    xtset uniqueid
    xtlogit ranktop25 i.rater##i.blue, fe
    In the example data, this produces:
    Code:
    Iteration 0:  Log likelihood = -27.457816  
    Iteration 1:  Log likelihood = -27.457816  
    
    Conditional fixed-effects logistic regression        Number of obs    =     80
    Group variable: uniqueid                             Number of groups =     40
    
                                                         Obs per group:
                                                                      min =      2
                                                                      avg =    2.0
                                                                      max =      2
    
                                                         LR chi2(2)       =   0.54
    Log likelihood = -27.457816                          Prob > chi2      = 0.7649
    
    ------------------------------------------------------------------------------
       ranktop25 | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
         2.rater |  -.1335314   .3659625    -0.36   0.715    -.8508048     .583742
          1.blue |          0  (omitted)
                 |
      rater#blue |
            2 1  |   .5389965    .742021     0.73   0.468     -.915338    1.993331
    ------------------------------------------------------------------------------
    The interaction coefficient (on the log-odds scale) is 0.54, with a 95% CI from -0.92 to +1.99. So I would say that there is little to no support in these data for the idea that the raters differ in their tendency to rank shirts in the top 25 when they are blue.
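
    To read the interaction as a ratio instead of a difference in log odds, the fitted model can be replayed with odds ratios. This is only a minimal sketch, assuming the -xtlogit- model above is still the active estimation result. Exponentiating the interaction coefficient of 0.54 gives a ratio of odds ratios of roughly 1.7, with a correspondingly wide confidence interval.

    ```stata
    * Assumes the xtlogit model above was just fitted and is still in memory.
    * Replay the same results as odds ratios instead of log-odds coefficients.
    xtlogit, or

    * Report only the rater-by-blue interaction as a ratio of odds ratios.
    lincom 2.rater#1.blue, or
    ```

    Note also that the output reports 40 groups, not 100: conditional fixed-effects logit drops every shirt on which the two raters agree, so only the discordant pairs inform the estimates.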

    That said, why are you using top 25 rank instead of using the ranks themselves? Is there some real-world material difference between being ranked in the top 25 and not? Is the 24th ranked shirt radically different for your purposes from the 26th ranked shirt, but equivalent to the top ranked shirt? If not, then all you are doing is throwing out information and adding noise to your data. Try to kick the dichotomization habit--it's a bad practice in most circumstances.


    • #3
      Thank you Clyde. I think you understand my question well, but to restate: I am asking whether the association between blue and top 25 is the same for both raters.

      I used a binary variable because the raters knew that their top 25 shirts would be printed and marketed to other customers. We took their rating (1-100) for each shirt and constructed a top 25 list that you see above. How would using the ratings change the analysis (data below)?

      Code:
      clear
      input uniqueid    blue    rater1    rater2    rank1top25    rank2top25
      1    0    52    69    0    1
      2    1    18    46    0    0
      3    0    70    28    1    0
      4    0    62    40    0    0
      5    0    12    18    0    0
      6    1    37    35    0    0
      7    0    94    97    1    1
      8    1    14    57    0    0
      9    1    24    100    0    1
      10    1    82    94    1    1
      11    1    92    65    1    0
      12    0    61    56    0    0
      13    0    91    45    1    0
      14    0    48    40    0    0
      15    0    1    58    0    0
      16    0    56    17    0    0
      17    0    69    30    0    0
      18    0    73    54    1    0
      19    1    87    63    1    0
      20    0    30    2    0    0
      21    0    25    44    0    0
      22    0    31    1    0    0
      23    0    63    33    0    0
      24    0    63    89    0    1
      25    0    21    7    0    0
      26    0    69    40    0    0
      27    0    44    45    0    0
      28    0    26    39    0    0
      29    0    77    15    1    0
      30    1    4    36    0    0
      31    1    37    23    0    0
      32    0    42    74    0    1
      33    1    39    83    0    1
      34    1    58    16    0    0
      35    0    52    93    0    1
      36    0    67    8    0    0
      37    1    0    60    0    0
      38    1    54    62    0    0
      39    0    34    79    0    1
      40    0    6    26    0    0
      41    0    45    80    0    1
      42    0    74    27    1    0
      43    1    83    86    1    1
      44    0    26    66    0    0
      45    0    42    17    0    0
      46    1    63    90    0    1
      47    0    32    67    0    0
      48    0    80    32    1    0
      49    0    27    39    0    0
      50    0    79    15    1    0
      51    0    54    59    0    0
      52    0    16    59    0    0
      53    1    51    27    0    0
      54    0    27    45    0    0
      55    1    52    38    0    0
      56    1    35    44    0    0
      57    0    71    90    1    1
      58    0    55    62    0    0
      59    0    39    51    0    0
      60    0    66    33    0    0
      61    0    31    9    0    0
      62    1    8    59    0    0
      63    0    90    19    1    0
      64    0    69    30    0    0
      65    0    19    40    0    0
      66    0    40    89    0    1
      67    0    90    90    1    1
      68    1    45    78    0    1
      69    0    9    32    0    0
      70    0    45    88    0    1
      71    0    10    15    0    0
      72    1    5    65    0    0
      73    1    17    40    0    0
      74    0    13    48    0    0
      75    0    11    44    0    0
      76    0    97    18    1    0
      77    0    37    5    0    0
      78    0    56    67    0    0
      79    0    90    45    1    0
      80    0    45    15    0    0
      81    0    97    39    1    0
      82    1    88    86    1    1
      83    1    58    44    0    0
      84    1    14    23    0    0
      85    1    43    93    0    1
      86    0    86    25    1    0
      87    1    78    83    1    1
      88    0    15    90    0    1
      89    0    45    89    0    1
      90    0    96    49    1    0
      91    0    38    6    0    0
      92    0    51    41    0    0
      93    0    58    21    0    0
      94    0    41    82    0    1
      95    0    43    98    0    1
      96    1    40    3    0    0
      97    1    87    4    1    0
      98    0    3    10    0    0
      99    1    94    14    1    0
      100    1    2    92    0    1
      end
      For the second question, I am asking whether the sets of unique shirts they chose are similar. For example, they both chose shirts 19 and 50. Is this overlap statistically significant or not?


      • #4
        Given that the top 25 shirts would be printed and marketed, that is a good reason for using the dichotomized version. Nevertheless, just for the sake of completeness, here is how I would go about analyzing the ranks themselves if there were not a good reason to dichotomize:
        Code:
        drop rank*top25
        rename rater* rating*
        reshape long rating, i(uniqueid) j(rater)
        xtset uniqueid
        xtreg rating i.blue##i.rater, fe
        As for the second question, the best I can make of it is that you are asking about inter-rater agreement. I would probably use Cohen's kappa for that. In the original data (not -reshape-d):
        Code:
        kap rank*top25
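
        For intuition about what -kap- reports, kappa can also be computed by hand as (observed agreement - chance agreement) / (1 - chance agreement). The following is a minimal sketch, assuming the wide data with rank1top25 and rank2top25 coded 0/1 and no missing values. The scalar names are hypothetical, chosen for illustration only.

        ```stata
        * Hand calculation of Cohen's kappa for two binary ratings (wide data).
        * Scalar names (N, po, p1, p2, pe) are illustrative assumptions.
        quietly count
        scalar N = r(N)

        * Observed agreement: share of shirts given the same 0/1 value by both raters
        quietly count if rank1top25 == rank2top25
        scalar po = r(N) / N

        * Each rater's marginal proportion of top-25 picks
        quietly summarize rank1top25
        scalar p1 = r(mean)
        quietly summarize rank2top25
        scalar p2 = r(mean)

        * Agreement expected by chance if the two raters chose independently
        scalar pe = p1*p2 + (1 - p1)*(1 - p2)

        display "kappa = " (po - pe) / (1 - pe)
        ```

        When each rater picks exactly 25 of 100 shirts, chance agreement is 0.25*0.25 + 0.75*0.75 = 0.625, so kappa near zero means the overlap is about what independent choices would produce.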


        • #5
          Thank you Clyde. Sticking with the binary variable: does the first analysis answer the question of whether the proportion of blue shirts for rater 1 is the same as for rater 2?


          • #6
            Yes, it does.
