  • Data manipulation of list of pairwise comparisons

    Dear statlisters,

    I have a dataset comprising a list of pairwise comparisons of 98 occupations.

    Currently they are given as three variables:

    comparator1 (the name of the first comparator - e.g. Company CEO)
    comparator2 (the name of the second comparator - e.g. Elected official)
    prob1 (a probability associated with comparator1)

    Each comparison is listed twice, once in each order, and the two rows have different values of prob1.

    For example:

    In row 1, comparator1 is Company CEO, comparator2 is Elected official, and prob1 is 0.98
    In row 99, comparator1 is Elected official, comparator2 is Company CEO, and prob1 is 0.08

    I would like to change the data so that each comparison is listed only once, with a separate probability variable for each comparator.

    So:

    Row 1 would then have comparator1 = Company CEO, comparator2 = Elected official, prob1 = 0.98, and prob2 = 0.08.

    Is there a straightforward way to do this?




  • #2
    Some of the details of how to do this would depend on how your comparator id variables are coded, so please supply an example data set with (say) 3 pairs of observations. (See the Statalist FAQ and -help dataex- if -dataex- is unfamiliar.) If your actual data can't be shown publicly, keep the structure of your data and replace the actual values with fake ones.



    • #3
      You don't provide example data to work with, so I have created a toy data set that I believe is similar to yours to illustrate the approach. Modify the code accordingly if there are material differences.

      Code:
      // CREATE A DEMONSTRATION DATA SET
      clear*
      local jobs A B C D E
      local ntuples = 0
      foreach j1 of local jobs {
          foreach j2 of local jobs {
              if "`j1'" != "`j2'" {
                  local ++ntuples
                  local tuple`ntuples' `j1' `j2'
              }
          }
      }
      
      set obs `ntuples'
      gen pair = ""
      forvalues i = 1/`ntuples' {
          replace pair = "`tuple`i''" in `i'
      }
      split pair, gen(comparator)
      set seed 1234
      gen prob1 = runiform()
      drop pair
      
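      // SOLUTION BEGINS HERE
      // Build an order-free key for each pair: the two names in alphabetical order,
      // joined by "|" so that two different pairs cannot produce the same string
      gen unordered_pair = cond(comparator1 < comparator2, ///
          comparator1 + "|" + comparator2, comparator2 + "|" + comparator1)
          
      // Within a pair, seq = 1 is the row whose comparator1 comes first alphabetically
      by unordered_pair (comparator1), sort: gen seq = _n
      assert inlist(seq, 1, 2)  // no pair may appear more than twice
      // Give both rows of a pair the comparator names from the seq = 1 row
      by unordered_pair (seq), sort: replace comparator1 = comparator1[1]
      by unordered_pair (seq): replace comparator2 = comparator2[1]
      rename prob1 prob
      
      // One row per pair: prob1 comes from the seq = 1 row, prob2 from the seq = 2 row
      reshape wide prob, i(unordered_pair) j(seq)
      drop unordered_pair
      order comparator1 comparator2, first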

      In the future, to avoid guesswork about the data, always show example data when asking for help with code, and remember that the most helpful way to show data is with the -dataex- command. If you are running version 18, 17, or 16, or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen in tabular displays or screenshots. It also lets those who want to help you build a faithful copy of your example to try their code on, which in turn makes it more likely that their answer will actually work with your data.
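      A minimal sketch of the "if not, install it" step, assuming it is run from a do-file: -which- looks for the -dataex- command, and -ssc install- runs only when it is not found.

```stata
* Look for -dataex-; _rc is nonzero if -which- cannot find it
capture which dataex
if _rc {
    * Not found (older Stata): install the user-written version from SSC
    ssc install dataex
}
```

      With the data loaded, a command such as -dataex comparator1 comparator2 prob1 in 1/10- then prints a ready-to-paste listing of those three variables for the first 10 observations.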

      Added: Crossed with #2.
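      As a quick check, here is a sketch that applies the same steps as the solution above to just the two rows quoted in the question, entered with -input- in the layout that -dataex- produces. Two choices here are assumptions, not part of the original code: prob1 is stored as double so that exact comparisons such as prob1 == 0.98 hold, and the pair key uses a "|" separator.

```stata
* Only the two rows quoted in the question (prob1 stored as double, by assumption)
clear
input str20 comparator1 str20 comparator2 double prob1
"Company CEO"      "Elected official" 0.98
"Elected official" "Company CEO"      0.08
end

* Order-free pair key, with "|" between the two names
gen unordered_pair = cond(comparator1 < comparator2, ///
    comparator1 + "|" + comparator2, comparator2 + "|" + comparator1)
by unordered_pair (comparator1), sort: gen seq = _n
assert inlist(seq, 1, 2)
by unordered_pair (seq), sort: replace comparator1 = comparator1[1]
by unordered_pair (seq): replace comparator2 = comparator2[1]
rename prob1 prob
reshape wide prob, i(unordered_pair) j(seq)
drop unordered_pair
order comparator1 comparator2, first
list
```

      The result is a single row with comparator1 = Company CEO, comparator2 = Elected official, prob1 = 0.98, and prob2 = 0.08, which is the layout asked for in the question.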



      • #4
        Thanks so much, Clyde. That worked perfectly! I'd had the idea of generating concatenated strings forwards and backwards, but hadn't twigged to using cond(), or worked out what to do after that.

        And thanks for going the extra mile to create the toy dataset (which did indeed perfectly match my situation). Next time I'll be sure to use dataex.
