
  • Vars for each pair in data

    I am trying to automate the creation of a variable for each unique pair of values across the variables input*, where order does not matter (AB == BA).

    Code:
    clear
    input id date str3 input1 str3 input2 str3 input3 str3 input4
    1 18263   "A"  "B"  "C" "E"
    2 18264   "B"  "D"  "A"
    3 18264   "B"  "C"  "E"
    4 18265   "C"  "A"  "B"  "R"
    5 18267   "C"  "B"  "E"  "L"
    6 18268   "A"  
    7 18269   "E"  "C"  "E"
    8 18271   "R"  "D"
    9 18272   "B"  "R"  "D"
    10 1827   "B"  "L"   "A"
    11 18274  "R"  "A"  "C"
    end
    I first sort the values within each row using the user-written command rowsort from https://www.stata-journal.com/sjpdf....iclenum=pr0046

    Code:
    rowsort input1-input4, generate(inputs_alpha1-inputs_alpha4) highmissing
    I can then build the pairs manually, but I was wondering whether there is a more automated solution that will work better for larger datasets.

    Code:
    g grp1 = inputs_alpha1+inputs_alpha2
    g grp2 = inputs_alpha1+inputs_alpha3    
    g grp3 = inputs_alpha1+inputs_alpha4
    g grp4 = inputs_alpha2+inputs_alpha3
    g grp5 = inputs_alpha2+inputs_alpha4
    g grp6 = inputs_alpha3+inputs_alpha4
    Finally, I want to see how frequently each pair occurs, which is why I sorted the values in the first place.

  • #2
    If I understand what you want, this will do it:

    Code:
    isid id date
    reshape long input, i(id date) j(j)
    drop if missing(input)
    preserve
    rename input alt_input
    rename j alt_j
    tempfile copy
    save `copy'
    
    restore
    joinby id date using `copy'
    drop if alt_input < input // DON'T DOUBLE COUNT AB WITH BA
    gen pair = input + alt_input
    keep id date pair
    
    //    IDENTIFY FREQUENCY OF OCCURRENCE OF PAIRS
    tab pair
    
    //    AND SHOW RESULTS FOR EACH ID-DATE COMBINATION
    by id date (pair), sort: gen _j = _n
    reshape wide pair, i(id date) j(_j)
    As with most things in Stata, the analysis is much easier in long layout than in wide. In fact, unless you actually need the wide display of the pairs created by the last two lines of the above code, I suggest you skip those two lines. Whatever analyses you want to do on these pairs will most likely be easier if you stick with the long layout. Wide layout is useful for visual displays, some types of graph, and a handful of other commands, but most things are easier in long layout.
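
    As a minimal sketch of what staying in long layout allows, assuming the data are as they stand right after -keep id date pair- above (that is, before the final two lines), the pair frequencies can be collapsed into a small dataset instead of only being tabulated. The variable name n_occurrences is just an illustrative choice.

    Code:
    // ASSUMES LONG-LAYOUT DATA: ONE OBSERVATION PER ID-DATE-PAIR
    preserve
    contract pair, freq(n_occurrences) // ONE OBSERVATION PER DISTINCT PAIR
    gsort -n_occurrences pair          // MOST FREQUENT PAIRS FIRST
    list, noobs clean
    restore

    Because of the -preserve- and -restore-, the long data are left untouched for whatever comes next.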



    • #3
      Clyde Schechter, the only issue with your suggestion is that it pairs each input with itself, producing pairs such as "AA", whereas an input should never be paired with itself.



      • #4
        OK, I thought you wanted to include those. But it's an easy fix. Just change -drop if alt_input < input- to -drop if alt_input <= input-.
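
        For reference, here is a sketch of the pairing step from #2 with that one-line change applied, assuming you start again from the original wide data with input1-input4. Note that -<=- also drops pairs formed by the same letter appearing twice within one observation (as "E" does for id 7), which I assume is what you want.

        Code:
        isid id date
        reshape long input, i(id date) j(j)
        drop if missing(input)
        preserve
        rename input alt_input
        rename j alt_j
        tempfile copy
        save `copy'

        restore
        joinby id date using `copy'
        drop if alt_input <= input // KEEP EACH PAIR ONCE; NO SELF-PAIRS
        gen pair = input + alt_input
        keep id date pair
        tab pair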



        • #5
          Clyde Schechter Thanks
