  • How to do a cross tabulation identifying common observations across different groups?

    My data are proprietary, so I will provide a simple mock-up that I hope makes the structure clear.
    custid product
    1 A
    2 A
    2 A
    3 A
    4 A
    2 B
    3 B
    3 B
    3 B
    1 C
    1 C
    1 C
    1 C
    2 C
    4 C
    4 C
    4 C
    4 C
    4 C
    custid is the customer identifier variable and product is, well, the product. For every pairwise combination of products, I would like to identify which customers appear under both products. In my actual data there are hundreds of unique custid values (a given custid can appear repeatedly under a product or not at all) and 20 different products. I cannot, for the life of me, figure out how to do this using the various table commands and options. Any help would be appreciated.

  • #2
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte custid str1 product
    1 "A"
    2 "A"
    2 "A"
    3 "A"
    4 "A"
    2 "B"
    3 "B"
    3 "B"
    3 "B"
    1 "C"
    1 "C"
    1 "C"
    1 "C"
    2 "C"
    4 "C"
    4 "C"
    4 "C"
    4 "C"
    4 "C"
    end
    
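    * Save a copy of the data with product renamed, to pair against the original
    preserve
    rename product product2
    tempfile copy
    save `copy'
    restore
    * Form all within-customer pairings of observations
    joinby custid using `copy'
    * Drop self-pairings, then collapse repeated rows
    drop if product == product2
    duplicates drop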

    In the future, when showing data examples, please use the -dataex- command to do so, as I have here. If you are running version 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    When asking for help with code, always show example data. When showing example data, always use -dataex-.
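
    For a data set shaped like the mock-up in this thread, a minimal -dataex- call might look like the sketch below. The variable names are those of the mock-up, and the count() value is only an illustrative choice.

    ```stata
    * Uncomment only if -dataex- is not already part of your installation
    * ssc install dataex

    * Emit a copy-and-paste-able example of the first 20 observations
    * of the two variables relevant to the question
    dataex custid product, count(20)
    ```

    The output can be pasted straight into a post between code delimiters, as in the example above.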

    Added: It dawns on me that given the size of your data set you may face problems with memory. Much of that can be avoided by running -duplicates drop- first.
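
    A sketch of that memory-saving variant, assuming the same variable names as the example above: reducing the data to one observation per custid-product pair before the join keeps -joinby- from multiplying repeated rows. The closing -tabulate- is an extra step not in the original code; it is one way to get the count of common customers for each product pair.

    ```stata
    * Keep only the two variables needed and one row per custid-product pair.
    * Note: this discards other variables, so work on a copy of the real data.
    keep custid product
    duplicates drop

    * Save a copy with product renamed, to pair against the reduced data
    preserve
    rename product product2
    tempfile copy
    save `copy'
    restore

    * Pair every product with every other product bought by the same customer
    joinby custid using `copy'
    drop if product == product2

    * Each row is now one customer shared by an ordered pair of products
    sort product product2 custid
    list, sepby(product product2)

    * Number of common customers for every product pair (assumed extra step)
    tabulate product product2
    ```

    Because both sides of the join are already unique on custid and product, the result contains no repeated rows and the final -duplicates drop- is no longer needed. Each unordered pair appears twice (A-B and B-A), so the table is symmetric.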



    • #3
      joinby... that's the one I hadn't found. Thank you very much. I will use the dataex command next time; it does seem very useful.

      P.S. I did not run out of memory, but I take your point.
