How to do a cross tabulation identifying common observations across different groups?

Mike Visser

Join Date: Feb 2020

Posts: 2
#1

17 Feb 2020, 14:40

My data are proprietary, so I will provide a simple mock-up that I hope makes the structure clear.

custid  product
1       A
2       A
2       A
3       A
4       A
2       B
3       B
3       B
3       B
1       C
1       C
1       C
1       C
2       C
4       C
4       C
4       C
4       C
4       C

custid is the customer identifier variable and product is, well, different products. I would like to identify which customers are common to each pairwise combination of products. In my actual data there are hundreds of unique custid values (each can be repeated within a product, or absent from it entirely) and 20 different products. I cannot, for the life of me, figure out how to do this using the various table commands and options. Any help would be appreciated.
Clyde Schechter

Join Date: Apr 2014

Posts: 30122
#2

17 Feb 2020, 14:49

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte custid str1 product
1 "A"
2 "A"
2 "A"
3 "A"
4 "A"
2 "B"
3 "B"
3 "B"
3 "B"
1 "C"
1 "C"
1 "C"
1 "C"
2 "C"
4 "C"
4 "C"
4 "C"
4 "C"
4 "C"
end

preserve
rename product product2
tempfile copy
save `copy'
restore

joinby custid using `copy'
drop if product == product2
duplicates drop

In the future, when showing data examples, please use the -dataex- command to do so, as I have here. If you are running version 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

When asking for help with code, always show example data. When showing example data, always use -dataex-.

Added: It dawns on me that given the size of your data set you may face problems with memory. Much of that can be avoided by running -duplicates drop- first.
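
The memory point above can be sketched as follows. This is only an illustration built on the mock data from #1, not a tested recipe for the real data set: it assumes that custid and product are the only variables needed for the pairing (hence the -keep-), and the closing -tabulate- is an added step, not part of the code in #2, meant to show the pairs as a cross tabulation.

```stata
* Sketch only: mock data from #1
clear
input byte custid str1 product
1 "A"
2 "A"
2 "A"
3 "A"
4 "A"
2 "B"
3 "B"
3 "B"
3 "B"
1 "C"
1 "C"
1 "C"
1 "C"
2 "C"
4 "C"
4 "C"
4 "C"
4 "C"
4 "C"
end

* Assumption: only custid and product matter for the pairing
keep custid product

* Deduplicate first so -joinby- pairs one row per customer-product
duplicates drop

preserve
rename product product2
tempfile copy
save `copy'
restore

* Form all within-customer product pairs, then drop self-pairs
joinby custid using `copy'
drop if product == product2

* Each cell counts the distinct customers shared by that product pair
tabulate product product2
```

With the mock data, the table should show 2 shared customers for A and B, 3 for A and C, and 1 for B and C. Each pair appears twice (A-B and B-A), so the table is symmetric. Because duplicates are removed before the join, the final -duplicates drop- from #2 is no longer needed here.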
Mike Visser

Join Date: Feb 2020

Posts: 2
#3

17 Feb 2020, 15:04

joinby... that's the one I hadn't found. Thank you very much. I will use the dataex command next time; it does seem very useful.

P.S. I did not run out of memory, but I take your point.