creating a cluster variable

Anthoine Saunders

#1

27 Nov 2020, 12:50

Hello,

I have the following matrix:

Code:

mat li e

e[12,12]
      c1   c2   c3   c4   c5   c6   c7   c8   c9  c10  c11  c12
 r1    0    0    0    6    7    5   90    4    0    0    0    0
 r2    0    0   89    6   11    9    9   14   14    0   53    9
 r3    0    0    0    0    0   11    0    0    0   13    0    0
 r4   20    0    5    0    9    0    2    0   10    7    0   33
 r5    2    0   82    0    0   22   16    3    0    0    0    6
 r6    0    0    1    0    9    0    0    0    0   25    0    0
 r7    0    9    0   11    0    0    0   71    0    0    0   13
 r8    0    0   11    2    0    0    0    0    0    0    0    0
 r9    4   33    0    4    2   16    0   13    0    0   25   15
r10    0    0    0    0    0    0    0    0    0    0   31    0
r11    0    0   87    0    0    0    0   55    0    0    0  100
r12    5    3    0    9    7    2    0    0    2    0    0    0

The rows and columns with the same number represent the same store (store 1, store 2, etc.). The numbers in the matrix represent the number of times in a year the stores have purchased from each other.

I would like to create a variable that assigns "similar" observations to groups, like clusters. That is, instead of defining the groups myself by looking at the matrix, I'd like something more statistically robust.

Is there a way to do this?

Naturally, I know that c1,r2 represents the impact of store 1 on store 2's sales and c2,r1 represents the impact of store 2 on store 1's sales, which makes things difficult. Maybe I could take the average of the two, so that I only need half the matrix. I just want to know the mechanics of this in Stata, or even whether it's possible.
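The averaging idea can be done directly on the matrix: S = 0.5*(e + e') puts the mean of the two directions into both S[i,j] and S[j,i], so one triangle of S carries all the information. A minimal sketch, assuming the matrix is stored as e as in the listing above; the name S is illustrative:

```stata
* Purchase matrix e from post #1, re-entered so the example runs on its own
matrix input e = ( ///
     0,  0,  0,  6,  7,  5, 90,  4,  0,  0,  0,  0 \ ///
     0,  0, 89,  6, 11,  9,  9, 14, 14,  0, 53,  9 \ ///
     0,  0,  0,  0,  0, 11,  0,  0,  0, 13,  0,  0 \ ///
    20,  0,  5,  0,  9,  0,  2,  0, 10,  7,  0, 33 \ ///
     2,  0, 82,  0,  0, 22, 16,  3,  0,  0,  0,  6 \ ///
     0,  0,  1,  0,  9,  0,  0,  0,  0, 25,  0,  0 \ ///
     0,  9,  0, 11,  0,  0,  0, 71,  0,  0,  0, 13 \ ///
     0,  0, 11,  2,  0,  0,  0,  0,  0,  0,  0,  0 \ ///
     4, 33,  0,  4,  2, 16,  0, 13,  0,  0, 25, 15 \ ///
     0,  0,  0,  0,  0,  0,  0,  0,  0,  0, 31,  0 \ ///
     0,  0, 87,  0,  0,  0,  0, 55,  0,  0,  0,100 \ ///
     5,  3,  0,  9,  7,  2,  0,  0,  2,  0,  0,  0 )

* Average each cell with its mirror image: S[i,j] = S[j,i] = (e[i,j] + e[j,i]) / 2
matrix S = 0.5 * (e + e')
matrix list S
```

Summing instead of averaging (e + e') would rank the pairs identically, so for grouping purposes the choice between the two mostly affects the scale of the numbers.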

Thank you in advance,
Anthoine

Last edited by Anthoine Saunders; 27 Nov 2020, 12:56.
Tags: cluster, generating variables, grouping post hoc, matrix
Jeph Herrin

#2

27 Nov 2020, 14:10

It's not clear what you mean by "observation": is each cell value an observation? And by "similar", do you mean the same value? Or values among the same stores?
Anthoine Saunders

#3

27 Nov 2020, 14:32

Hi Jeph,

Thank you for your answer, and my apologies for not being clearer.

By an observation I do mean a cell of the matrix. By similar I mean within the same range: for example, stores would be related to each other if their matrix cells fall in an interval such as [10,30[ or [70,90[.
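If fixed intervals were what you wanted, they are easy to impose once each cell is an observation. A sketch, assuming long-form data with the cell value in a variable c (here, for illustration, the 12 cells of row r2 of matrix e from post #1); the cut points 10, 30, 70, and 90 come from this post, while 0 and 101 are added only to close the outer bands. The cut() function of -egen- treats each interval as closed on the left and open on the right, the same as the [10,30[ notation:

```stata
clear
* Illustrative long-form data: row r2 of matrix e from post #1
input byte col int c
 1  0
 2  0
 3 89
 4  6
 5 11
 6  9
 7  9
 8 14
 9 14
10  0
11 53
12  9
end

* Band codes: 0 = [0,10[, 1 = [10,30[, 2 = [30,70[, 3 = [70,90[, 4 = [90,101[
egen band = cut(c), at(0,10,30,70,90,101) icodes
list col c band, clean noobs
```

The band boundaries here are still chosen by hand, which is the part a clustering method would instead take from the data.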

I didn't want to define the groups myself because I wanted to see what Stata would do if it formed the groups on its own. By definition, a cluster should have "small distances" between its members but "large distances" across groups; doing it by hand, I might misjudge this, and I wanted something more rigorous. I wanted to check whether Stata would find 2, 3, or 5 groups.
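What this post describes, small distances within groups and large distances between them, is the goal of hierarchical cluster analysis, and the -clustermat- command in the Multivariate Statistics manual runs it directly on a matrix. A sketch under several assumptions: the matrix is first symmetrized by averaging the two directions, as suggested in post #1; purchase counts are turned into distances as max(S) - S, so that stores that trade a lot are close; and average linkage is used. Each of these is a modelling choice rather than the only option, and the names D, store, stores, and g2/g3/g5 are illustrative:

```stata
* Purchase matrix ak from post #3, re-entered so the example runs on its own
matrix input ak = ( ///
      1,  0,  2, 64,  7, 87, 33, 29,  0,  0,  0,  0 \ ///
    100,  1,  0, 77, 60, 36, 78, 62, 16,  0,  0, 96 \ ///
      0,  0,  1,  0,  0, 22,  0,  0,  0, 31,  0,  0 \ ///
     49,  0, 11,  1, 51,  0,  2,  0,  0, 82,  0,  0 \ ///
      7,100, 62,  0,  1, 56, 84, 40,100,  0,  0, 76 \ ///
      0,  0, 88,  0, 49,  1,  0,  0,  0, 29,  0,  0 \ ///
    100, 47,  2, 80,100,100,  1, 87, 80,  0,  0, 67 \ ///
     33,  0, 71, 16,  0,100,  0,  1,100,  0,  0,  0 \ ///
     40,  0,  0,  0, 96, 11,  0, 31,  1,  0, 53, 47 \ ///
      0,  0,100,  0,  0,  0,  0,  0,  0,  1,100,  0 \ ///
      0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  1,  0 \ ///
     93, 97,  0, 24, 40, 27,  0,100, 47,  0,  0,  1 )

* Symmetrize, convert similarities (purchase counts) to dissimilarities,
* and zero the diagonal, which clustermat expects
mata:
    A = st_matrix("ak")
    S = (A + A') :/ 2      // average the two directions of purchase
    D = max(S) :- S        // more purchases -> smaller distance
    _diag(D, 0)            // distance of a store to itself is 0
    st_matrix("D", D)
end

* Label rows and columns store1 ... store12
local names
forvalues i = 1/12 {
    local names `names' store`i'
}
matrix rownames D = `names'
matrix colnames D = `names'

* Average-linkage hierarchical clustering: one observation per store
clustermat averagelinkage D, clear labelvar(store) name(stores)
cluster dendrogram stores, labels(store)

* Group memberships for 2, 3, and 5 clusters
cluster generate g2 = groups(2), name(stores) ties(more)
cluster generate g3 = groups(3), name(stores) ties(more)
cluster generate g5 = groups(5), name(stores) ties(more)
list store g2 g3 g5, clean noobs
```

The dendrogram shows where the natural breaks are, which is one way to judge whether 2, 3, or 5 groups suit the data. Other linkages (single, complete, Ward's) and other similarity-to-distance transformations can give different groups, so it is worth comparing a few.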

This may not be the most interesting matrix to analyse because of the number of 0s, but I am doing this for several others (other collections of stores in other regions). I just wanted to know whether this is possible at all and what sort of methodology I could use. Here is another one in case it helps.

Code:

mat li ak

ak[12,12]
      c1   c2   c3   c4   c5   c6   c7   c8   c9  c10  c11  c12
 r1    1    0    2   64    7   87   33   29    0    0    0    0
 r2  100    1    0   77   60   36   78   62   16    0    0   96
 r3    0    0    1    0    0   22    0    0    0   31    0    0
 r4   49    0   11    1   51    0    2    0    0   82    0    0
 r5    7  100   62    0    1   56   84   40  100    0    0   76
 r6    0    0   88    0   49    1    0    0    0   29    0    0
 r7  100   47    2   80  100  100    1   87   80    0    0   67
 r8   33    0   71   16    0  100    0    1  100    0    0    0
 r9   40    0    0    0   96   11    0   31    1    0   53   47
r10    0    0  100    0    0    0    0    0    0    1  100    0
r11    0    0    0    0    0    0    0    0    0    0    1    0
r12   93   97    0   24   40   27    0  100   47    0    0    1

I don't even know whether this can be done in matrix form or whether I have to transform these results into a dataset (.dta) and then compute the "average" sales between each pair of stores (as I said before, c1,r2 represents the impact of store 1 on store 2's sales, and c2,r1 represents the impact of store 2 on store 1's sales).
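Either form can work. In dataset form, one way is to reshape to one observation per cell and then collapse on the unordered pair of stores, so that cells (i,j) and (j,i) are averaged together. A sketch using only the top-left 4 x 4 corner of ak to keep it short; the names ak4, store_a, store_b, and avg are illustrative:

```stata
clear
* Top-left 4 x 4 corner of matrix ak from this post
matrix input ak4 = (1, 0, 2, 64 \ 100, 1, 0, 77 \ 0, 0, 1, 0 \ 49, 0, 11, 1)

* Matrix -> dataset: variables c1-c4, one observation per matrix row
svmat ak4, names(col)
generate row = _n

* Wide -> long: one observation per cell (row, col)
reshape long c, i(row) j(col)
drop if row == col          // diagonal cells are not pairs of distinct stores

* Identify each unordered pair and average its two directions
generate store_a = min(row, col)
generate store_b = max(row, col)
collapse (mean) avg = c, by(store_a store_b)
list, clean noobs
```

With the full 12 x 12 matrix the same code leaves 12 x 11 / 2 = 66 pairs.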

Thank you once again
Anthoine Saunders

#4

29 Nov 2020, 05:35

Hello,

I am still having difficulties with this, and it's really important to my project, so any help you could give me would be great.

Thank you

William Lisowski


29 Nov 2020, 09:03

It seems likely that the tools you need will be found somewhere in the Stata Multivariate Statistics Reference Manual PDF included with your Stata installation and accessible through Stata's Help menu.
I lack experience with those procedures. What I can do is:

1. present your example data from post #3 in a form easily read into a Stata dataset by others
2. show you how to transform a Stata dataset into a matrix
3. show you how to transform a matrix into a Stata dataset, and reshape it to a more useful layout

With your matrix now presented in a format that others can use in answering your question, my hope is that someone with the appropriate expertise will take the next step and recommend an approach to your problem.

Code:

/* 1. present your example data from post #3 in a form easily read into a Stata dataset */
* Example generated by -dataex-. To install: ssc install dataex
clear
input str3 r int(c1 c2 c3 c4 c5 c6 c7 c8 c9 c10 c11 c12)
"r1"    1   0   2 64   7  87 33  29   0  0   0  0
"r2"  100   1   0 77  60  36 78  62  16  0   0 96
"r3"    0   0   1  0   0  22  0   0   0 31   0  0
"r4"   49   0  11  1  51   0  2   0   0 82   0  0
"r5"    7 100  62  0   1  56 84  40 100  0   0 76
"r6"    0   0  88  0  49   1  0   0   0 29   0  0
"r7"  100  47   2 80 100 100  1  87  80  0   0 67
"r8"   33   0  71 16   0 100  0   1 100  0   0  0
"r9"   40   0   0  0  96  11  0  31   1  0  53 47
"r10"   0   0 100  0   0   0  0   0   0  1 100  0
"r11"   0   0   0  0   0   0  0   0   0  0   1  0
"r12"  93  97   0 24  40  27  0 100  47  0   0  1
end
/* 2. show how to transform a Stata dataset into a matrix */
mkmat c1-c12, matrix(ak)
matrix list ak
/* 3. show how to transform a matrix into a Stata dataset,
      and reshape it to a more useful layout */
clear
svmat ak, names(col)
generate row = _n
order row
list in 1/5, clean noobs
reshape long c, i(row) j(col)
list in 1/5, clean noobs

The output from the end of part 3:

Code:

. list in 1/5, clean noobs

    row    c1    c2   c3   c4   c5   c6   c7   c8    c9   c10   c11   c12  
      1     1     0    2   64    7   87   33   29     0     0     0     0  
      2   100     1    0   77   60   36   78   62    16     0     0    96  
      3     0     0    1    0    0   22    0    0     0    31     0     0  
      4    49     0   11    1   51    0    2    0     0    82     0     0  
      5     7   100   62    0    1   56   84   40   100     0     0    76  

. reshape long c, i(row) j(col)
(note: j = 1 2 3 4 5 6 7 8 9 10 11 12)

Data                               wide   ->   long
-----------------------------------------------------------------------------
Number of obs.                       12   ->     144
Number of variables                  13   ->       3
j variable (12 values)                    ->   col
xij variables:
                          c1 c2 ... c12   ->   c
-----------------------------------------------------------------------------

. list in 1/5, clean noobs

    row   col    c  
      1     1    1  
      1     2    0  
      1     3    2  
      1     4   64  
      1     5    7
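One possible next step from the wide layout shown above is to treat each store's row c1-c12 as a profile and cluster the rows directly, so that stores with similar purchase patterns end up together. This is a different notion of "similar" from the pairwise one in post #1, and Ward's linkage with squared Euclidean distance is an assumption, not something established in this thread. The data are re-entered from part 1 so the sketch runs on its own; the names profiles and grp are illustrative:

```stata
clear
input str3 r int(c1 c2 c3 c4 c5 c6 c7 c8 c9 c10 c11 c12)
"r1"    1   0   2 64   7  87 33  29   0  0   0  0
"r2"  100   1   0 77  60  36 78  62  16  0   0 96
"r3"    0   0   1  0   0  22  0   0   0 31   0  0
"r4"   49   0  11  1  51   0  2   0   0 82   0  0
"r5"    7 100  62  0   1  56 84  40 100  0   0 76
"r6"    0   0  88  0  49   1  0   0   0 29   0  0
"r7"  100  47   2 80 100 100  1  87  80  0   0 67
"r8"   33   0  71 16   0 100  0   1 100  0   0  0
"r9"   40   0   0  0  96  11  0  31   1  0  53 47
"r10"   0   0 100  0   0   0  0   0   0  1 100  0
"r11"   0   0   0  0   0   0  0   0   0  0   1  0
"r12"  93  97   0 24  40  27  0 100  47  0   0  1
end

* Ward's linkage on squared Euclidean distances between the stores' rows
cluster wardslinkage c1-c12, measure(L2squared) name(profiles)
cluster dendrogram profiles, labels(r)

* Cut the tree into 3 groups; ties(more) avoids an error if merge heights tie
cluster generate grp = groups(3), name(profiles) ties(more)
list r grp, clean noobs
```

Trying groups(2) and groups(5) as well, and comparing the results with the dendrogram, is one way to judge how many groups the data support.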


Anthoine Saunders

#6

29 Nov 2020, 12:41

William, I appreciate this immensely. Any help is good help. I'll wait patiently to see if anyone else knows how to proceed.

Thank you once again
