
  • creating a cluster variable

    Hello,

    I have the following matrix:

    Code:
     mat li e
    
    e[12,12]
          c1   c2   c3   c4   c5   c6   c7   c8   c9  c10  c11  c12
     r1    0    0    0    6    7    5   90    4    0    0    0    0
     r2    0    0   89    6   11    9    9   14   14    0   53    9
     r3    0    0    0    0    0   11    0    0    0   13    0    0
     r4   20    0    5    0    9    0    2    0   10    7    0   33
     r5    2    0   82    0    0   22   16    3    0    0    0    6
     r6    0    0    1    0    9    0    0    0    0   25    0    0
     r7    0    9    0   11    0    0    0   71    0    0    0   13
     r8    0    0   11    2    0    0    0    0    0    0    0    0
     r9    4   33    0    4    2   16    0   13    0    0   25   15
    r10    0    0    0    0    0    0    0    0    0    0   31    0
    r11    0    0   87    0    0    0    0   55    0    0    0  100
    r12    5    3    0    9    7    2    0    0    2    0    0    0
    Rows and columns with the same number represent the same store (store 1, store 2, etc.). The numbers in the matrix represent the number of times in a year the stores have purchased from each other.

    I would like to create a variable that groups, like a cluster, the observations that are "similar". That is, instead of defining the groups myself by looking at the matrix, I would like something more statistically robust.

    Is there a way to do this?

    I know that c1,r2 represents the impact of store 1 on store 2's sales, and c2,r1 represents the impact of store 2 on store 1's sales, and this makes things difficult. Maybe I could take the average of the two, so that only half of the matrix is needed. I just want to know the mechanics of this in Stata, or whether it is possible at all.
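A minimal sketch of the averaging idea, assuming the matrix e listed above is retyped by hand (in practice it would already be in memory). The name S is a placeholder introduced here. Adding e to its transpose and halving gives one value per pair of stores:

```stata
* Matrix e as listed in the post (retyped here so the sketch is self-contained)
matrix e = ( 0,  0,  0,  6,  7,  5, 90,  4,  0,  0,  0,   0 \ ///
             0,  0, 89,  6, 11,  9,  9, 14, 14,  0, 53,   9 \ ///
             0,  0,  0,  0,  0, 11,  0,  0,  0, 13,  0,   0 \ ///
            20,  0,  5,  0,  9,  0,  2,  0, 10,  7,  0,  33 \ ///
             2,  0, 82,  0,  0, 22, 16,  3,  0,  0,  0,   6 \ ///
             0,  0,  1,  0,  9,  0,  0,  0,  0, 25,  0,   0 \ ///
             0,  9,  0, 11,  0,  0,  0, 71,  0,  0,  0,  13 \ ///
             0,  0, 11,  2,  0,  0,  0,  0,  0,  0,  0,   0 \ ///
             4, 33,  0,  4,  2, 16,  0, 13,  0,  0, 25,  15 \ ///
             0,  0,  0,  0,  0,  0,  0,  0,  0,  0, 31,   0 \ ///
             0,  0, 87,  0,  0,  0,  0, 55,  0,  0,  0, 100 \ ///
             5,  3,  0,  9,  7,  2,  0,  0,  2,  0,  0,   0 )

* Average each cell with its mirror image: S[i,j] = (e[i,j] + e[j,i]) / 2
* (S is a placeholder name, not something from the original post)
matrix S = (e + e')/2

* S is symmetric, so the upper or lower triangle alone carries all the information
matrix list S
```

Whether a plain average is the right way to combine the two directions is a modelling choice; a sum, minimum, or maximum would be computed the same way.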

    Thank you in advance,
    Anthoine
    Last edited by Anthoine Saunders; 27 Nov 2020, 12:56.

  • #2
    It's not clear what you mean by "observation" - is each cell value an observation? And by 'similar', do you mean the same value? Or among the same stores?



    • #3
      Hi Jeph,

      Thank you for your answers, and my apologies for the lack of clarification.

      By an observation I mean a cell of the matrix, yes. And by similar I mean within the same range: for example, stores could be considered related to each other if their matrix cells fall in the same interval, such as [10,30[ or [70,90[.

      I didn't want to define the groups myself because I wanted to see what Stata would do if it were grouping on its own. A cluster by definition should have "small distances" between the values within a group, but "large distances" across groups. By doing it by hand I could misjudge this, and I wanted something more robust. I wanted to check whether Stata would find 2, 3, or 5 groups.

      Maybe this is not the most interesting matrix to analyse because of the number of zeros, but I am doing this for several others (other collections of stores in other regions). I just wanted to know whether this is possible at all and what sort of methodology I could use. I leave another matrix here in case it helps.

      Code:
       mat li ak
      
      ak[12,12]
            c1   c2   c3   c4   c5   c6   c7   c8   c9  c10  c11  c12
       r1    1    0    2   64    7   87   33   29    0    0    0    0
       r2  100    1    0   77   60   36   78   62   16    0    0   96
       r3    0    0    1    0    0   22    0    0    0   31    0    0
       r4   49    0   11    1   51    0    2    0    0   82    0    0
       r5    7  100   62    0    1   56   84   40  100    0    0   76
       r6    0    0   88    0   49    1    0    0    0   29    0    0
       r7  100   47    2   80  100  100    1   87   80    0    0   67
       r8   33    0   71   16    0  100    0    1  100    0    0    0
       r9   40    0    0    0   96   11    0   31    1    0   53   47
      r10    0    0  100    0    0    0    0    0    0    1  100    0
      r11    0    0    0    0    0    0    0    0    0    0    1    0
      r12   93   97    0   24   40   27    0  100   47    0    0    1
      I don't even know whether this can be done in matrix form, or whether I have to convert these results to a .dta dataset and then create the "average" sales between the 2 stores (as I said before, c1,r2 represents the impact of store 1 on store 2's sales, and c2,r1 represents the impact of store 2 on store 1's sales).
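One possible route that stays in matrix form, offered as a sketch rather than a recommendation: average the two directions, turn the result into a dissimilarity matrix, and pass it to clustermat. The assumptions are that the goal is to group stores (not individual cells), that 100 is the largest possible cell value, and that average linkage with 3 groups is an acceptable starting point; all three are arbitrary choices. S, D, stores, and grp3 are placeholder names.

```stata
* Matrix ak as listed above (retyped so the sketch is self-contained)
matrix ak = (  1,   0,   2, 64,   7,  87, 33,  29,   0,  0,   0,  0 \ ///
             100,   1,   0, 77,  60,  36, 78,  62,  16,  0,   0, 96 \ ///
               0,   0,   1,  0,   0,  22,  0,   0,   0, 31,   0,  0 \ ///
              49,   0,  11,  1,  51,   0,  2,   0,   0, 82,   0,  0 \ ///
               7, 100,  62,  0,   1,  56, 84,  40, 100,  0,   0, 76 \ ///
               0,   0,  88,  0,  49,   1,  0,   0,   0, 29,   0,  0 \ ///
             100,  47,   2, 80, 100, 100,  1,  87,  80,  0,   0, 67 \ ///
              33,   0,  71, 16,   0, 100,  0,   1, 100,  0,   0,  0 \ ///
              40,   0,   0,  0,  96,  11,  0,  31,   1,  0,  53, 47 \ ///
               0,   0, 100,  0,   0,   0,  0,   0,   0,  1, 100,  0 \ ///
               0,   0,   0,  0,   0,   0,  0,   0,   0,  0,   1,  0 \ ///
              93,  97,   0, 24,  40,  27,  0, 100,  47,  0,   0,  1 )

* Average the two directions so each pair of stores has a single similarity
matrix S = (ak + ak')/2

* ASSUMPTION: cell values never exceed 100, so 100 - S acts as a dissimilarity
matrix D = J(12, 12, 100) - S

* A store is at distance 0 from itself
forvalues i = 1/12 {
    matrix D[`i', `i'] = 0
}

* Hierarchical clustering on the dissimilarity matrix; -clear- replaces the
* data in memory with one observation per store
clustermat averagelinkage D, shape(full) name(stores) clear

* Cut the tree into (about) 3 groups; ties(more) avoids an error if tied
* fusion heights make exactly 3 groups impossible
cluster generate grp3 = groups(3), name(stores) ties(more)
tabulate grp3
```

Other linkages (for example wardslinkage) or a different number of groups can be swapped in; cluster dendrogram stores would show the tree being cut.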

      Thank you once again



      • #4
        Hello,

        I am still having difficulties with this, and it's really important to my project so any help you could give me would be great.

        Thank you



        • #5
          It seems likely that the tools you need will be found somewhere in the Stata Multivariate Statistics Reference Manual PDF included with your Stata installation and accessible through Stata's Help menu.
          I lack experience with those procedures. What I can do is
          1. present your example data from post #3 in a form easily read into a Stata dataset by others
          2. show you how to transform a Stata dataset into a matrix
          3. show you how to transform a matrix into a Stata dataset, and reshape it to a more useful layout
          With your matrix presented in a format that others can use in answering your question, my hope is that someone with the appropriate expertise will take the next step and recommend an approach to your problem.
          Code:
          /* 1. present your example data from post #3 in a form easily read into a Stata dataset */
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input str3 r int (c1 c2 c3 c4 c5 c6 c7 c8 c9 c10 c11 c12)
          "r1"    1   0   2 64   7  87 33  29   0  0   0  0
          "r2"  100   1   0 77  60  36 78  62  16  0   0 96
          "r3"    0   0   1  0   0  22  0   0   0 31   0  0
          "r4"   49   0  11  1  51   0  2   0   0 82   0  0
          "r5"    7 100  62  0   1  56 84  40 100  0   0 76
          "r6"    0   0  88  0  49   1  0   0   0 29   0  0
          "r7"  100  47   2 80 100 100  1  87  80  0   0 67
          "r8"   33   0  71 16   0 100  0   1 100  0   0  0
          "r9"   40   0   0  0  96  11  0  31   1  0  53 47
          "r10"   0   0 100  0   0   0  0   0   0  1 100  0
          "r11"   0   0   0  0   0   0  0   0   0  0   1  0
          "r12"  93  97   0 24  40  27  0 100  47  0   0  1
          end
          /* 2. show how to transform a Stata dataset into a matrix */
          mkmat c1-c12, matrix(ak)
          matrix list ak
          /* 3. show how to transform a matrix into a Stata dataset,
                and reshape it to a more useful layout */
          clear
          svmat ak, names(col)
          generate row = _n
          order row
          list in 1/5, clean noobs
          reshape long c, i(row) j(col)
          list in 1/5, clean noobs
          From the end of part 3
          Code:
          . list in 1/5, clean noobs
          
              row    c1    c2   c3   c4   c5   c6   c7   c8    c9   c10   c11   c12  
                1     1     0    2   64    7   87   33   29     0     0     0     0  
                2   100     1    0   77   60   36   78   62    16     0     0    96  
                3     0     0    1    0    0   22    0    0     0    31     0     0  
                4    49     0   11    1   51    0    2    0     0    82     0     0  
                5     7   100   62    0    1   56   84   40   100     0     0    76  
          
          . reshape long c, i(row) j(col)
          (note: j = 1 2 3 4 5 6 7 8 9 10 11 12)
          
          Data                               wide   ->   long
          -----------------------------------------------------------------------------
          Number of obs.                       12   ->     144
          Number of variables                  13   ->       3
          j variable (12 values)                    ->   col
          xij variables:
                                    c1 c2 ... c12   ->   c
          -----------------------------------------------------------------------------
          
          . list in 1/5, clean noobs
          
              row   col    c  
                1     1    1  
                1     2    0  
                1     3    2  
                1     4   64  
                1     5    7
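Continuing from this long layout, a hedged sketch of the pairwise "average" asked about in #3: each unordered pair of stores is identified by its lower and higher store number, and the two directed values are averaged. The variable names lo, hi, and avg are placeholders, and the matrix is retyped only to keep the example self-contained.

```stata
* Matrix ak from post #3 (retyped so the sketch runs on its own)
matrix ak = (  1,   0,   2, 64,   7,  87, 33,  29,   0,  0,   0,  0 \ ///
             100,   1,   0, 77,  60,  36, 78,  62,  16,  0,   0, 96 \ ///
               0,   0,   1,  0,   0,  22,  0,   0,   0, 31,   0,  0 \ ///
              49,   0,  11,  1,  51,   0,  2,   0,   0, 82,   0,  0 \ ///
               7, 100,  62,  0,   1,  56, 84,  40, 100,  0,   0, 76 \ ///
               0,   0,  88,  0,  49,   1,  0,   0,   0, 29,   0,  0 \ ///
             100,  47,   2, 80, 100, 100,  1,  87,  80,  0,   0, 67 \ ///
              33,   0,  71, 16,   0, 100,  0,   1, 100,  0,   0,  0 \ ///
              40,   0,   0,  0,  96,  11,  0,  31,   1,  0,  53, 47 \ ///
               0,   0, 100,  0,   0,   0,  0,   0,   0,  1, 100,  0 \ ///
               0,   0,   0,  0,   0,   0,  0,   0,   0,  0,   1,  0 \ ///
              93,  97,   0, 24,  40,  27,  0, 100,  47,  0,   0,  1 )

* Matrix -> dataset -> long layout (one observation per cell)
clear
svmat ak, names(c)
generate row = _n
reshape long c, i(row) j(col)

* The diagonal pairs a store with itself, so drop it
drop if row == col

* Label each unordered pair by its lower and higher store number
generate lo = min(row, col)
generate hi = max(row, col)

* Average the two directed values: one observation per pair of stores
collapse (mean) avg = c, by(lo hi)
list in 1/5, clean noobs
```

With 12 stores this leaves 66 pairs, one per cell of the half matrix.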



          • #6
            William, I appreciate it immensely. Any help is good help. I'll wait patiently to see if anyone else knows how to proceed.

            Thank you once again
