  • Create new variable as result from matrix

    Dear Stata users,

    I am having trouble creating a new variable from the results stored in a matrix.

    I am using the command -pco- in Stata 12.
    I want to calculate the Euclidean distance between subjects (n) based on three attributes (x y z).
    The -pco- command reduces all attributes to two dimensions and generates an n x n matrix in which each entry is the Euclidean distance between a pair of subjects.
    Code:
    pco conscientiousness neuroticism openness extraversion agreeableness, id(code)
    matrix list r(D)
    Here is the resulting matrix, where the labels "codeXX" identify subjects and the values are the distances between each pair of subjects:

    symmetric r(D)[8,8]
               code17     code18     code20     code19     code12     code14     code13     code16
    code17          0
    code18          6          0
    code20   12.84375    6.40625          0
    code19   10.03125    2.59375     3.5625          0
    code12  9.5763889  8.7638887  9.2951387  10.607639          0
    code14    3.46875    4.09375       7.25          7  3.0034724          0
    code13    8.71875    7.78125      3.375     6.9375   3.045139      2.375          0
    code16    3.96875    5.96875      9.875     6.6875  3.1701389     1.8125      4.125          0

    I would like to create a new variable holding, for each subject, the average distance between him/her and all other subjects. This value, of course, would be different for each subject. For instance, for "code17", which has seven other subjects to compare with, the new variable would be computed as (6+12.84+10.03+9.57+3.46+8.71+3.96)/7

    Any help on this will be highly appreciated.

    Many thanks in advance,
    Oscar

  • #2
    Many ways to do this. Consider

    Code:
    // example matrix
    matrix input A = ( 1 2 3 \ 4 5 6 \ 7 8 9)
    
    // auxiliary matrix
    matrix input B = (1 \ 1 \ 1)
    
    // average elements of each row
    matrix define C = (A * B) / colsof(A)
    
    // average of elements of each column (divide by the number of rows)
    matrix define D = (A' * B) / rowsof(A)
    
    // list
    matrix list A
    matrix list B
    matrix list C
    matrix list D
    Read -help mkmat- to go back and forth between matrices and variables. More generally, read -help matrix-.
    For more complicated matrix operations, you may check -help mata-.
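    Note that dividing by colsof() counts the zero self-distance on the diagonal. If the average should run over the other subjects only, a minimal sketch follows. The matrix D, the vector ones and the variable stub avgdist are made-up names for illustration, not output from -pco-.

    ```stata
    clear
    set obs 3

    // hypothetical symmetric distance matrix with a zero diagonal
    matrix input D = (0, 6, 12 \ 6, 0, 4 \ 12, 4, 0)

    // column vector of ones, sized from D rather than typed by hand
    matrix ones = J(colsof(D), 1, 1)

    // row sums divided by n - 1, so the zero self-distance is not counted
    matrix avg = (D * ones) / (colsof(D) - 1)

    // copy the column vector into a variable (svmat names it avgdist1)
    svmat avg, names(avgdist)
    list avgdist1
    ```

    Here the three averages should be 9, 5 and 8. Because J() sizes the vector of ones from the matrix itself, the same lines should work for a distance matrix of any size.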
    You should:

    1. Read the FAQ carefully.

    2. "Say exactly what you typed and exactly what Stata typed (or did) in response. N.B. exactly!"

    3. Describe your dataset. Use list to list data when you are doing so. Use input to type in your own dataset fragment that others can experiment with.

    4. Use the advanced editing options to appropriately format quotes, data, code and Stata output. The advanced options can be toggled on/off using the A button in the top right corner of the text editor.

    • #3
      Hi Roberto,

      Thanks a lot! I'll read -help mkmat-.

      I followed your code and it works perfectly. But things are a bit more complicated...
      In my dataset, subjects are grouped into sub-groups. And sub-groups differ in size. That implies that I need to calculate the Euclidean distance indicator for each sub-group.
      To do this, I tried to use the command -pce- with the option -by(group)-, but it seems that the command does not support this option.
      Since sub-groups differ in size, (and following your suggested method) the auxiliary matrix that I have to create would also differ in size. Thus, I would need a code that, for each group, creates an auxiliary matrix of the same size as the number of subjects in each sub-group.

      Again, many thanks in advance for any advice.

      Oscar

      • #4
        It's not clear how these subgroups appear in your matrix (matrices). How do you identify them? Is it one or several matrices?
        It's best if you offer a representative example of your data/matrices and base your explanation on that. See also the advice given below.
        You should:

        1. Read the FAQ carefully.

        2. "Say exactly what you typed and exactly what Stata typed (or did) in response. N.B. exactly!"

        3. Describe your dataset. Use list to list data when you are doing so. Use input to type in your own dataset fragment that others can experiment with.

        4. Use the advanced editing options to appropriately format quotes, data, code and Stata output. The advanced options can be toggled on/off using the A button in the top right corner of the text editor.

        • #5
          Hi,

          Thanks for answering.

          I attach a snapshot where you can see how the data are organized.

          Each row is a different individual. The variable "idip_new" is a unique code for each individual. The variable "group" identifies the group where each individual belongs. And then, I have five variables which are unique for each individual (the five personality traits). As you can see, groups differ in size. E.g. CIBERBBN1.1 has 7 members, CIBERBBN1.10 has 2 members, etc.

          I want to calculate the average euclidean distance between pairs of individuals that belong to the same group, based on the five personality traits. For each subject, the resulting variable would reflect the average euclidean distance between him/her and the rest of members within the same group.

          Problems: the command - pce - does not have an option for grouping observations by subgroup. Also note that the size of the output matrix differs with the size of each group (as does the auxiliary matrix that I have to input to get the average of the elements of each column). Any help on how to do this?

          Thanks!

          • #6
            Hi again,

            Just an update to clarify my question.

            Basically, I need to loop this code over all subgroups in my dataset. The problem is that the -pce- command supports neither the "if" option nor the by(group) option, so I don't know how to automate the process.

            Code:
            pco conscientiousness neuroticism openness extraversion agreeableness, id(code)
            mat F = J(colsof(r(D)),1,1)
            matrix define G_`i' = (r(D)' * F) / colsof(r(D))
            matrix drop F
            svmat G_`i'
            Thanks

            • #7
              pco (SJ) is documented as supporting if (which is strictly a qualifier, not an option). Even if that's not true, you can just loop over groups by reading in the dataset and keeping what you want before calling pco.

              Perhaps pce is something different. Some of your posts refer to pco and some to pce.
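
              As a rough sketch of such a loop over groups, the example below uses toy data and Stata's built-in -matrix dissimilarity- with the L2 (Euclidean) measure in place of -pco-, so the numbers need not match r(D) from -pco-. The variable names group, x, y and avgdist are invented for illustration.

              ```stata
              clear
              input str1 group x y
              "a" 0 0
              "a" 3 4
              "a" 6 8
              "b" 1 1
              "b" 1 3
              end

              // keep each group's observations together, in the order levelsof returns
              sort group, stable
              gen double avgdist = .
              local start = 1

              levelsof group, local(groups)
              foreach g of local groups {
                  // pairwise Euclidean distances within this group only
                  matrix dissimilarity D = x y if group == "`g'", L2
                  local n = colsof(D)
                  // mean distance to the other n - 1 members of the group
                  matrix avg = (D * J(`n', 1, 1)) / (`n' - 1)
                  // copy the results into the rows occupied by this group
                  forvalues r = 1/`n' {
                      local obs = `start' + `r' - 1
                      quietly replace avgdist = avg[`r', 1] in `obs'
                  }
                  local start = `start' + `n'
              }
              list
              ```

              With these toy values avgdist should come out as 7.5, 5, 7.5 for group a and 2, 2 for group b. A group with a single member has no other members to average over, so it would need separate handling.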

              • #8
                Hi Nick,

                Thanks for answering. You are right, the command is -pco- (not pce), and it supports the -if- qualifier.
                Here's the code that I used:

                Code:
                summ num, d
                foreach i of num `r(min)'/`r(max)' {
                    qui pco conscientiousness neuroticism openness extraversion agreeableness if num==`i' , id(code)
                    mat F = J(colsof(r(D)),1,1)
                    matrix define G_`i' = (r(D)' * F) / colsof(r(D))
                    matrix drop F
                    svmat G_`i'
                }
                The code creates a new variable on every pass through the loop (that is, 70 new variables). Instead, I need a single variable that combines the results of all passes.
                Here's the snapshot:

                [Attached image: example.jpg]


                I need to fill the variable "euclidean_" by appending the observations from the variables ranging from G_11 to G_171.
                For instance, observation 1 in "euclidean_" would be 5.035714, observation 8 would be 2.39375, observation 13 would be 2.18125 and so on.

                I tried the following code but it does not give the desired result:

                Code:
                foreach var of varlist G_11-G_701 {
                replace euclidean_pers = `var' if [_n-1] !=0     
                }
                Any help?

                Thanks a lot.

                • #9
                  Your loop just overwrites the same variable repeatedly. It sounds as if you need something like

                  Code:
                  gen euclidean_pers = .
                  
                  quietly foreach v of varlist G_11-G_701 {
                         count if !missing(euclidean_pers)
                         local N1 = r(N)
                         local n1 = r(N) + 1
                         count if !missing(`v')
                         local n2  = `N1' + r(N)
                         replace euclidean_pers = `v'[_n - `N1'] in `n1'/`n2'
                  }
                  but I've not tested this. (Other methods based on stacking, merging, etc. should spring to mind.)
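
                  To see what the stacking loop is meant to do, here is a small self-contained check on made-up data. G_1, G_2 and stacked are hypothetical stand-ins for the svmat variables and euclidean_pers.

                  ```stata
                  clear
                  set obs 5

                  // two toy result variables, each filled from observation 1 as svmat does
                  gen G_1 = cond(_n <= 3, _n, .)
                  gen G_2 = cond(_n <= 2, 10 * _n, .)

                  gen stacked = .

                  quietly foreach v of varlist G_1-G_2 {
                      // how many rows of stacked are already filled
                      count if !missing(stacked)
                      local N1 = r(N)
                      local n1 = r(N) + 1
                      // how many values this variable contributes
                      count if !missing(`v')
                      local n2 = `N1' + r(N)
                      // shift the values down so they follow the rows filled so far
                      replace stacked = `v'[_n - `N1'] in `n1'/`n2'
                  }

                  list
                  ```

                  After the loop, stacked should hold 1, 2, 3, 10, 20. The values only line up with the right subjects if the data are sorted in the same order in which the groups were processed.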
                  Last edited by Nick Cox; 09 Apr 2015, 09:48.
