  • Remove row from matrix (not: Mata)

    Dear all,

    I have a simple Stata matrix (not in Mata), and I want to remove some rows (say, for all sel = 2). It looks simple; however, I cannot find out how to do this.

    Code:
    matrix list D

    D[16,1]
            type
       ams     1
       ein     1
       rtm     1
       grq     1
       mst     1
       bru     2
       crl     2
       dus     2
       nrn     2
    adamcs     3
    rdamcs     3
     utrcs     3
    arnhcs     3
    antwcs     3
     akhbf     3
    luikcs     3 

    Thanks, Kees

  • #2
    Kees: This is cheating using Mata, but something like this should work
    Code:
    mata st_matrix("Dnew",select(st_matrix("D"),st_matrix("D")[.,2]:~=2))
    where Dnew is your new matrix, D is your original matrix, and the selection is on its second column not equal to 2.

    • #3
      P.S. There may be a simpler way using Stata's matrix language, but I'm not familiar with such.

      • #4
        John: I don't think that will quite work. Kees has a 16 x 1 matrix (column vector) with row names, so your code needs to work on the single column.

        I don't think that the row names will survive unscathed.

        It's been a while, but matselrc from STB-56 should help.

        • #5
          Apologies Kees and Nick. In haste I treated the display as a 16x2 matrix, but even then what I suggested won't work (mixed string and numeric...ugh).

          Nick is correct: The row names don't survive the selection. There's probably a code-intensive way to do this in Mata, but Nick's idea should expedite matters. Again, apologies.

          • #6
            I don't have a good answer, because although

            Code:
            matselrc D E, rows(1/5 10/16) 
            works in this case, that is not a selection based directly on which rows have value 2.

            In principle the problem is perfectly programmable as a Stata program writing a version of Mata's select(), but writing extensions to Stata's matrix language peaked about 20 years ago!

            A deeper question is where this matrix comes from and what it is being used for, and so whether there is a better way of working altogether.
            Last edited by Nick Cox; 28 Sep 2017, 17:41.
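
            For what it is worth, a selection based directly on the values can be sketched with Stata's matrix commands alone, by looping over the rows and stacking the ones to keep. This is only a sketch under assumptions: the demo matrix and the result name E are illustrative, and it relies on submatrix extraction and the row-join operator \ carrying the row names along.

            ```stata
            * Demo matrix with row names (a shortened version of Kees's D)
            matrix D = (1 \ 1 \ 2 \ 2 \ 3 \ 3)
            matrix rownames D = ams ein bru crl adamcs rdamcs
            matrix colnames D = type

            * Start from scratch: nullmat() below needs E not to exist yet
            capture matrix drop E

            * Stack every row whose value in column 1 is not 2
            local n = rowsof(D)
            forvalues i = 1/`n' {
                if el(D, `i', 1) != 2 {
                    matrix E = nullmat(E) \ D[`i', 1...]
                }
            }

            matrix list E
            ```

            If no row passes the test, E is never created, so the final matrix list would fail; a real program would need to handle that case.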

            • #7
              I concur with Nick Cox that working in Mata using Mata's select() would seem to make things easier (but perhaps I'm biased since I work with Mata a lot). However—and in the spirit of making amends for my earlier faux pas—it is possible to use Mata from the Stata command line and get the Stata matrix you want (I think) using something like this:
              Code:
              mata st_matrix("Dnew",select(st_matrix("D"),st_matrix("D")[.,1]:~=2))
              mata st_matrixrowstripe("Dnew",select(st_matrixrowstripe("D"),st_matrix("D")[.,1]:~=2))
              The second line of code restores the proper row names into your new matrix Dnew that are obliterated by the operations in the first line. For example
              Code:
              . matrix D=1\1\2\2\3\3\4\4
              
              . matrix list D
              
              D[8,1]
                  c1
              r1   1
              r2   1
              r3   2
              r4   2
              r5   3
              r6   3
              r7   4
              r8   4
              
              . matrix rowname D =row1 row2 row3 row4 row5 row6 row7 row8
              
              . matrix list D
              
              D[8,1]
                    c1
              row1   1
              row2   1
              row3   2
              row4   2
              row5   3
              row6   3
              row7   4
              row8   4
              
              . mata st_matrix("Dnew",select(st_matrix("D"),st_matrix("D")[.,1]:~=2))
              
              . mata st_matrixrowstripe("Dnew",select(st_matrixrowstripe("D"),st_matrix("D")[.,1]:~=2))
              
              . matrix list Dnew
              
              Dnew[6,1]
                    c1
              row1   1
              row2   1
              row5   3
              row6   3
              row7   4
              row8   4
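
              The two one-liners above evaluate the selection twice and leave the column name of Dnew at its default. A possible variant, offered as a sketch rather than a tested replacement, computes the selection vector once and also copies the column stripe with st_matrixcolstripe(). The Mata variable name keep and the column name type are illustrative.

              ```stata
              matrix D = (1 \ 1 \ 2 \ 2 \ 3 \ 3 \ 4 \ 4)
              matrix rownames D = row1 row2 row3 row4 row5 row6 row7 row8
              matrix colnames D = type

              mata:
              D    = st_matrix("D")
              keep = D[., 1] :!= 2            // 1 for rows to keep, 0 otherwise
              st_matrix("Dnew", select(D, keep))
              st_matrixrowstripe("Dnew", select(st_matrixrowstripe("D"), keep))
              st_matrixcolstripe("Dnew", st_matrixcolstripe("D"))
              end

              matrix list Dnew
              ```

              Because the same keep vector selects both the values and the row stripe, the names cannot drift out of step with the rows.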

              • #8
                Actually, an all-Stata solution is possible. It's not elegant, but neither is it awful.

                Code:
                //    CREATE MATRIX FOR DEMONSTRATION
                matrix D = (1 \ 1 \1 \1 \2 \2 \2 \2 \3 \3 \3 \3 \3 \3 \3 \3)
                matrix rownames D = ams ein rtm grq mst bru crl dus nrn adamcs ///
                rdamcs utrcs arnhcs antwcs akhbf luikcs
                matrix colnames D = type
                
                matrix list D
                
                //    MOVE THE MATRIX INTO STATA DATA
                drop _all
                local rnames: rownames D
                svmat D, names(col)
                gen name = ""
                forvalues i = 1/`:rowsof D' {
                    replace name = `"`:word `i' of `rnames''"' in `i'
                }
                
                //    APPLY SELECTION CRITERION TO REMOVE SOME ROWS (NOW OBSERVATIONS)
                drop if type == 2
                
                //    RE-CREATE THE MATRIX
                mkmat type, matrix(D) rownames(name)
                matrix list D
                Note: Since this requires clearing data from active memory, wrap this between -preserve- and -restore- if that data is not dispensable at the time.
                Last edited by Clyde Schechter; 28 Sep 2017, 22:10. Reason: Code originally posted did not correctly handle the column name of the matrix; this has been fixed.
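
                The -preserve-/-restore- wrapping mentioned in the note might look like the sketch below. The auto data set only stands in for whatever data are in memory, the demo matrix is shortened, and the row count is taken with rowsof() in a separate local, which is an assumption made here for portability across Stata versions.

                ```stata
                * Demo matrix
                matrix D = (1 \ 1 \ 2 \ 2 \ 3)
                matrix rownames D = ams ein bru crl adamcs
                matrix colnames D = type

                * Data in memory that must survive (auto is just a stand-in)
                sysuse auto, clear

                preserve
                    drop _all
                    local rnames : rownames D
                    local nrows = rowsof(D)
                    svmat D, names(col)
                    gen name = ""
                    forvalues i = 1/`nrows' {
                        replace name = `"`:word `i' of `rnames''"' in `i'
                    }
                    drop if type == 2
                    mkmat type, matrix(D) rownames(name)
                restore

                * The matrix keeps the selection; the data set is back as it was
                matrix list D
                describe, short
                ```

                Matrices are not part of the data set, so the rebuilt D survives the -restore-.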

                • #9
                  Following Nick and John, I too would use mata.

                  This can be done, however, solely with Stata if you wish to do that.

                  Code:
                  * Create demo matrix A.
                  mat A = 1\2\3\4\5
                  matlist A
                  mat colnames A = myvar
                  mat rownames A = One Two Three Four Five
                  matlist A
                  Code:
                  * Get row names and row count from matrix A.
                  local rnames : rownames A
                  local numwords: word count `rnames'
                  
                  * Set obs to row count if number of matrix rows > current number of obs in data set.
                  if (`numwords' > _N) set obs `numwords'
                  
                  * Convert rownames of matrix A into string variable MyName.
                  cap drop MyName
                  gen str5 MyName = " "
                  forval i = 1/`numwords' {
                    replace MyName = word("`rnames'", `i') in `i'
                    }
                  
                  * Convert vector of matrix A to numeric variable MyVar.
                  svmat A, names(MyVar)
                  
                  * Drop selected cases from MyVar.
                  drop if MyVar == 3
                  list _all
                  
                  * Create new matrix B
                  mkmat MyVar, matrix(B) rownames(MyName)
                  
                  * List new matrix B after removal of rows.
                  matlist B
                  
                  * Drop MyName and MyVar that are no longer needed.
                  drop MyName MyVar
                  Red Owl
                  Stata/IC 15.0


                  Edit: Crossed with Clyde's shorter Stata-only solution.

                  • #10
                    Thank you all. As you already indicated, Mata is probably the best environment for matrices. However, "the deeper question" (Nick) behind using a matrix is actually a detour: I use the matrix as a database because I cannot open two data files at the same time. So for such a simple thing, Stata seemed adequate.

                    Remarkably, the solutions of Clyde and Red Owl both use a data file as a detour. Because I want to keep my database open, I will follow Clyde's preserve/restore suggestion.

                    Clyde: `: rowsof D' gives an error, so I used `nrows' by using local nrows = rowsof(D).

                    • #11
                      To add to my last post: it is a pity that Stata does not allow multiple dta files to be open at once. Stata is much better than SPSS, but unfortunately not on this issue.

                      • #12
                        Kees: To add one more thing to the discussion… Mata again can be useful working simultaneously with multiple "data sets" when each of the data sets can be defined as a Mata matrix (not always possible, of course, e.g. string and numeric variables in a dataset).

                        I find particularly helpful in such cases the use of Mata's -asarray- which serves as a container for multiple matrixes that can be indexed in a variety of ways, and can be saved for future use. This may or may not work for your particular analysis, but I've found -asarray- to be a useful programming tool with lots of applicability in my own work.
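
                        As a rough illustration of the idea, the sketch below stores two small matrices in one -asarray- container under string keys and then pulls one back out for a row selection. The keys and the contents are made up for the example.

                        ```stata
                        mata:
                        // Container indexed by string keys; keys and contents are illustrative
                        A = asarray_create("string")
                        asarray(A, "stations", (1 \ 1 \ 2 \ 3))
                        asarray(A, "fares", (10, 20 \ 30, 40))

                        // Retrieve one matrix and display its rows that are not equal to 2
                        X = asarray(A, "stations")
                        select(X, X :!= 2)
                        end
                        ```

                        Each stored matrix can have its own dimensions, which is what makes the container usable as a set of separate "data sets".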

                        • #13
                          "Clyde: `: rowsof D' gives an error, so I used `nrows' by using local nrows = rowsof(D)."
                          Are you sure you didn't mistype something? I ran that code before I posted it, and there were no errors.

                          As for not being able to keep two data sets open in Stata, that is true. However, there is one limited exception that might suit your needs here. If you are looping and generating results in each iteration, you can store those results in a second data set using the -postfile- mechanism. I don't know if that applies to your situation, but if it sounds like it does, check out -help postfile-.

                          In general, Stata matrices are not a good way to create an occult data set. They are fairly difficult to work with, as you have seen. They are really useful only when you plan to actually do matrix algebra with them. (And even then, Mata is usually better.) And, at least in my line of work, in most of the situations where one would be tempted to use them as a workaround for the only-one-data-set-open limit, -postfile- is the ideal solution.
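
                          A minimal sketch of the -postfile- mechanism follows, assuming a loop that produces one result per iteration. The auto data set, the variables looped over, and the names handle, results, name, and value are all illustrative.

                          ```stata
                          * Data in memory that stays untouched (auto is just a stand-in)
                          sysuse auto, clear

                          tempname handle
                          tempfile results
                          postfile `handle' str10 name double value using "`results'"

                          * Each iteration posts one observation to the second data set
                          foreach v of varlist price mpg weight {
                              quietly summarize `v'
                              post `handle' ("`v'") (r(mean))
                          }
                          postclose `handle'

                          * Look at the results without losing the data in memory
                          preserve
                          use "`results'", clear
                          list
                          restore
                          ```

                          The results accumulate on disk while the original data stay in memory, so no matrix is needed as an intermediate store.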

                          • #14
                            I started a complete new Stata session (version 14), with just your code and got this:
                            Code:
                            . forvalues i = 1/`:rowsof D' {
                              2.     replace name = `"`:word `i' of `rnames''"' in `i'
                              3. }
                            rowsof not allowed
                            invalid syntax
                            r(198);
                            Thank you for referring to postfile, I'm going to try it.

                            • #15
                              Kees, I just reran my same code in Stata 14.2, and there I get the same error message you do. But it does work in 15.
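
                              A form of the loop that avoids the version difference is the workaround Kees describes in #10: take the row count with the rowsof() function and store it in a local first. The small demo matrix is illustrative.

                              ```stata
                              matrix D = (1 \ 2 \ 3)
                              matrix rownames D = ams bru adamcs
                              local rnames : rownames D

                              * rowsof() is an ordinary function, so no extended macro function is needed
                              local nrows = rowsof(D)
                              forvalues i = 1/`nrows' {
                                  display "`i': `:word `i' of `rnames''"
                              }
                              ```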
