
  • Matrix subscripting based on matrix colnames

    Dear All,
    I have a quick question. I have a matrix X and I want to extract, say, 2 columns based on the column names of X. In Stata, this can be achieved with the command matselrc:
    Code:
    sysuse auto, clear
    mkmat price rep78 weight length foreign trunk, matrix(X) /*this sets X as a 74x6 matrix, with varnames as column names*/
    matselrc X X_submatrix, c(`=colnumb(matrix(X), "weight")' `=colnumb(matrix(X), "foreign")') /*this extracts X's columns associated with colnames "weight" (column 3) and "foreign" (column 5) */
    matrix list X_submatrix
    This results in X_submatrix, a 74x2 matrix containing the 2 columns associated with the variables weight and foreign, as intended. However, in reality I have a much larger dataset, so I want to do the task in Mata rather than setting a large matsize. The Mata documentation I have seen shows this extraction only with numbers giving the columns' relative positions. Hence, the following would do the job:

    Code:
    mata:
    mata_X     =st_matrix("X")[, (3, 5)]
    mata_X
    end
    However, I want to specify variable names instead of relative positions, because the positions might change in every application and updating them time and again would be error-prone. So, my question is: is there any way to subscript a matrix in Mata based on the column names (instead of column numbers) of the Stata matrix? I was thinking that something along the lines of
    mata_X=st_matrix("X")[, ("weight", "foreign")]
    would do the work but this is wrong. I also tried a way to get st_matrixcolstripe involved in this task, but to no avail.

    Thank you so much
    JM
    Last edited by Juan del Pozo; 25 Sep 2019, 05:30.

  • #2
    Hi
    You do not need to save data in a matrix first. You can extract variables directly:
    Code:
    . sysuse auto
    (1978 Automobile Data)
    . mata
    : X = st_data(., "foreign weight")
    : rows(X), cols(X)
            1    2
        +-----------+
      1 |  74    2  |
        +-----------+
    : end
    Or, if you have matrixtools installed (ssc install matrixtools), you can use nhb_mt_labelmatrix and its method regex_select.
    Note that you have to read in both the values (here only the first 15 rows) and the column names.
    Code:
    . mata:
    : lm = nhb_mt_labelmatrix()
    : lm.values(st_data(1::15,.))
    : lm.column_names(st_varname(1..st_nvar())')
    : lm.print()
    --------------------------------------------------------------------------------------------------------
    make     price    mpg  rep78  headroom  trunk   weight  length   turn  displacement  gear_ratio  foreign
    --------------------------------------------------------------------------------------------------------
           4099.00  22.00   3.00      2.50  11.00  2930.00  186.00  40.00        121.00        3.58     0.00
           4749.00  17.00   3.00      3.00  11.00  3350.00  173.00  40.00        258.00        2.53     0.00
           3799.00  22.00             3.00  12.00  2640.00  168.00  35.00        121.00        3.08     0.00
           4816.00  20.00   3.00      4.50  16.00  3250.00  196.00  40.00        196.00        2.93     0.00
           7827.00  15.00   4.00      4.00  20.00  4080.00  222.00  43.00        350.00        2.41     0.00
           5788.00  18.00   3.00      4.00  21.00  3670.00  218.00  43.00        231.00        2.73     0.00
           4453.00  26.00             3.00  10.00  2230.00  170.00  34.00        304.00        2.87     0.00
           5189.00  20.00   3.00      2.00  16.00  3280.00  200.00  42.00        196.00        2.93     0.00
          10372.00  16.00   3.00      3.50  17.00  3880.00  207.00  43.00        231.00        2.93     0.00
           4082.00  19.00   3.00      3.50  13.00  3400.00  200.00  42.00        231.00        3.08     0.00
          11385.00  14.00   3.00      4.00  20.00  4330.00  221.00  44.00        425.00        2.28     0.00
          14500.00  14.00   2.00      3.50  16.00  3900.00  204.00  43.00        350.00        2.19     0.00
          15906.00  21.00   3.00      3.00  13.00  4290.00  204.00  45.00        350.00        2.24     0.00
           3299.00  29.00   3.00      2.50   9.00  2110.00  163.00  34.00        231.00        2.93     0.00
           5705.00  16.00   4.00      4.00  20.00  3690.00  212.00  43.00        250.00        2.56     0.00
    --------------------------------------------------------------------------------------------------------
    
    : lm.regex_select("foreign|weight", keep=1, names=1, row=0).print()
    ----------------
     weight  foreign
    ----------------
    2930.00     0.00
    3350.00     0.00
    2640.00     0.00
    3250.00     0.00
    4080.00     0.00
    3670.00     0.00
    2230.00     0.00
    3280.00     0.00
    3880.00     0.00
    3400.00     0.00
    4330.00     0.00
    3900.00     0.00
    4290.00     0.00
    2110.00     0.00
    3690.00     0.00
    ----------------
    Kind regards

    nhb



    • #3
      Dear Niels
      Thank you for your prompt answer. Your code does exactly what I wanted! Just for completeness, the matrix I have was not created from the Stata dataset via mkmat (that was just a MWE to make my question easier to understand). Instead, it was created in a previous step of my analysis, after doing some data imputations and linear projections in Mata; this matrix (named "distrib_male_sim_1", with its associated column names) was then passed to Stata. It is this matrix whose columns I wanted to extract in order to create a submatrix in Mata.

      Having said that, I adapted your answer (which assumes that the data come from the observed Stata dataset) to my purpose, so that it selects 3 columns (p10, p50, p90) from the Stata matrix distrib_male_sim_1:
      Code:
      mata:
      lm = nhb_mt_labelmatrix()
      lm.values(st_matrix("distrib_male_sim_1"))
      lm.column_names(st_matrixcolstripe("distrib_male_sim_1")[,2])
      lm.regex_select("p10|p50|p90", keep=1, names=1, row=0).print()
      end
      It works. However, I have 2 questions: 1) How can I store the matrix created in the last line of the code in Mata (something like B=lm.regex_select("p10|p50|p90", keep=1, names=1, row=0))? 2) Given the versatility of your matrixtools package, I looked in the help file to understand more about what the nhb_mt_labelmatrix class is and the syntax of regex_select. Unfortunately, I could not find any further information. Is there any place where I can get more info about this? In the worst case, is there any alternative way to do the task above with conventional commands?

      Thank you again for your time!
      JM
      Stata 14.0
      Last edited by Juan del Pozo; 25 Sep 2019, 09:43. Reason: typo
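
      On the last question, one conventional route is to look up the positions of the wanted names in the column stripe and then subscript by those positions. What follows is only a minimal sketch, assuming the column names of distrib_male_sim_1 are unique; cols_by_name() is a hypothetical helper written for illustration, not part of Mata or matrixtools.
      Code:
      mata:
      // Hypothetical helper: return the columns of Stata matrix `matname'
      // whose column names are listed in `wanted', in the order requested.
      real matrix cols_by_name(string scalar matname, string rowvector wanted)
      {
          string colvector names
          real colvector   pos
          real rowvector   idx
          real scalar      j

          // column 2 of the column stripe holds the column names
          names = st_matrixcolstripe(matname)[, 2]
          idx   = J(1, cols(wanted), .)
          for (j = 1; j <= cols(wanted); j++) {
              // positions at which the j-th wanted name occurs
              pos = selectindex(names :== wanted[j])
              if (rows(pos) != 1) {
                  _error("column " + wanted[j] + " not found or not unique in " + matname)
              }
              idx[j] = pos
          }
          return(st_matrix(matname)[, idx])
      }

      B = cols_by_name("distrib_male_sim_1", ("p10", "p50", "p90"))
      end
      B is then an ordinary Mata matrix stored in Mata. Because the lookup is by name, reordering the columns of the Stata matrix should not change the result.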



      • #4
        Dear Juan
        It is undocumented so far. I have to write the documentation asap.
        The following code example might help you:
        Code:
        mata:
        lm = nhb_mt_labelmatrix()
        lm.from_matrix("distrib_male_sim_1")
        lm = lm.regex_select("p10|p50|p90", keep=1, names=1, row=0)
        lm.to_matrix("new_matrix") //saves in new matrix
        lm.to_matrix("distrib_male_sim_1", overwrite=1) //to overwrite existing matrix
        end
        Only one of the last two to_matrix lines is necessary.
        Kind regards

        nhb



        • #5
          There are some minor flaws in the code for now.
          The overwrite option for to_matrix is not functioning and row_names has to be set when empty before using to_matrix.

          The code below should demonstrate the functionality of nhb_mt_labelmatrix.
          I wrote the code because I knew no better way of handling labelled matrices in Stata/Mata.

          Code:
          cls
          sysuse auto
          mata
          X = st_data(., "foreign weight")
          rows(X), cols(X)
          end
          mata:
          lm = nhb_mt_labelmatrix()
          lm.values(st_data(1::15,.))
          lm.column_names(st_varname(1..st_nvar())')
          lm.row_names("r" :+ strofreal(1::rows(lm.values())))
          lm.print()
          lm=lm.regex_select("foreign|weight", keep=1, names=1, row=0)
          lm.to_matrix("nhb")
          end
          matprint nhb
          mata: lm.values(st_data(1::15,.))
          mata: lm.column_names(st_varname(1..st_nvar())')
          mata: lm.to_matrix("nhb", 0)
          matprint nhb
          mata: lm.to_matrix("nhb", replace=1)
          matprint nhb
          Kind regards

          nhb

