Mkmat for groups in a dataset for use with other matrix operations

ebony bridwell-mitchell

Join Date: Jun 2014

Posts: 19
#1

16 Feb 2019, 19:32

Hi Statalist: I do not have much experience writing code for Stata except for the routines with which I am most familiar. So, I am not certain how to write code to achieve the following:
Ultimately, I need to append or 'vertically stack the rows' of a set of saved matrices resulting from a series of matrix operations.

This requires first having saved the results from the series of matrix operations, namely A'*A (i.e. A-transpose times A), for matrices A-Z.

Before this, however, I need to create matrices A-Z from an existing Stata dataset, which has 8,981 observations nested into 1,136 groups identified by 'groupID' and where groups are of different sizes, n. Below is an example, noting that for this example, I'd want to start by making three matrices of size 5x10, 7x10 and 3x10:

groupID v1 v2 v3 v4 v5 v6 v7 v8 v9 v10
1714001 1 0 1 1 0 0 0 0 0 0
1714001 0 0 1 0 0 0 0 0 0 0
1714001 1 1 1 1 1 0 0 0 0 1
1714001 0 1 1 0 0 0 1 0 0 0
1714001 1 1 1 1 1 0 0 0 0 1
1714071 1 1 1 1 1 1 1 1 1 1
1714071 1 1 1 1 1 1 1 0 0 0
1714071 0 0 1 0 0 1 1 0 0 0
1714071 1 1 1 1 0 0 1 0 1 1
1714071 1 0 1 1 1 1 1 0 0 1
1714071 1 1 1 1 1 0 1 0 0 1
1714071 1 1 1 1 1 0 0 0 0 0
1714081 0 0 0 1 0 0 1 0 0 0
1714081 1 1 1 1 1 1 1 0 1 0
1714081 0 0 0 0 0 0 0 0 0 0

Given the above, it seems I would first somehow need to use mkmat, looping through the values of groupID. Then, I'd need to loop through A'*A for each stored matrix, A-Z. After this, I could vertically append the saved results for each operation with mat new = A\B\C\....\Z, but ideally without having to write out all 1,136 elements. I'd appreciate any help on how to get started. The leads I've found from previous posts, such as subsets using complicated criterion, or from the help manual haven't provided quite the help I need.

Thanks - Ebony
Clyde Schechter

Join Date: Apr 2014

Posts: 30122
#2

16 Feb 2019, 19:57

I'm not entirely sure I understand what you want. Here's what I think you mean:

1. For each value of groupID, create a matrix out of variables v1 through v10 for all observations with that value of groupID.
2. Next compute the product of the transpose of that matrix with itself.
3. Next stack that product vertically under the stacked products of all the preceding groups.

If that's what you want:

Code:

clear*
* Example generated by -dataex-. To install: ssc install dataex
clear
input long groupid byte(v1 v2 v3 v4 v5 v6 v7 v8 v9 v10)
1714001 1 0 1 1 0 0 0 0 0 0
1714001 0 0 1 0 0 0 0 0 0 0
1714001 1 1 1 1 1 0 0 0 0 1
1714001 0 1 1 0 0 0 1 0 0 0
1714001 1 1 1 1 1 0 0 0 0 1
1714071 1 1 1 1 1 1 1 1 1 1
1714071 1 1 1 1 1 1 1 0 0 0
1714071 0 0 1 0 0 1 1 0 0 0
1714071 1 1 1 1 0 0 1 0 1 1
1714071 1 0 1 1 1 1 1 0 0 1
1714071 1 1 1 1 1 0 1 0 0 1
1714071 1 1 1 1 1 0 0 0 0 0
1714081 0 0 0 1 0 0 1 0 0 0
1714081 1 1 1 1 1 1 1 0 1 0
1714081 0 0 0 0 0 0 0 0 0 0
end

capture program drop one_group
program define one_group
    local group = groupid[1]
    mkmat v*, matrix(M`group')
    matrix MM`group' = M`group''*M`group'
    matrix Stack = nullmat(Stack) \ MM`group'
    exit
end

runby one_group, by(groupid)

will do it. Now, I don't name the matrices A B C, etc. because that is inconvenient. Instead the matrices extracted from the data are named M1714081, M1714071, M1714001, etc. The products of those matrices with their transposes are named MM1714081, MM1714071, MM1714001. And the stacked version of the latter is named Stack.

To use this code you must install runby.ado, written by Robert Picard and me, available from SSC.

In the future, when showing data examples, please use the -dataex- command to do so. If you are running version 15.1 or a fully updated version 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

When asking for help with code, always show example data. When showing example data, always use -dataex-.
ebony bridwell-mitchell

Join Date: Jun 2014

Posts: 19
#3

18 Feb 2019, 15:22

Thanks, Clyde, for your help. You have summarized my aims correctly and runby.ado seems like *exactly* what I'd need. I installed the package and ran the code provided in your response. The code seems to have run with no problem, since the results window reported five descriptive statistics (i.e. "Number of by-groups = 3; by-groups with errors = 0; by-groups with no data = 0; Observations processed = 15; Observations saved = 15"). BUT I do have two key questions.
The help file for runby.ado indicates that "all stored results are combined and replace the data in memory." This means, I believe, that the data in use (i.e. what is viewed in the data editor) should change from the original 15x10 matrix to a 30x10 matrix, since the latter would be the result from stacking the results of the M'M transformations for the three initial 5x10, 7x10 and 3x10 groupid matrices. However, at the end of the routine the original data was unchanged in the data editor. When I try to return the results from memory using "return matrix Stacked = Stack, copy" I get the following error message, "non r-class program may not set r()". Perhaps I am missing something but I would appreciate any clarification you could provide about where/how the results are stored since I will need to export the results for later use. So, ideally, the results would be stored as a .dta file that I could then export.

Because I could not view the results, I wasn't able to verify whether the final Stack matrix included a first column/vector of the groupid associated with each submatrix. In other words the first 10 rows would be labeled with MM1714001, the second 10 with MM1714071 and the final 10 with MM1714081. If not, is it possible to add something like this to the code?

Again, thank you for your help.
Clyde Schechter

Join Date: Apr 2014

Posts: 30122
#4

18 Feb 2019, 20:12

So I think we are using terminology differently. From what you describe in #3, I now infer that you are using the term "matrix" to refer to the data in active memory in Stata. That is not how I am using the term. The way I wrote the code, the data in active memory will be unchanged from what you started with. Rather, the results you are looking for have been saved in Stata matrices, which are separate data structures. To view the results you can run commands like:

Code:

matrix list MM1714001
matrix list MM1714071

etc.

For more information about Stata matrices, read -help matrix- and the various sub-help-files linked therein.
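As a quick check, assuming the -runby- call above has finished and the matrix Stack exists, the stacked result and its dimensions can be inspected along these lines. For the three-group example, Stack should be 30 x 10 (three 10 x 10 products stacked vertically).

```stata
* list every matrix currently held in memory (the M*, MM*, and Stack matrices)
matrix dir

* show the stacked products
matrix list Stack

* report the dimensions of Stack
display rowsof(Stack) " rows, " colsof(Stack) " columns"
```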

You cannot -return- anything from a program unless it is declared rclass. But -runby- does not allow rclass programs. So -runby- either puts its results into the data in active memory, or in ancillary structures like matrices (in our case) or scalars.
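If the goal is to end up with the stacked products as data rather than as a matrix, one possible variation is to let each by-group leave its product behind as observations, which -runby- then accumulates. This is only a sketch under stated assumptions, not the code from #2: the program name one_group_data and the variable name row are made up for illustration, and it assumes the example data from #2 are in memory.

```stata
capture program drop one_group_data
program define one_group_data
    * remember which group this is before the data are replaced
    local group = groupid[1]
    * build the group's data matrix and its cross-product
    mkmat v*, matrix(M)
    matrix MM = M' * M
    * replace the group's observations with the rows of the product
    drop _all
    svmat double MM, names(v)
    * tag each row with its group and its position within the product
    gen long groupid = `group'
    gen byte row = _n
    order groupid row
end

runby one_group_data, by(groupid)
```

With the three-group example this should leave 30 observations in memory, ten per groupid, with no per-group matrices accumulating alongside.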
ebony bridwell-mitchell

Join Date: Jun 2014

Posts: 19
#5

19 Feb 2019, 05:33

Clyde, thank you. Your inference is correct.

The command 'matrix list Stack' returns the final matrix, and 'svmat byte Stack' saves the matrix to the data in active memory, as I needed. The data in active memory do not, however, have any row names. This suggests that the row names of Stack need to be changed before using the 'svmat' command. It seems like this could be easily accomplished with the command 'matrix rownames Stack = [names]'. However, what I'd want for [names] is to change the existing row names in groups of 10, so the new row names are the names of each submatrix in Stack, which is to say values corresponding to groupid. In other words, whereas the original 30 row names in Stack are v1-v10, v1-v10, v1-v10, I'd want the names to be MM1714001 repeated for the first 10 rows, MM1714071 repeated for the next 10 rows, and MM1714081 repeated for the final 10 rows. This seems like it would require another looping command to be used with 'matrix rownames Stack', but again, I am not sure how to code this.

Thanks for your continued help.

Clyde Schechter

Join Date: Apr 2014

Posts: 30122
#6

19 Feb 2019, 09:31

I think the simplest way to do this is to change the rownames in each of the MM matrices right after they are created.

Code:

clear*
* Example generated by -dataex-. To install: ssc install dataex
clear
input long groupid byte(v1 v2 v3 v4 v5 v6 v7 v8 v9 v10)
1714001 1 0 1 1 0 0 0 0 0 0
1714001 0 0 1 0 0 0 0 0 0 0
1714001 1 1 1 1 1 0 0 0 0 1
1714001 0 1 1 0 0 0 1 0 0 0
1714001 1 1 1 1 1 0 0 0 0 1
1714071 1 1 1 1 1 1 1 1 1 1
1714071 1 1 1 1 1 1 1 0 0 0
1714071 0 0 1 0 0 1 1 0 0 0
1714071 1 1 1 1 0 0 1 0 1 1
1714071 1 0 1 1 1 1 1 0 0 1
1714071 1 1 1 1 1 0 1 0 0 1
1714071 1 1 1 1 1 0 0 0 0 0
1714081 0 0 0 1 0 0 1 0 0 0
1714081 1 1 1 1 1 1 1 0 1 0
1714081 0 0 0 0 0 0 0 0 0 0
end

capture program drop one_group
program define one_group
    local group = groupid[1]
    mkmat v*, matrix(M`group')
    matrix MM`group' = M`group''*M`group'
    // count the rows of the product matrix (10), not of the group's data matrix
    local nrows: rowsof MM`group'
    local rownames
    forvalues i = 1/`nrows' {
        local rownames `rownames' MM`group'
    }
    matrix rownames MM`group' = `rownames'
    matrix Stack = nullmat(Stack) \ MM`group'
    exit
end

runby one_group, by(groupid)


ebony bridwell-mitchell

Join Date: Jun 2014

Posts: 19
#7

19 Feb 2019, 14:35

Clyde, thanks again. This code is so close. It performs exactly as I described up through and including creating the matrix Stack with the rows labeled by groupid. For some reason, however, the row labels are not included in the active data once the command 'svmat Stack' is executed. In other words, when Stack is viewed in the results window with the command 'matrix list Stack', the row labels appear in the first column. When the data editor is used to view Stack after executing 'svmat Stack', there are no row labels; the first column is v1. Any thoughts?
Clyde Schechter

Join Date: Apr 2014

Posts: 30122
#8

19 Feb 2019, 15:02

Well, as you note, the matrix is what you want. The difficulty arises because -svmat- does not preserve the rownames. So you have to work around that a little. The following code can be applied once -runby- has completed generating matrix Stack.

Code:

clear
svmat Stack
rename Stack# v#
local groups: rownames Stack
gen group = "", before(v1)
forvalues i = 1/`:word count `groups'' {
    replace group = `"`:word `i' of `groups''"' in `i'
}
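Since #3 mentions wanting the results as a .dta file, a minimal follow-up, assuming the block above has just run so that the stacked results are the data in memory, might look like this. The file name stacked_results is a placeholder, not something from this thread.

```stata
* confirm that the group labels and v1-v10 are present
describe
list group v1-v3 in 1/12, sepby(group)

* write the stacked results to disk (hypothetical file name)
save stacked_results, replace
```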
ebony bridwell-mitchell

Join Date: Jun 2014

Posts: 19
#9

19 Feb 2019, 15:06

Eureka! Thank you, Clyde!