  • Returning variables to Stata from Mata data with different levels of observation

    Hey,

    I want to use the values of a Mata matrix in a regression in Stata. I have found earlier suggestions, but my situation is slightly different. The solution I've added below does not work for me, since the first assumption its writer makes does not hold in my case. I have a Stata dataset with 60,000+ observations, uniquely identified by "citypairairline". I also have a categorical variable "airline" that takes the values 1 to 13, matching the number of rows and columns of the Mata matrix. I now want to return the values of the Mata matrix to Stata so that 13 new variables are created, one per matrix column, with each observation receiving the matrix row that matches its value of "airline" (or the other way around, since the matrix is symmetric).

    Thank you in advance,

    Frank
    I'll show how, but first I am going to make some assumptions:

    1. Thomas didn't say whether the number of observations in the Stata dataset is the same as the number of rows in Results, but what he wrote implied it.

    2. Thomas didn't say whether the results in the Stata dataset and the results in Results are in the same order, but I will assume that they are.

    Solution 1: Stata/Mata approach
    -------------------------------
    At the outset, in Stata, create the three new variables to hold the results:
    . gen Result1 = .
    . gen Result2 = .
    . gen Result3 = .

    Now, back in Mata, after calculating the Mata matrix Results, code:
    : st_store(., ("Result1", "Result2", "Result3"), Results[|1,3 \ .,.|] )

    Explanation: Matrix Results is N x 5, but the first two columns of the matrix already appear in the data, so I simply want to add columns 3, 4, and 5 to the Stata dataset. There are lots of other ways I could have coded st_store().

    Among them:
    : st_store(., "Result1", Results[.,3])
    : st_store(., "Result2", Results[.,4])
    : st_store(., "Result3", Results[.,5])
    or
    : st_store(., ("Result1", "Result2", "Result3"), Results[., (3,4,5)] )

    The way I chose, storing Results[|1,3 \ .,.|], says to pull the submatrix of Results whose top-left element is (1,3) and whose bottom-right element is (last row, last column).
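
    To check that the range subscript and a list subscript select the same columns, here is a minimal sketch on made-up data. The 4 x 5 matrix and the 4-observation dataset are assumptions for illustration only; the variable names follow the quoted solution.

```stata
* Toy data: 4 observations and a 4 x 5 matrix (both made up for this sketch)
clear
set obs 4
gen Result1 = .
gen Result2 = .
gen Result3 = .

mata:
// Results[i,j] = (i-1)*5 + j
Results = colshape(1::20, 5)

// range subscript: from element (1,3) to the last row and last column
A = Results[|1,3 \ .,.|]
// list subscript: all rows, columns 3, 4 and 5
B = Results[., (3,4,5)]
// both forms pick the same 4 x 3 submatrix
assert(A == B)

// store the three columns in the three new variables
st_store(., ("Result1", "Result2", "Result3"), A)
end

list
```

    Note that this only works because the matrix has exactly one row per observation, which is the assumption that does not hold in my case.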

  • #2
    Have you considered transferring the 13x13 Mata results to a (temporary) Stata dataset and then merging it with your 60,000-row Stata dataset?
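
    A rough sketch of what I mean, on made-up data. The toy dataset, the example matrix R, and the names res1-res13 and lookup are assumptions for illustration; only the variable name airline comes from the question.

```stata
* Toy main dataset: 26 observations, airline in 1..13 (made up for this sketch)
clear
set obs 26
gen airline = mod(_n - 1, 13) + 1

* Toy 13 x 13 matrix standing in for the real results: R[i,j] = (i-1)*13 + j
mata: R = colshape(1::13^2, 13)

* Build the lookup dataset: one row per airline, one variable per matrix column
preserve
clear
mata:
st_addobs(rows(R))
vnames = "res" :+ strofreal(1..cols(R))
(void) st_addvar("double", vnames)
st_store(., vnames, R)
end
gen airline = _n
tempfile lookup
save `lookup'
restore

* Attach row number airline of R to every observation in the main data
merge m:1 airline using `lookup', keep(master match) nogenerate
```

    The lookup file holds only 13 observations, and a tempfile is erased when the do-file finishes, so it should not take much disk space.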


    • #3
      I will try that now. However, I have to repeat this process for several matrices and then for several datasets, so it would be great if there were another way, since this fills up my hard disk fast.

      Thanks for the help!


      • #4
        Assuming that you have at least version 13, you can use selectindex(). It returns the row indices that correspond to a specific value of your identifier variable.
        I've tried to make a version which works in case you want to work on a subset of your dataset.

        Code:
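        clear
        * set the seed before runiform() is called, so that touse is reproducible
        set seed 10101
        set obs 13

        * identifier taking the values 1..13
        gen id = _n

        * 100 observations per id; touse marks the subset to be filled
        expand 100
        gen touse = runiform()>.1

        mata:

        // example 13 x 13 matrix; row i holds the values for id == i
        R = colshape(1::13^2,13)

        // view onto id, restricted to the observations with touse != 0
        st_view(id=.,.,"id","touse")
        D = J(rows(id),13,.)

        // create the new variables res1, ..., res13
        vnames = "res":+strofreal(1..13)
        (void) st_addvar("double",vnames)

        for (i = 1; i <= 13 ; i++)
        {
                // rows of the selected subset with id == i
                index = selectindex(id:==i)
                // copy row i of R into each of those rows of D
                D[index,.] = J(rows(index),1,R[i,.])
        }

        // write D back to the touse observations only
        st_store(.,vnames,"touse",D)

        end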
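
        If the identifier really takes the integer values 1 to 13 with no missing values, the loop above can probably be dropped by using the identifier itself as a row subscript. This is only a sketch under that assumption, on the same made-up data; it is a swapped-in alternative, not a replacement for selectindex() when the identifier values are not usable as row numbers.

```stata
clear
set seed 10101
set obs 13
gen id = _n
expand 100
gen touse = runiform() > .1

mata:
// example 13 x 13 matrix: R[i,j] = (i-1)*13 + j
R = colshape(1::13^2, 13)

// copy of id for the selected observations (values 1..13 assumed)
id = st_data(., "id", "touse")

// create res1, ..., res13
vnames = "res" :+ strofreal(1..13)
(void) st_addvar("double", vnames)

// R[id, .] stacks row id[k] of R for every selected observation k
st_store(., vnames, "touse", R[id, .])
end
```

        Observations with touse == 0 keep missing values in res1 to res13, as in the loop version.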
