  • Changing matrix values based on values of Stata variable

    Dear Statalist,

    I am currently running a random forest classification algorithm in Stata, and I wish to construct a cost function needed to run a weighted RF (using the crtrees algorithm).

    In order to do this, I need to create an NxN matrix, which starts off as an identity matrix, where the diagonal represents the relative weight of each observation in the dataset (i.e., using the identity matrix itself means that all outcomes are equally weighted). To give you some context, my dataset consists of 19,000 data points, so it will be a 19,000x19,000 matrix.

    I have figured out how to create an identity matrix equal to the number of observations in the dataset using Mata. To provide a concrete example, take the following:

    sysuse auto
    gen N=_N
    global N=N
    mata: st_matrix("MyMatrix", I($N))

    Here is where I am currently stumped, however.

    To go from the identity matrix to the matrix that I aim to build (i.e., the cost function), I need to change specific elements of the matrix based on values of a variable in the dataset (dummy variable).

    Using the "Automobile" dataset again, this is comparable to changing values of elements in "MyMatrix" for all observations where foreign==1.
    For the sake of illustration, assume that I would, for example, wish to attribute the value "2" to all of those observations.

    My question is now: Is it possible to perform this operation in this manner, or have I misunderstood how Mata works?


    Sincerely
    Johan Karlsson



  • #2
    What you are doing is not a good idea with N = 19000. The statement
    Code:
    mata: st_matrix("MyMatrix", I(19000))
    would create a *Stata* matrix (not a Mata matrix) that is much larger than the matrix size limits for Stata matrices. See -help matsize-. There are some oddities of Stata matrices created in Mata that might let you do this otherwise "illegal" thing, but it's at best going to create a Stata matrix that I would never trust. You can create a matrix such as you want as a *Mata* matrix. Here's a simple (but likely not optimally time-efficient) way to do that.
    Code:
    sysuse auto
    local N = _N
    mata:
    I = I(`N') 
    F = st_data(., "foreign")    // a Mata version of this variable
    // Loop over observations and change values of I according to what's in foreign.
    for ( i = 1; i <= `N'; i++ ) { 
        if (F[i,1] == 1) {
           I[i,i] = 2
        }
    }
    // You could now create a Stata version of this matrix, but per above, it will create a
    // grossly oversized Stata matrix, whose behavior I would not trust.
    st_matrix("MyMatrix", I)
    end
    (I tried the preceding with _N = 19000 and it did work on my version 15.1 Stata, by the way, but I still would not trust MyMatrix.)
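    As an aside, the loop above can probably be avoided. The following is a minimal sketch, not the method from the original post: it assumes that the weights sit only on the diagonal and uses Mata's diag() together with the colon operators. The names w and W are made up for illustration.

    ```stata
    sysuse auto, clear
    mata:
    F = st_data(., "foreign")     // column vector holding foreign for every observation
    // Elementwise weights: 1 for every observation, plus 1 more where foreign == 1
    w = 1 :+ (F :== 1)
    // Square matrix with w on the diagonal and zeros elsewhere
    W = diag(w)
    rows(W), cols(W)              // both equal the number of observations
    end
    ```

    Note that a dense 19,000 x 19,000 real matrix needs 19000^2 * 8 bytes, roughly 2.9 GB, however it is built, so keeping only the weight vector w may be preferable if the downstream command can accept it.
    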

    Some other comments on your code:
    1) Don't use globals. They are dangerous and unnecessary.
    2) -gen N = _N- Creates a whole Stata variable of which every value contains N. I can't understand why you think that would be necessary.
    3) -global N = N- does not do what you likely think it does. Assigning a Stata variable to a global or local is meaningless from Stata's perspective, since a variable has many values and a global or local has just one. Stata adapts to this impossible situation by quietly assigning the value of the *first* observation of the variable. This is almost never what you want (except in certain trick situations.) It will work here, but it's not a safe practice.
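    To illustrate points 2) and 3), here is a short sketch meant to be run from a do-file. The macro name first is hypothetical and chosen only for this example.

    ```stata
    sysuse auto, clear
    // A bare variable name in an expression is evaluated in the first observation,
    // so this stores price[1], not the whole variable
    local first = price
    display `first' == price[1]    // displays 1 (true)
    // _N can be assigned to a local directly; no helper variable is needed
    local N = _N
    display `N'
    ```
    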





    • #3
      Originally posted by Mike Lacy


      Thank you for your help, Mike! I have tried generating a portion of the matrix (_N = approx. 7000) and your input works a treat. I have also taken your advice and skipped the generate command; I did not know that you could assign _N directly to a local macro. Can't wait to try out the analysis!
