  • Convert MATA matrix into STATA variables

    I am trying to run a linear probability model. In the model, the dependent variable takes the value 0 or 1 for each observation. My goal is to estimate the probability of falling within each unique value of the dependent variable after adjusting for the independent variables. My code is as follows:

    Code:
    forvalues i=1/100 {
    use "E:\data\sample`i'.dta",clear
    
    egen group = group($y)             // y is the dependent variable
    
    tempname max
    sum group
    scalar `max' = r(max)
    forvalues a = 1(1)`=`max'' {
    gen y_`a' = 0
    replace y_`a' = 1 if group <= `a'
    }
    
    global group "y_*"
    
    mata: y = st_data(., "$group")
    mata: X = st_data(., "$xs")      // xs holds the independent variables
    mata: X = X, J(rows(X),1,1)
    mata: b = invsym(X'*X)*X'*y
    mata: yhat = X*b
    
    drop $group
    
    /// (continued) ///
    }
    Since I am using very large data sets, I am trying to do the computation in Mata and then convert the resulting Mata matrix into Stata variables. The code below does the conversion, but it takes a very long time, so I would like to know a more efficient way to convert the Mata matrix into Stata variables. I have tried st_addvar() and st_store(), but they did not work.

    Code:
    {
    /// (continued) ///
    
    tempname max
    su group
    scalar `max' = r(max)
    forvalues a = 1(1)`=`max'' {
    mata: p_`a' = yhat[., `a']
    getmata p_`a', force
    }
    Thank you in advance!

  • #2
    Two things:
    • I don't understand what you ultimately want to achieve; knowing what your real problem is would allow a better answer. As it is, your code seems very convoluted, with multiple loops over samples and so on, which seems odd because -regress- will almost always be faster than the Mata approach, and because -y- should be a single variable, as you said.
    • For your more immediate question, check help mf_st_store. The st_store() function in Mata might be a bit faster than your getmata approach, although probably not by much.
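
A minimal sketch of the st_store() route, under stated assumptions: the toy data, the matrix values, and the p_ prefix are illustrative only, and yhat must have exactly as many rows as there are observations in memory. st_store() writes into existing variables, so st_addvar() creates them first; skipping that step is one possible reason an earlier attempt might fail.

```stata
// Toy data standing in for the real sample; all names here are illustrative
clear
set obs 5
gen x = _n

mata:
yhat  = (1, 2 \ 3, 4 \ 5, 6 \ 7, 8 \ 9, 10)   // stand-in for X*b; rows(yhat) == st_nobs()
names = "p_" :+ strofreal(1..cols(yhat))       // one name per column: p_1, p_2
idx   = st_addvar("double", names)             // create all target variables at once
st_store(., idx, yhat)                         // copy the whole matrix in a single call
end

list
```

This replaces the per-column loop with one call, so the cost of crossing between Stata and Mata is paid once rather than once per column.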


    • #3
      A couple of comments:
      • When speeding up an algorithm you need to take into account the time spent writing the algorithm. You have already lost quite a lot of time writing what you have done. Is the speed-up really going to make up for that? Almost always the answer is no.
      • You can cut a bit of overhead by using _regress instead of regress.
      • \( \mathbf{(X'X)^{-1}X'Y} \) is the textbook formula for computing the coefficients, but it is not what modern computer programs do, as it is not the most stable way of doing that computation.
      ---------------------------------
      Maarten L. Buis
      University of Konstanz
      Department of history and sociology
      box 40
      78457 Konstanz
      Germany
      http://www.maartenbuis.nl
      ---------------------------------


      • #4
        I will add to Maarten's comments:

        1. Cross-products such as X'X are more efficiently computed with cross() or quadcross(). Transposing is actually a costly operation.
        2. Numerically speaking, it is better to use a solver such as qrsolve() to compute the coefficient vector beta instead of the textbook formula. OLS is generally solved using the QR decomposition.
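
Both points can be sketched in Mata as follows. The data are a made-up toy example in which y = 1 + 2*x holds exactly, and the names x, y, X, b1, and b2 are illustrative assumptions; quadcross() and qrsolve() are the functions named above.

```stata
mata:
// Toy data (illustrative only): y = 1 + 2*x exactly
x = (1 \ 2 \ 3 \ 4)
y = 1 :+ 2:*x
X = x, J(rows(x), 1, 1)        // append a constant column, as in post #1

// Normal equations built from cross products, without forming X' explicitly
XX = quadcross(X, X)
Xy = quadcross(X, y)
b1 = invsym(XX) * Xy

// QR-based least squares, which avoids forming X'X altogether
b2 = qrsolve(X, y)

b1, b2                         // the two columns should agree closely
end
```

Because qrsolve() accepts a matrix on the right-hand side, the same call should also handle several dependent-variable columns at once.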


        • #5
          Thank you for your suggestions and comments!

          For my analysis, I necessarily have to use a loop at least once, as shown in my code. Thus, combining the multiple loops into a single loop and running a separate regress within it may be more efficient than using Mata with multiple loops.
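
A rough sketch of what that single loop might look like, under stated assumptions: Stata's shipped auto data stand in for the real samples, rep78 plays the role of $y, and mpg and weight play the role of $xs. The loop stops one short of the top group because that indicator is identically 1.

```stata
// Illustrative only: rep78, mpg and weight are stand-ins for $y and $xs
sysuse auto, clear
drop if missing(rep78)
egen group = group(rep78)
summarize group, meanonly
local last = r(max) - 1                  // the top group's indicator is always 1

forvalues a = 1/`last' {
    generate byte y_`a' = group <= `a'   // cumulative indicator, as in post #1
    quietly regress y_`a' mpg weight
    predict double p_`a', xb             // fitted linear probabilities
    drop y_`a'
}
```

Here predict creates the fitted values directly as Stata variables, so no conversion from Mata is needed.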
