Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Convert Mata code to Stata Program

    Thanks to Robert Picard, the following code estimates regression. How can I use it in ado program.
    Code:
    * define a linear regression with intercept in Mata
    mata:
    mata clear
    mata set matastrict on
    real rowvector myreg(real matrix Xall)
    {
        real colvector y, b, Xy
        real matrix X, XX, R
        real scalar ymean, tss, mss, r2
    
        y = Xall[.,1]                // dependent var is first column of Xall
        X = Xall[.,2::cols(Xall)]    // the remaining cols are the independent variables
        X = X,J(rows(X),1,1)        // add a constant
        
        XX = quadcross(X, X)        // linear regression, see help mata cross(), example 2
        Xy = quadcross(X, y)
        b  = invsym(XX) * Xy
        
        ymean = mean(y)
        tss = sum((y :- ymean) :^ 2)        // total sum of squares
        mss = sum( (X * b :- ymean)  :^ 2)  // model sum of squares    
        r2 = mss / tss
    
        return(rows(X), r2, b')
    }
    end

  • #2
    Why not just use regress in your Stata code?

    Code:
     
    help regress

    Comment


    • #3
      I need to use this code with rangestat function, but there are lots of replications for which i need ado program, this is the full code, which needs automation
      Code:
      * define a linear regression with intercept in Mata
      mata:
      mata clear
      mata set matastrict on
      real rowvector myreg(real matrix Xall)
      {
          real colvector y, b, Xy
          real matrix X, XX, R
          real scalar ymean, tss, mss, r2
      
          y = Xall[.,1]                // dependent var is first column of Xall
          X = Xall[.,2::cols(Xall)]    // the remaining cols are the independent variables
          X = X,J(rows(X),1,1)        // add a constant
          
          XX = quadcross(X, X)        // linear regression, see help mata cross(), example 2
          Xy = quadcross(X, y)
          b  = invsym(XX) * Xy
          
          ymean = mean(y)
          tss = sum((y :- ymean) :^ 2)        // total sum of squares
          mss = sum( (X * b :- ymean)  :^ 2)  // model sum of squares    
          r2 = mss / tss
      
          return(rows(X), r2, b')
      }
      end
      
      
      * ---- first step, cross-sectional regression, only do one per time period ---
      bysort ms (id): gen ms1 = _n == 1
      gen low = cond(ms1, ms, c(minfloat))
      gen high = low
      rangestat (myreg) returnmonthlystock logsize1 logbm1, interval(ms low high) casewise
      * -myreg- returns, in order, N r2 coefficients, the intercept is last
      rename (myreg1 myreg2 myreg3 myreg4 myreg5) (nobs r2 b_size b_bm b_constant)
      keep if ms1
      
      * ---- second step, estimate final coefficients ---
      gen one = 1
      mvreg b_mvalue b_kstock b_constant if year<1991= one, noconstant
      
      qui sum r2, meanonly
      dis "avg. R-squared = " r(mean)

      Comment


      • #4
        I am familiar with the rangestat command (not function), which is from SSC, as you are asked to explain. If you want to automate that, the Mata code doesn't need translation, so long as Stata can see it.

        "needs automation" is not a definition of a program. See [U] 18 for an introduction to Stata programming.

        Comment


        • #5
          Actually, I tried to make an ado program, see the following code, but it returns the error,

          'real' found where name expected
          (20 lines skipped)
          (error occurred while loading rolff.ado)
          r(3000);


          Code:
          cap drop rolff
          pro rolff
          
          syntax varlist [if] [in], [Saving(string)] 
          
          *use Data_for_Fama_French_2010_Bootstrapping, clear
          tokenize `varlist'
          local first `1'
          macro shift
          local rest `*'
          
          loc save: word count `saving'
          
          
          preserve
          tempvar low high time1
          qui tsset
          loc timevar =r(timevar) 
          loc panelvar = r(panelvar)
          
          * ---- first step, cross-sectional regression, only do one per time period ---
          bysort `timevar' (`panelvar'): gen `time1' = _n == 1
          gen `low' = cond(`time1', `timevar', c(minfloat))
          gen `high' = `low'
          rangestat (myreg) `first' `rest', interval(`timevar' `low' `high') casewise
          * -myreg- returns, in order, N r2 coefficients, the intercept is last
          *rename (myreg1 myreg2 myreg3 myreg4 myreg5) (nobs r2 b_size b_bm b_constant)
          keep if time1
          
          * ---- second step, estimate final coefficients ---
          gen one = 1
          mvreg myreg4 myreg5 b_constant = one `if' , noconstant
          
          qui sum myreg2, meanonly
          dis "avg. R-squared = " r(mean)
          
          if `save'>0 {
          save "`saving'", replace
          }
          restore
          end
          
          mata:
          mata clear
          mata set matastrict on
          void real rowvector myreg(real matrix Xall)
          {
              real colvector y, b, Xy
              real matrix X, XX, R
              real scalar ymean, tss, mss, r2
          
              y = Xall[.,1]                // dependent var is first column of Xall
              X = Xall[.,2::cols(Xall)]    // the remaining cols are the independent variables
              X = X,J(rows(X),1,1)        // add a constant
              
              XX = quadcross(X, X)        // linear regression, see help mata cross(), example 2
              Xy = quadcross(X, y)
              b  = invsym(XX) * Xy
              
              ymean = mean(y)
              tss = sum((y :- ymean) :^ 2)        // total sum of squares
              mss = sum( (X * b :- ymean)  :^ 2)  // model sum of squares    
              r2 = mss / tss
          
              return(rows(X), r2, b')
          }
          end

          Comment


          • #6
            A function cannot return void and real at the same time. Change

            Code:
            void real rowvector myreg(real matrix Xall)
            to

            Code:
            real rowvector myreg(real matrix Xall)
            as in your initial post.

            I fail to see the point why this should be a program or an ado-file, but leave this decision up to you.

            Best
            Daniel

            Comment


            • #7
              When i change it as suggested by Daniel, the program retuns this error message
              Code:
              webuse grunfeld
              rolff invest mvalue
              statistic (myreg): Mata function not found
              r(198);

              Comment


              • #8
                Saeed Sardar --

                I am running Stata IC 14, and in this version, I tried running your code but found that there were a few coding mistakes here and there - which could be due to syntax differences between earlier versions of Stata. I tried running your code with the suggestions made by daniel klein and found a few things that were stopping it, like:
                1. I kept getting a "variable myreg5 not found" error. My code is altered so that myreg5 becomes myreg4, etc.
                2. I also got a syntax error when trying to run mvreg, which I think should have an = sign after the first variable. I changed the code to this.
                3. I also changed the variable one to be a local, as if I tried to run the program more than once, it would stop because one was already generated previously.
                4. I couldn't get cap drop rolff to work, so I changed it to cap pro drop rolff
                Anyways, here is my code - whether or not it does what you need, at least it runs to the end!

                Code:
                cap pro drop rolff
                pro rolff
                
                syntax varlist [if] [in], [Saving(string)] 
                
                *use Data_for_Fama_French_2010_Bootstrapping, clear
                tokenize `varlist'
                local first `1'
                macro shift
                local rest `*'
                
                loc save: word count `saving'
                
                
                preserve
                tempvar low high time1 one
                qui tsset
                loc timevar =r(timevar) 
                loc panelvar = r(panelvar)
                
                * ---- first step, cross-sectional regression, only do one per time period ---
                bysort `timevar' (`panelvar'): gen `time1' = _n == 1
                gen `low' = cond(`time1', `timevar', c(minfloat))
                gen `high' = `low'
                rangestat (myreg) `first' `rest', interval(`timevar' `low' `high') casewise
                * -myreg- returns, in order, N r2 coefficients, the intercept is last
                *rename (myreg1 myreg2 myreg3 myreg4 myreg5) (nobs r2 b_size b_bm b_constant)
                keep if `time1'
                
                * ---- second step, estimate final coefficients ---
                gen `one' = 1
                
                mvreg myreg3 = myreg4 `one', noconstant 
                
                qui sum myreg2, meanonly
                dis "avg. R-squared = " r(mean)
                
                if `save'>0 {
                save "`saving'", replace
                }
                restore
                end
                
                mata:
                mata clear
                mata set matastrict on
                real rowvector myreg(real matrix Xall)
                {
                    real colvector y, b, Xy
                    real matrix X, XX, R
                    real scalar ymean, tss, mss, r2
                
                    y = Xall[.,1]                // dependent var is first column of Xall
                    X = Xall[.,2::cols(Xall)]    // the remaining cols are the independent variables
                    X = X,J(rows(X),1,1)        // add a constant
                    
                    XX = quadcross(X, X)        // linear regression, see help mata cross(), example 2
                    Xy = quadcross(X, y)
                    b  = invsym(XX) * Xy
                    
                    ymean = mean(y)
                    tss = sum((y :- ymean) :^ 2)        // total sum of squares
                    mss = sum( (X * b :- ymean)  :^ 2)  // model sum of squares    
                    r2 = mss / tss
                
                    return(rows(X), r2, b')
                }
                end
                
                webuse grunfeld
                rolff invest mvalue
                Hope that helps!

                Comment


                • #9
                  To those interested, this appears to be an extension of this post of mine.

                  I think that Daniel and Matthew picked-up most of the issues. Here's a good start to make the program code more general:

                  Code:
                  mata:
                  mata clear
                  mata set matastrict on
                  real rowvector myreg(real matrix Xall)
                  {
                      real colvector y, b, Xy
                      real matrix X, XX, R
                      real scalar ymean, tss, mss, r2
                  
                      y = Xall[.,1]                // dependent var is first column of Xall
                      X = Xall[.,2::cols(Xall)]    // the remaining cols are the independent variables
                      X = X,J(rows(X),1,1)        // add a constant
                      
                      XX = quadcross(X, X)        // linear regression, see help mata cross(), example 2
                      Xy = quadcross(X, y)
                      b  = invsym(XX) * Xy
                      
                      ymean = mean(y)
                      tss = sum((y :- ymean) :^ 2)        // total sum of squares
                      mss = sum( (X * b :- ymean)  :^ 2)  // model sum of squares    
                      r2 = mss / tss
                  
                      return(rows(X), r2, b')
                  }
                  end
                  
                  cap program drop rolff
                  program rolff
                  
                      syntax varlist [if] [in], [Saving(string)] 
                  
                      gettoken depvar indepvars : varlist
                  
                      preserve
                      
                      qui tsset
                      loc timevar =r(timevar) 
                      loc panelvar = r(panelvar)
                  
                      * ---- first step, cross-sectional regression, only do one per time period ---
                      tempvar low high time1 one
                      bysort `timevar' (`panelvar'): gen `time1' = _n == 1
                      gen `low' = cond(`time1', `timevar', c(minfloat))
                      gen `high' = `low'
                      rangestat (myreg) `depvar' `indepvars', interval(`timevar' `low' `high') casewise
                      keep if `time1'
                      unab myreg_vars : myreg*
                      gettoken myreg_obs more : myreg_vars
                      gettoken myreg_r2 indepvars : more
                  
                      * ---- second step, estimate final coefficients ---
                      gen `one' = 1
                      mvreg `indepvars' = `one' `if' , noconstant
                  
                      qui sum `myreg_r2', meanonly
                      dis "avg. R-squared = " r(mean)
                  
                      if "`saving'" != "" save "`saving'", replace
                      restore
                      
                  end
                  
                  webuse grunfeld, clear
                  
                  rolff invest mvalue kstock
                  xtfmb invest mvalue kstock

                  Comment


                  • #10
                    Thanks for your replies, I am still stuck on the same issue. From DO file editor, the version supplied by Matthew and Robert works, but when I save the code as ado file and place it in my personal director, the program, when called, returns the following error. What am I missing?
                    Code:
                    . rolff invest mvalue
                    rolff.myreg() already exists
                    (20 lines skipped)
                    (error occurred while loading rolff.ado)
                    r(3000);

                    Comment


                    • #11
                      If you want to place the program in its own ado file, save the following in "rolff.ado":

                      Code:
                      *! version 1.0.0  27jun2016
                      program rolff
                      
                          version 14
                      
                          syntax varlist [if] [in], [Saving(string)] 
                      
                          gettoken depvar indepvars : varlist
                      
                          preserve
                          
                          qui tsset
                          loc timevar =r(timevar) 
                          loc panelvar = r(panelvar)
                      
                          * ---- first step, cross-sectional regression, only do one per time period ---
                          tempvar low high time1 one
                          bysort `timevar' (`panelvar'): gen `time1' = _n == 1
                          gen `low' = cond(`time1', `timevar', c(minfloat))
                          gen `high' = `low'
                          rangestat (myreg) `depvar' `indepvars', interval(`timevar' `low' `high') casewise
                          keep if `time1'
                          unab myreg_vars : myreg*
                          gettoken myreg_obs more : myreg_vars
                          gettoken myreg_r2 indepvars : more
                      
                          * ---- second step, estimate final coefficients ---
                          gen `one' = 1
                          mvreg `indepvars' = `one' `if' , noconstant
                      
                          qui sum `myreg_r2', meanonly
                          dis "avg. R-squared = " r(mean)
                      
                          if "`saving'" != "" save "`saving'", replace
                          restore
                      
                      end
                      Then, in your do file, you can use something like:

                      Code:
                      clear all
                      
                      mata:
                      real rowvector myreg(real matrix Xall)
                      {
                          real colvector y, b, Xy
                          real matrix X, XX, R
                          real scalar ymean, tss, mss, r2
                      
                          y = Xall[.,1]                // dependent var is first column of Xall
                          X = Xall[.,2::cols(Xall)]    // the remaining cols are the independent variables
                          X = X,J(rows(X),1,1)        // add a constant
                          
                          XX = quadcross(X, X)        // linear regression, see help mata cross(), example 2
                          Xy = quadcross(X, y)
                          b  = invsym(XX) * Xy
                          
                          ymean = mean(y)
                          tss = sum((y :- ymean) :^ 2)        // total sum of squares
                          mss = sum( (X * b :- ymean)  :^ 2)  // model sum of squares    
                          r2 = mss / tss
                      
                          return(rows(X), r2, b')
                      }
                      end
                      
                      webuse grunfeld
                      rolff invest mvalue kstock
                      xtfmb invest mvalue kstock
                      
                      rolff invest mvalue
                      xtfmb invest mvalue
                      The Mata function cannot be placed in the ado file because it would be private to that ado and the function needs to be passed to rangestat.

                      Comment

                      Working...
                      X