Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Linear regression with Mata

    Hi,

    I have a seemingly trivial problem with a regression function in Mata. I'm using rangestat in conjunction with a Mata function (myreg) to estimate as series of cross-sectional regressions in a panel data set. The function works fine until I include a calculation for the residual, which I call ehat in the function. The lines of code I run are:

    rangestat (myreg) lead_in2nav_w ln_nav, by(date_ym) interval(date_ym . date_ym) casewise

    The error I get is:

    myreg(): 3200 conformability error
    do_flex_stats(): - function returned error
    <istmt>: - function returned error
    r(3200);

    Obviously something is incorrect, but I have checked the relevant matrices for conformability and cannot find anything to indicate that they are not conformable.

    The Mata function is a modified version of one I found in this forum. I've included a sample of my data set as well. The panel is unbalanced, but the function works when I do not try to calculate ehat.

    Any suggestions are welcome. It's probably a silly error on my part.

    MATA FUNCTION

    * Linear regression in Mata using quadcross()
    mata:
    mata clear
    real rowvector myreg(real matrix Xall)
    {
    real colvector y, b, Xy, ehat
    real matrix X, XX
    real scalar ymean, tss, mss, r2, r2a, rss

    y = Xall[.,1]
    X = Xall[.,2::cols(Xall)]
    X = X,J(rows(X),1,1)

    XX = quadcross(X, X)
    Xy = quadcross(X, y)
    b = invsym(XX) * Xy

    ymean = mean(y)
    tss = sum((y :- ymean) :^ 2)
    mss = sum( (X * b :- ymean) :^ 2)
    rss = sum( (y - X * b ) :^ 2)
    r2 = mss / tss
    r2a = 1 - (1 - r2) * (rows(X) - 1) / (rows(X) - cols(X))
    ehat = y - X * b



    return(rows(X), r2, r2a, b', tss, mss, rss, ehat)
    }
    end


    SAMPLE FROM DATA SET

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str10 fundid float date_ym double(lead_in2nav_w ln_nav)
    "FS00008KNP" 612        .         .
    "FS00008KNP" 613        .         .
    "FS00008KNP" 614        .         .
    "FS00008KNP" 615        .         .
    "FS00008KNP" 616 1.290202 -1.609438
    "FS00008KNP" 617 .3964209  1.178369
    "FS00008KO8" 618 .2737324 -.7218018
    "FS00008KO8" 619 .4026174 -.5599073
    "FS00008KO8" 620  .582642 -.3225676
    "FS00008KO8" 621 .0090997  .2766691
    "FS00008KO8" 622 .0074663  .2921792
    "FS00008KO8" 623        .  .3017958
    "FS00008KSC" 612        .         .
    "FS00008KSC" 613        .         .
    "FS00008KSC" 614        .         .
    "FS00008KSC" 615        .         .
    "FS00008KSC" 616        .         .
    "FS00008KSC" 617        .         .
    "FS00008KSC" 618 .0009254  1.686963
    "FS00008KSC" 619 .0007928  1.618535
    "FS00008KSC" 620 .0043706  1.520827
    "FS00008KSC" 621 .0107276  1.616162
    "FS00008KSC" 622 .0202638  1.616135
    "FS00008KSC" 623 .0041988  1.609716
    "FS00008L07" 616        .         .
    "FS00008L07" 617        .         .
    "FS00008L07" 618 .5208364  2.399778
    "FS00008L07" 619 .1198361  2.837028
    "FS00008L07" 620  .050663  2.744303
    "FS00008L07" 621 .0505302  2.890874
    "FS00008L07" 622 .0598396  2.915028
    "FS00008L07" 623 .0480914  2.848323
    end
    format %tm date_ym

  • #2
    Ron Alquist --

    It is possible that everything is blowing up because of a lack of conformity in your "return" statement. Mata is unlike Python in this regard in that one has to return things that conform, I think. As an example:

    Code:
    mata:
    real matrix foo(x, y) {
    return(x, y)
    }
    end
    Then, if I do,
    Code:
    foo(1, J(1, 2, 1))
    my code works.

    However, if I do:

    Code:
    foo(1, J(2, 2, 1))
    it blows up, because 1 does not conform with a 2X2 matrix. I don't know if this is the problem, but it plausibly might be!

    Best,

    Matthew J. Baker

    Comment


    • #3
      Several things are wrong here.

      Let me mention first that rangestat includes the built-in (reg) OLS function so there's no need to define a Mata function anymore (since Apr. 2017).

      Second, conceptually rangestat performs one function evaluation per observation. This allows for regressions on a rolling window where the regression sample is different for each observation. With a (reg) function, rangestat returns results stored in multiple variables, one for each regression result scalar (i.e. nobs, r2, adjr2, coefficients and standard errors). The concept of saving residuals from within a user-supplied Mata function does not make sense as these form a vector that cannot be stored for the current observation in a single variable.

      Since you want to perform cross-sectional regressions, you need to perform one regression per time period. You can do this with rangestat and it is smart enough to recognize that multiple observations are specifying identical intervals. As such, rangestat will perform only one regression per time period and copy the results to all observations with the same time period. With the results, you can then manually calculate the residuals.

      A more appropriate tool here is probably runby (from SSC) that will create subsamples per time period and run a user-defined Stata program to run the regression and calculate the residuals.

      Finally, I include code to manually loop over each time period to show a brute force method for doing the same.

      Since you are performing the task per time period, I have regenerated the data example with data sorted by date_ym.

      Code:
      clear all
      
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input str10 fundid float date_ym double(lead_in2nav_w ln_nav)
      "FS00008KNP" 612        .         .
      "FS00008KSC" 612        .         .
      "FS00008KNP" 613        .         .
      "FS00008KSC" 613        .         .
      "FS00008KNP" 614        .         .
      "FS00008KSC" 614        .         .
      "FS00008KNP" 615        .         .
      "FS00008KSC" 615        .         .
      "FS00008KNP" 616 1.290202 -1.609438
      "FS00008KSC" 616        .         .
      "FS00008L07" 616        .         .
      "FS00008KNP" 617 .3964209  1.178369
      "FS00008KSC" 617        .         .
      "FS00008L07" 617        .         .
      "FS00008KO8" 618 .2737324 -.7218018
      "FS00008KSC" 618 .0009254  1.686963
      "FS00008L07" 618 .5208364  2.399778
      "FS00008KO8" 619 .4026174 -.5599073
      "FS00008KSC" 619 .0007928  1.618535
      "FS00008L07" 619 .1198361  2.837028
      "FS00008KO8" 620  .582642 -.3225676
      "FS00008KSC" 620 .0043706  1.520827
      "FS00008L07" 620  .050663  2.744303
      "FS00008KO8" 621 .0090997  .2766691
      "FS00008KSC" 621 .0107276  1.616162
      "FS00008L07" 621 .0505302  2.890874
      "FS00008KO8" 622 .0074663  .2921792
      "FS00008KSC" 622 .0202638  1.616135
      "FS00008L07" 622 .0598396  2.915028
      "FS00008KO8" 623        .  .3017958
      "FS00008KSC" 623 .0041988  1.609716
      "FS00008L07" 623 .0480914  2.848323
      end
      format %tm date_ym
      
      * use rangestat to get coefficients and manually calculate residuals
      rangestat (reg) lead_in2nav_w ln_nav, interval(date_ym 0 0)
      gen double resid = lead_in2nav_w - (b_cons + b_ln_nav*ln_nav)
      
      * use runby to perform regressions per date_ym and get residuals
      program cs_reg
          cap reg lead_in2nav_w ln_nav
          // exit without error if regress fails for whatever reason
          if _rc exit
          predict double resid_runby, residuals
      end
      runby cs_reg, by(date_ym)
      
      * loop over date_ym to do the same
      levelsof date_ym, local(yms)
      gen double resid_loop = .
      foreach ym of local yms {
          cap reg lead_in2nav_w ln_nav if date_ym == `ym'
          if _rc == 0 {
              predict double res, residuals
              replace resid_loop = res if e(sample)
              drop res
          }
      }
      list date_ym lead_in2nav_w ln_nav reg_nobs b_ln_nav b_cons resid resid_runby resid_loop, sepby(date_ym) noobs
      and the results
      Code:
      . list date_ym lead_in2nav_w ln_nav reg_nobs b_ln_nav b_cons resid resid_runby resid_loop, sepby(date_ym) noobs
      
        +------------------------------------------------------------------------------------------------------------+
        | date_ym   lead_i~w      ln_nav   reg_nobs     b_ln_nav       b_cons        resid   resid_ru~y   resid_loop |
        |------------------------------------------------------------------------------------------------------------|
        |  2011m1          .           .          .            .            .            .            .            . |
        |  2011m1          .           .          .            .            .            .            .            . |
        |------------------------------------------------------------------------------------------------------------|
        |  2011m2          .           .          .            .            .            .            .            . |
        |  2011m2          .           .          .            .            .            .            .            . |
        |------------------------------------------------------------------------------------------------------------|
        |  2011m3          .           .          .            .            .            .            .            . |
        |  2011m3          .           .          .            .            .            .            .            . |
        |------------------------------------------------------------------------------------------------------------|
        |  2011m4          .           .          .            .            .            .            .            . |
        |  2011m4          .           .          .            .            .            .            .            . |
        |------------------------------------------------------------------------------------------------------------|
        |  2011m5   1.290202   -1.609438          1            .            .            .            .            . |
        |  2011m5          .           .          1            .            .            .            .            . |
        |  2011m5          .           .          1            .            .            .            .            . |
        |------------------------------------------------------------------------------------------------------------|
        |  2011m6   .3964209    1.178369          1            .            .            .            .            . |
        |  2011m6          .           .          1            .            .            .            .            . |
        |  2011m6          .           .          1            .            .            .            .            . |
        |------------------------------------------------------------------------------------------------------------|
        |  2011m7   .2737324   -.7218018          3    .03019882    .23129234    .06423762    .06423762    .06423762 |
        |  2011m7   .0009254    1.686963          3    .03019882    .23129234   -.28131123   -.28131123   -.28131123 |
        |  2011m7   .5208364    2.399778          3    .03019882    .23129234     .2170736     .2170736     .2170736 |
        |------------------------------------------------------------------------------------------------------------|
        |  2011m8   .4026174   -.5599073          3   -.09515673    .29798139    .05135706    .05135706    .05135706 |
        |  2011m8   .0007928    1.618535          3   -.09515673    .29798139   -.14317409   -.14317409   -.14317409 |
        |  2011m8   .1198361    2.837028          3   -.09515673    .29798139    .09181702    .09181702    .09181702 |
        |------------------------------------------------------------------------------------------------------------|
        |  2011m9    .582642   -.3225676          3   -.18466627    .45524463    .06783002    .06783002    .06783002 |
        |  2011m9   .0043706    1.520827          3   -.18466627    .45524463   -.17002858   -.17002858   -.17002858 |
        |  2011m9    .050663    2.744303          3   -.18466627    .45524463    .10219856    .10219856    .10219856 |
        |------------------------------------------------------------------------------------------------------------|
        | 2011m10   .0090997    .2766691          3    .01572438    -.0016211    .00637035    .00637035    .00637035 |
        | 2011m10   .0107276    1.616162          3    .01572438    -.0016211   -.01306445   -.01306445   -.01306445 |
        | 2011m10   .0505302    2.890874          3    .01572438    -.0016211    .00669409    .00669409    .00669409 |
        |------------------------------------------------------------------------------------------------------------|
        | 2011m11   .0074663    .2921792          3    .01993497   -.00286116    .00450288    .00450288    .00450288 |
        | 2011m11   .0202638    1.616135          3    .01993497   -.00286116   -.00909264   -.00909264   -.00909264 |
        | 2011m11   .0598396    2.915028          3    .01993497   -.00286116    .00458976    .00458976    .00458976 |
        |------------------------------------------------------------------------------------------------------------|
        | 2011m12          .    .3017958          2            .            .            .            .            . |
        | 2011m12   .0041988    1.609716          2            .            .            .    4.337e-18    4.337e-18 |
        | 2011m12   .0480914    2.848323          2            .            .            .    6.939e-18    6.939e-18 |
        +------------------------------------------------------------------------------------------------------------+
      
      .

      Comment


      • #4
        Thanks, Matthew and Robert.

        Matthew: Your diagnosis was correct.

        Robert: For reasons I don't want to get into here, I decided to use rangestat rather than runby. But it is good to know that this approach is also available.

        Comment

        Working...
        X