  • 1-month lag cross-sectional regression

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str6 date int id float(volatility return Time)
    "Jan-91"   1 .0076515  .0080953 .
    "Jan-91"   2 .0082569  .0050203 .
    "Jan-91"   3 .0116118  -.110187 .
    "Jan-91"   4  .007547  .0189273 .
    "Jan-91"  12 .0085447  .0023504 .
    "Jan-91"  13 .0088329  .0281606 .
    "Jan-91"  17  .010967 -.0018181 .
    "Jan-91"  25 .0093389    .00644 .
    "Jan-91"  26 .0088709 -.0155422 .
    "Jan-91"  27  .011748  .0205174 .
    "Jan-91"  28 .0098117  .0048176 .
    "Jan-91"  29 .0108828  -.005227 .
    "Jan-91"  30 .0100979 -.0044575 .
    "Jan-91"  31 .0143006  .0110809 .
    "Jan-91"  40 .0101437 -.0458095 .
    "Feb-91" 106 .0142047  .1218685 .
    "Feb-91" 107 .0033309  .0998751 .
    "Feb-91" 109 .0070853  .1022816 .
    "Feb-91" 112 .0020066  .1269131 .
    "Feb-91" 116 .0081126  .0239336 .
    "Feb-91" 117 .0027221  .1054625 .
    "Feb-91" 118  .007132  .0922486 .
    "Feb-91" 119 .0037425   .111814 .
    "Feb-91" 123 .0041721  .1069812 .
    "Feb-91" 124  .021932  .1650661 .
    "Feb-91" 125  .005186  .1086161 .
    "Feb-91" 126 .0051869  .1417216 .
    "Feb-91" 209 .0088042  .0898837 .
    "Feb-91" 212  .008878  .0804817 .
    "Feb-91" 213 .0033121  .1345911 .
    "Feb-91" 214 .0130733  .1257204 .
    "Feb-91" 215 .0079939  .1178345 .
    "Feb-91" 216 .0040007  .0999595 .
    "Feb-91" 217 .0068929  .1253842 .
    end
    format %tmCCYY!mNN Time
    Here is my data.

    I want to run a cross-sectional regression (across ids) for each month. The regression model is return(t) = a + b*volatility(t-1) + c.

    I also need to do this in a loop, as I have 268 months.

    Then I also need to obtain the coefficient b for each month, along with its t-statistic and p-value.

    Finally, I need to calculate the mean of b and its t-statistic. The average, (b1+b2+...+b268)/268, is easy to compute, but the t-statistic of the mean of b is not. Can Stata calculate this t-statistic, or do I have to calculate it manually?
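In equation form, the two steps described above look like this. It is a sketch: reading c as the error term and writing i for firms and T for the number of months are my interpretation of the post. This two-pass setup is usually called the Fama-MacBeth procedure: run one cross-sectional regression per month, then test the average slope using the time series of the T monthly estimates.

```latex
\text{return}_{i,t} = a_t + b_t\,\text{volatility}_{i,t-1} + \varepsilon_{i,t}, \qquad t = 1,\dots,T

\bar b = \frac{1}{T}\sum_{t=1}^{T} \hat b_t, \qquad
s_b^2 = \frac{1}{T-1}\sum_{t=1}^{T}\bigl(\hat b_t - \bar b\bigr)^2, \qquad
t_{\bar b} = \frac{\bar b}{s_b/\sqrt{T}} \quad (T-1 \text{ degrees of freedom})
```

This is exactly what a one-sample t-test of the monthly estimates against zero computes, provided each month enters the test once.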

    I tried to write the code:

    gen time = monthly(date, "M19Y")
    format time %tm

    tsset id time

    egen group=group(time)

    forvalue i=1(1)268 {
    reg return L.volatility i.month if group ==`i'
    predict ???
    }

    Many thanks to Clyde Schechter for previous help. But the regression code doesn't work, and I have no idea how to extract the coefficient, and especially its t-statistic and p-value.

    I would really appreciate it if someone could help me figure out the code.

    Many thanks again!!!



  • #2
    I think the following will do it:

    Code:
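    * two-digit years: topyear 2020 maps "91" to 1991 and "05" to 2005 ("M19Y" would put 2000s dates in the 1900s)
    gen time = monthly(date, "MY", 2020)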
    format time %tm
    drop date Time
    xtset id time
    gen lag_volatility = L1.volatility
    
    capture program drop regress_one_month
    program define regress_one_month
        regress return lag_volatility
        tempname M
        matrix `M' = r(table)
        gen b = `M'[1,1]
        gen t = `M'[3,1]
        gen p = `M'[4,1]
        exit
    end
    
    runby regress_one_month, by(time) verbose
    Notes:

    1. You will need to install -runby-, written by Robert Picard and me, from SSC
    2. In your example data, no id appears in more than one month, so the lagged volatility is always missing. Consequently, when run on your example data, this code produces no results and just gives error messages saying there are no observations. Presumably your real data contain multiple months of observations on the same firms, and in that case this should give you what you want.
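If installing -runby- is not an option, the built-in -statsby- can also run one regression per month and collect the slope, its standard error and the residual degrees of freedom; the t-statistic and p-value can then be rebuilt from those. This is a sketch on simulated data (50 hypothetical firms, 24 hypothetical months, made-up coefficients), not the code from this thread. Unlike the -runby- program, it returns one row per month instead of attaching b, t and p to every firm-month observation.

```stata
* Simulated panel: 50 hypothetical firms observed for 24 months
clear
set seed 12345
set obs 50
gen id = _n
expand 24
bysort id: gen time = tm(1991m1) + _n - 1
format time %tm
gen volatility = 0.01 + 0.005*runiform()
xtset id time
gen lag_volatility = L1.volatility
gen return = 0.01 + 2*lag_volatility + rnormal(0, 0.05)

* The first month has no lagged volatility, so drop it before estimating
drop if missing(lag_volatility)

* One regression per month; keep slope, standard error and residual df (one row per month)
statsby b=_b[lag_volatility] se=_se[lag_volatility] df=e(df_r), by(time) clear: regress return lag_volatility

* Rebuild the t-statistic and two-sided p-value the same way r(table) reports them
gen t = b/se
gen p = 2*ttail(df, abs(t))
list in 1/5
```

Keeping one row per month also means a subsequent -ttest b == 0- counts each month exactly once.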

    To calculate the mean value of the coefficient, b, and get a t-test of the null hypothesis that the mean equals zero you can run:

    Code:
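    egen one_per_month = tag(time)
    ttest b = 0 if one_per_month
    after the above. The -tag()- line keeps one observation per month: the program above puts the same b on every observation of a month, so without it each month would be counted once per firm.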
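Because the -runby- program attaches the same b to every firm-month observation, the test should use one observation per month. The sketch below fakes that -runby- output with simulated monthly slopes (the 24 months, the slope values and the firm counts are all hypothetical, as is the variable name one_per_month) and checks that -ttest- on one row per month matches the mean divided by its standard error computed by hand.

```stata
* Mimic -runby- output: one hypothetical slope b per month, repeated on every firm row
clear
set seed 2024
set obs 24
gen time = tm(1991m1) + _n - 1
format time %tm
gen b = 0.5 + rnormal()
gen nfirms = 10 + mod(_n, 5)
expand nfirms

* Keep exactly one observation per month for the test
egen one_per_month = tag(time)
ttest b == 0 if one_per_month
scalar t_ttest = r(t)
scalar n_ttest = r(N_1)

* Same statistic by hand: mean / (sd / sqrt(T))
quietly summarize b if one_per_month
scalar t_manual = r(mean) / (r(sd)/sqrt(r(N)))
display "ttest: " t_ttest "   by hand: " t_manual

* Without the restriction, N counts firm-months rather than months
quietly ttest b == 0
display "months used: " n_ttest "   rows used without the restriction: " r(N_1)
```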



    • #3
      Dear Clyde,

      Many thanks for your kind reply.

      Everything before the -capture- command works very well. By the way, because some months are missing, I used
      Code:
      tsfill
      before
      Code:
      gen lag_volatility=L1.volatility
      But for this code:
      Code:
      capture program drop regress_one_month
      program define regress_one_month
          regress return lag_volatility
          tempname M
          matrix `M' = r(table)
          gen b = `M'[1,1]
          gen t = `M'[3,1]
          gen p = `M'[4,1]
          exit
      end
      I didn't get any results in Stata, and it didn't show any error for this code either.

      Then I ran
      Code:
      ssc install runby
      runby regress_one_month, by(time) verbose
      It doesn't work:

      variable return not found
      variable return not found
      variable return not found
      variable return not found
      variable return not found
      variable return not found
      variable return not found
      variable return not found
      variable return not found
      variable return not found

      --------------------------------------
      Number of by-groups = 294
      by-groups with errors = 294
      by-groups with no data = 0
      Observations processed = 54,326
      Observations saved = 0
      --------------------------------------



      Also, I don't fully understand the -capture- and -program- commands. I checked the Stata help files as well as Google, but I am still confused.

      1. "The capture command allows you to give a command that might otherwise fail and continue anyway. In this case, the command is program drop mergedata. If there is no program in memory already called "mergedata", the command to drop the program from memory will fail. By capturing the error message from that situation, you can continue working."

      Does this mean the program keeps working whatever mistakes there are in my data? If so, can I use this command in other regression or -arch- code?

      2.
      Code:
      tempname M
          matrix `M' = r(table)
          gen b = `M'[1,1]
          gen t = `M'[3,1]
          gen p = `M'[4,1]
          exit
      For this part, I understand the -tempname- and -matrix- commands, but I couldn't find syntax like 'm'[1,1]. What do the numbers mean? If this command displays the results as a matrix, how can I save it? Because the cross-sectional regression will be repeated about 248 times, won't the earlier results be lost?

      3. "runby loops over data by-groups. At each pass, the data in memory is filled with a by-group's observations and program_name is executed. What's left in memory when program_name terminates is considered results and stored.
      Once all by-groups have been processed, all stored results are combined and replace the data in memory."
      Is this why, after running -runby-, all my data in the Data Editor is gone? It should have been replaced by my results, but I didn't get any results. Is that the reason?

      I am sorry for asking so many questions, and I really appreciate your patience!

      BIG THX!



      • #4
        No, -capture program drop regress_one_month- is not a problem. When developing a program like this, you sometimes make a mistake and have to fix it. But you can't simply re-run the program definition without first dropping the old version. And if you drop it unconditionally every time, you will get an error the first time, because no previous version exists yet. Using -capture- means you don't have to worry about that: if a previous version exists, it is dropped; if not, nothing happens.
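The effect of -capture- can be seen through _rc, the return code it stores. The sketch below uses a hypothetical program name, demo_prog. Its second half bears on the question of using -capture- elsewhere: it swallows the error of any command, so wrapping a regression in it silently skips a failing regression unless you check _rc yourself.

```stata
* First run in a fresh session: nothing to drop, so the error is swallowed
capture program drop demo_prog
display "first drop, _rc = " _rc

program define demo_prog
    display "version 1"
end

* Redefining: the old version now exists, so dropping it succeeds (_rc = 0)
capture program drop demo_prog
local rc_redefine = _rc
program define demo_prog
    display "version 2"
end
demo_prog

* -capture- hides errors from any command, regressions included
clear
set obs 20
gen y = rnormal()
capture regress y no_such_var
local rc_reg = _rc
if `rc_reg' != 0 {
    display as error "regression failed with _rc = `rc_reg'"
}
```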

        As for the regressions not running, read the error message--it tells you exactly what the problem is. Your data doesn't have a variable called return. Your example data did have a variable by that name, and the code I showed ran just fine there. But apparently in your real data, the variable you are interested in has some different name. So you either have to rename that variable to return, or you have to change the -regress- command to use the actual name of the variable.
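A quick way to catch this kind of mismatch before running the program is -confirm variable-, which fails with return code 111 when the variable does not exist. In this sketch, ret is a hypothetical stand-in for whatever the variable is actually called in the real data.

```stata
clear
set obs 5
gen ret = rnormal()            // hypothetical: the real data call the variable "ret"

* Check for the name the program expects; confirm fails with r(111) if it is absent
capture confirm variable return
local rc_before = _rc
display "return present? _rc = " `rc_before'

* Option 1: rename the variable to match the program
rename ret return
confirm variable return
describe
```

Option 2, editing the -regress- line inside the program to use the real name, works just as well; what matters is that the data and the code agree.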

        For this part, I understand the -tempname- and -matrix- commands, but I couldn't find syntax like 'm'[1,1]. What do the numbers mean?
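        First, it's `M'[1,1], not 'm'[1,1]. The macro name is an upper-case M, so a lower-case m won't work, and a local macro reference opens with a left single quote (`) and closes with an apostrophe (').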
        Since you understand the -matrix- command, you will know that `M' contains the matrix r(table) that is left behind by -regress-. To get a general sense of what r(table) looks like, run this code:
        Code:
        sysuse auto, clear
        regress price mpg headroom
        matrix list r(table)
        `M' will contain similar information relevant to your regression. The numbers in brackets are row and column indices: the first row gives the coefficients, the third row the t-statistics, and the fourth row the p-values, while the first column refers to the first predictor. So `M'[1,1], `M'[3,1] and `M'[4,1] are the coefficient, t-statistic and p-value of lag_volatility.
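To see that the [row, column] entries are just the usual regression statistics, the sketch below pulls them from r(table) with the auto data and recomputes them from _b[], _se[] and e(df_r). It uses a plain matrix name M for readability; the program in this thread uses -tempname- instead so the name cannot clash with anything else in memory.

```stata
sysuse auto, clear
regress price mpg headroom
matrix M = r(table)
matrix list M

* M[row, column]: rows are b, se, t, pvalue, ...; column 1 is mpg
scalar b_mpg = M[1,1]
scalar t_mpg = M[3,1]
scalar p_mpg = M[4,1]

* The same numbers from _b[], _se[] and the residual degrees of freedom
scalar t_check = _b[mpg] / _se[mpg]
scalar p_check = 2*ttail(e(df_r), abs(t_check))
display "t: " t_mpg "   recomputed: " t_check
display "p: " p_mpg "   recomputed: " p_check
```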

        If this command displays the results as a matrix, how can I save it?
        Don't bother; there's no point in saving it. -regress- returns a matrix r(table) with the coefficients, standard errors, test statistics, p-values, and confidence limits. All we're doing here is extracting the information we want from that matrix. Saving the full matrix would just waste memory. Also, matrices are really an inconvenient way of saving results unless you plan to do some linear algebra with them.

        Because the cross-sectional regression will be repeated about 248 times, won't the earlier results be lost?
        No. The way -runby- works, it will execute the program 248 times, once for each value of time. The results generated each time are built up by serially appending new results to the old ones, so nothing gets lost along the way. If your final results have any missing values, it will be because the regression did not run successfully, or ran but could not calculate that particular statistic.

        Is this why, after running -runby-, all my data in the Data Editor is gone? It should have been replaced by my results, but I didn't get any results. Is that the reason?
        The reason you don't have any results is that none of the regressions ran. And none of the regressions ran because your data does not contain a variable named return, which the regression requires (as explained above). Once it works, you will have your original data plus the regression coefficient, t-statistic and p-value for every time for which a regression ran successfully. To see what it looks like, just run the code as I wrote it with your example data: I tested it before and it did exactly what you were looking for. By changing the variable names in the data without making the corresponding change in the code, you broke it.

        By the way, because some months are missing, I used
        Code:
        tsfill
        before
        Code:
        gen lag_volatility=L1.volatility
        This is harmless, but also unnecessary. The missing months don't matter at all, and adding in new observations for them that have no actual data for the regression variables changes nothing.
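The reason the gaps don't matter is that a lag operator looks up the previous period by the value of the time variable, not the previous row, so a missing month simply produces a missing lag, which is exactly what -tsfill- would lead to anyway. A small illustration with made-up numbers, where firm 1 has no observation for 1991m3:

```stata
clear
input id month volatility
1 1 0.5
1 2 0.25
1 4 0.125
end
gen time = tm(1991m1) + month - 1
format time %tm
xtset id time

* L1. refers to the previous month, not the previous row
gen lag_volatility = L1.volatility
list id time volatility lag_volatility
```

The 1991m4 row gets a missing lag because 1991m3 is absent, with or without -tsfill-.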



        • #5
          Dear Clyde,

          I see~~~ I am so stupid. I didn't realize that I had forgotten to change the variable name in the regression. Many, many THX!!!

