  • error term/sigma_e for every/single observation(s)?

    Dear Stata community,

    I almost feel a bit ashamed to ask this, but I am very new to Stata and many things are still a black box for me. I have found this forum and its posts very helpful so far!

    So, for a research paper on ETFs and their tracking ability, I have to use Stata to get the error term values of the market model. What I have to calculate is the following:
    [Image: TE3.PNG]

    So the focus is on the 'epsilon' (error term). It may be worth mentioning that I have 16 ETFs with daily return data covering between 8 and 13 years, depending on the ETF. Their identifier is "TickerSymbol1".

    I got as far as running a regression for each ETF using commands like:

    xtset TickerSymbol1 date
    panel variable: TickerSymbol1 (unbalanced)
    time variable: date, 03jan2007 to 31dec2019, but with gaps
    delta: 1 day

    xtreg Dailyreturnetfin Dailyreturnindexin if TickerSymbol1 == 1, fe

    [Image: reg.PNG]


    So I understand that the sigma_e here is essentially the tracking error. However, it is computed over the entire period.

    To run regressions later that look for potential determinants of it (with the TE as the Y variable), I will need at least monthly values, and this is where I am stuck. I would need a "sigma_e" for each observation so that I could calculate the tracking error according to the formula above, but separately for each month.

    I also had been trying this:

    predict stdp1 if TickerSymbol1 == 1, stdp

    I thought that this would give me the standard errors for each observation/day. But after all the posts and videos I have read and watched, I am now confused about what the right approach is in my case.

    Sorry about the long post, but I hope it makes things clearer.

    Any help is greatly appreciated, since it will help me take a step forward.

    Thanks and best,
    Marina

  • #2
    So, the first mistake is running a separate regression for each ETF. The equation you show has _it_ subscripts in it, so the intention is that all of the ETFs will be included in the model. Indeed, a clue that something was wrong is that your -xtreg- output has no result for sigma_u or rho. That's because you have only one ETF in it. So first go back and rerun that same -xtreg- command without the -if- condition.


    Then run -predict error_term, e- and the variable error_term will be created and will contain the epsilon_it for each observation.
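
    A minimal sketch of these two steps follows. Because the ETF data are not available here, it uses Stata's grunfeld example panel as a stand-in (an assumption for illustration only); company, invest, and mvalue play the roles of TickerSymbol1, Dailyreturnetfin, and Dailyreturnindexin.

    ```stata
    * stand-in panel data; replace with the ETF dataset
    webuse grunfeld, clear
    xtset company year

    * pooled fixed-effects regression: note there is no -if- condition
    xtreg invest mvalue, fe

    * idiosyncratic residual e_it for every observation
    predict double error_term, e
    summarize error_term
    ```

    With more than one panel in the estimation sample, the -xtreg- output should now report values for sigma_u and rho.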

    That said, at least out of context, the second equation you show, the one for TE3, is mathematically incoherent. No definition is given for epsilonbar_it, and it is hard to imagine what definition there could be for it. Given that this is all inside a summation from t = 1 to n, I would expect either the i or the t subscript to be missing (i.e. averaged over) in that formula. The bar over the epsilon suggests it is some kind of mean, but the mean of which epsilons is not identified. Even if that is fixed, the summation removes the t subscript and leaves a statistic that is still defined separately for each i. Yet the left-hand side, TE3, lacks an i subscript, and it is unclear how the i gets removed. Probably there is a second summation that is missing, but where? Before you try to code this formula, I suggest you look more deeply into what it is supposed to be. Perhaps this is just a misrepresentation of some standard formula in finance/economics, and somebody else reading this thread will recognize it and clarify the situation. If not, you should ask a colleague in your discipline for help figuring out what it is supposed to be, because as written that second equation is just a meaningless jumble of symbols.



    • #3
      Thank you very much for your response, Clyde Schechter. I already suspected that something was going wrong. I wasn't aware that I should run the regression for all ETFs at once; based on the following background material, I was convinced that I needed each ETF separately:
      [Image: background TE.PNG]


      So I run the regression without the -if- condition and then run -predict error_term, e-, which I assume is what I already did here, but now also without the -if- condition?
      [Image: error term.PNG]



      Regarding your comment on the second equation: it is indeed very likely that I found an unreliable source, since I was looking for ways to calculate the tracking error (TE3, as in method 3) on a monthly basis. However, rethinking the steps, it seems to me that once I have the epsilon_it for each observation, I can just use those, add them up, and average them by month.
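
      One possible reading of "monthly tracking error" is the standard deviation of the daily residuals within each ETF-month. That is an assumption, not something the formula in the screenshot confirms, so treat the following as a sketch only. It simulates one hypothetical ETF with 120 days of made-up residuals so that it runs on its own; with the real data, only the commands from -generate month- onward would be needed.

      ```stata
      clear
      set seed 12345
      set obs 120
      * simulated stand-ins for the real variables
      generate TickerSymbol1 = 1
      generate date = td(01jan2019) + _n - 1
      format date %td
      generate double error_term = rnormal(0, 0.01)

      * monthly date derived from the daily date
      generate month = mofd(date)
      format month %tm

      * one row per ETF-month: SD of daily residuals and number of days used
      collapse (sd) te_month = error_term (count) n_days = error_term, by(TickerSymbol1 month)
      list, sepby(TickerSymbol1)
      ```

      Note that -collapse- replaces the data in memory with the monthly summary. To keep the daily observations instead, -egen te_month = sd(error_term), by(TickerSymbol1 month)- attaches the same monthly value to every day.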

      Thank you very much again!





      • #4
        Marina:
        as an aside to Clyde's excellent explanations, with your first code you basically ran an OLS on the observations for panel #1, as you can see in the following toy example:
        Code:
        . use "https://www.stata-press.com/data/r16/nlswork.dta"
        (National Longitudinal Survey.  Young Women 14-26 years of age in 1968)
        
        . xtreg ln_wage c.age##c.age if idcode==1, fe
        
        Fixed-effects (within) regression               Number of obs     =         12
        Group variable: idcode                          Number of groups  =          1
        
        R-sq:                                           Obs per group:
             within  = 0.8534                                         min =         12
             between =      .                                         avg =       12.0
             overall = 0.8534                                         max =         12
        
                                                        F(2,9)            =      26.20
        corr(u_i, Xb)  =      .                         Prob > F          =     0.0002
        
        ------------------------------------------------------------------------------
             ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
                 age |   .3640136   .1135175     3.21   0.011     .1072192    .6208079
                     |
         c.age#c.age |  -.0053742   .0020739    -2.59   0.029    -.0100656   -.0006828
                     |
               _cons |  -3.611333   1.485875    -2.43   0.038    -6.972616   -.2500499
        -------------+----------------------------------------------------------------
             sigma_u |          .
             sigma_e |  .22630528
                 rho |          .   (fraction of variance due to u_i)
        ------------------------------------------------------------------------------
        F test that all u_i=0: F(0, 9) = .                           Prob > F =      .
        
        . reg ln_wage c.age##c.age if idcode==1
        
              Source |       SS           df       MS      Number of obs   =        12
        -------------+----------------------------------   F(2, 9)         =     26.20
               Model |  2.68353306         2  1.34176653   Prob > F        =    0.0002
            Residual |    .4609267         9  .051214078   R-squared       =    0.8534
        -------------+----------------------------------   Adj R-squared   =    0.8208
               Total |  3.14445976        11  .285859979   Root MSE        =    .22631
        
        ------------------------------------------------------------------------------
             ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
                 age |   .3640136   .1135175     3.21   0.011     .1072192    .6208079
                     |
         c.age#c.age |  -.0053742   .0020739    -2.59   0.029    -.0100656   -.0006828
                     |
               _cons |  -3.611333   1.485875    -2.43   0.038    -6.972616   -.2500499
        ------------------------------------------------------------------------------
        Kind regards,
        Carlo
        (Stata 19.0)
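
        To connect sigma_e back to per-observation residuals: with a single panel, sigma_e (equivalently the Root MSE from -regress-) is the square root of the sum of squared residuals divided by the residual degrees of freedom. A short sketch on the same toy data, where the variable names resid and resid2 are made up for the illustration:

        ```stata
        use "https://www.stata-press.com/data/r16/nlswork.dta", clear
        regress ln_wage c.age##c.age if idcode == 1

        * residuals for the observations used in the regression
        predict double resid if e(sample), residuals
        generate double resid2 = resid^2

        * sqrt(RSS / residual df) should match the reported Root MSE
        quietly summarize resid2
        display sqrt(r(sum) / e(df_r))
        display e(rmse)
        ```

        The two displayed values should agree, which is why a single sigma_e summarizes the whole estimation period while the residuals themselves vary by observation.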



        • #5
          Looking at this more carefully, I see that I probably misdirected you in #2. The first equation there is not what I originally perceived it to be. Frankly, I do not know what to make of it: the coefficient beta_it implies a separate beta for each separate observation--which would never be estimable! However, equation (4) in the screenshot from #3 makes sense. It does call for a separate regression for each i (firm). So we can write a little loop to do that. And the error term is calculated with -predict, e-, not -predict, stdp-.

          Code:
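          levelsof TickerSymbol1, local(tss)
          gen double error_term = .
          foreach ts of local tss {
              * one market-model regression per ETF
              regress Dailyreturnetfin Dailyreturnindexin if TickerSymbol1 == `ts'
              * -residuals- is the documented residual option after -regress-
              * (-e- is the -xtreg- spelling); e(sample) limits it to this ETF
              predict double error if e(sample), residuals
              replace error_term = error if TickerSymbol1 == `ts'
              drop error
          }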
          will get you all your epsilon_it's. But I still don't see how to progress from there.



          • #6
            Dear Clyde Schechter & Carlo Lazzaro - thank you very much for your help!!
