
  • Pca and Predict

    Hello everyone,

    I have a time series of 10 short-term rates (f1 f2 f3 f4 f5 f6 f7 f8 f9 f10) that I am trying to aggregate into two components (two looks ideal from the eigenvalues).
    The data has a dummy that equals 1 when a specific event occurred and 0 otherwise.
    I have created two groups of variables:
    1) the first differences of f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 (a*) only on days on which the dummy is 1 (the literature uses this, and I have created it so that I can compare); there are 300 obs.
    2) the first differences of f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 (b*) on every day; there are 7000 obs.


    forvalues i = 0/10 {
    if `i' > 1 {
    local j = `i'+1
    gen a`i'=.
    replace a`i' = d.f`i' if event!=0 &
    }
    summ a`i'
    replace a`i' = (a`i' - r(mean))/r(sd)
    } //*normalized*//



    forvalues i = 0/10 {
    if `i' > 1 {
    local j = `i'+1
    gen b`i'=.
    replace b`i' = d.f`i'
    }
    summ b`i'
    replace b`i' = (b`i' - r(mean))/r(sd)
    } //*normalized and created the shocks on events*//


    Then I run a principal component analysis on them.

    pca a*, comp(2) blank(.3) //*2 comp*//
    predict m1 m2

    pca b*, comp(2) blank(.3) //2 comp///
    predict t1 t2


    The first one generates values only for days of the event (which makes sense). The second one should generate values for every day, but instead it generates them only for the days of the event. WHICH MAKES ABSOLUTELY NO SENSE.

    Does anyone know what could be the issue? Any help is highly appreciated!

    Best,

    Giulia


  • #2
    Your code is confusing and someone else may well suggest a better approach. But to begin with, it appears that in addition to a1-a10 and b1-b10 you also have variables a0 and b0, based on the range of your forvalues loops. We have no idea what a0 and b0 are, but they are being included in your pcas, and if b0 is missing when event==0 then those observations will be excluded from the second pca.
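
    One way to check this is sketched below. It assumes the variables are named as in the post and that no unrelated variable names start with b; nmiss_b is a hypothetical helper variable. As far as I know, pca drops any observation with a missing value on any listed variable, and predict leaves the scores missing for such observations, so a single b variable that is missing outside event days would be enough to produce the result described.

    ```stata
    * Count nonmissing observations for each b variable, including b0
    foreach v of varlist b* {
        quietly count if !missing(`v')
        display "`v': " r(N) " nonmissing observations"
    }

    * Count observations that are complete across all b* variables
    * (nmiss_b is a hypothetical helper variable, not part of the original code)
    egen nmiss_b = rowmiss(b*)
    count if nmiss_b == 0
    ```

    If the last count is close to 300 rather than 7000, the variable with the smallest nonmissing count in the loop output is the one to look at.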



    • #3
      Thank you for your comment. However, I think my code is very well done. If you can think of a better way, please be my guest!
      Regarding a0 and b0, they are part of the model; they are rates (think of them as an f11). I forgot to put them there because they are in the middle of 700 lines of code that I have tried to summarize above.

      a0 and b0 are not the issue. Thanks anyway.



      • #4
        I'm reluctant to try to improve your code, since I now understand it is a summary of 700 lines that we cannot see. I can make two comments to explain why I find the code you displayed confusing (as opposed to the 700 lines you wrote, which may indeed be very well done).

        1) It is not clear what function the following line plays in its two appearances, since the macro j does not appear elsewhere in the displayed code.
        Code:
        local j = `i'+1
        2) Since your results for the a* variables are what you expect, the following isn't a problem, but this line of code is incomplete: when I test it, Stata tells me invalid syntax.
        Code:
        replace a`i' = d.f`i' if event!=0 &



        • #5
          I agree with William. However, if you delete the & in

          Code:
           
           replace a`i' = d.f`i' if event!=0 &
          you should be able to run the code. To your point, the second block of code should indeed generate values for 7000-1 = 6999 obs. When you tsset, do you have 7000 time units? I suspect that the problem may be that you have only 300 observations for f* and (b0/b1) prior to executing the code. So just before running the first block of code, check how many observations you have.

          Code:
          sum f*
          sum b*
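
          To go one step further, here is a minimal sketch, assuming the data are already tsset and the dummy is named event as in the post; f2 is used only as an example rate. Note that if the time variable has gaps, D. returns missing for any day whose previous period is absent, which would also reduce the number of usable observations.

          ```stata
          * Display the time variable and range currently declared by tsset
          tsset

          * Total number of observations, and number of event-day observations
          count
          count if event != 0

          * Number of nonmissing first differences for one example rate
          count if !missing(D.f2)
          ```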

