
  • Calculating variance in Stata

    Hi,

    I have followed Clyde's code here to compute the variance, and eventually the standard deviation, in my data:

    Code:
    // CALCULATE SOME RUNNING TOTALS OF RET AND RET^2*

    by company_id (bcal_date), sort: gen sum_ret = sum(return)
    by company_id (bcal_date): gen sum_ret_sq = sum(return^2)

    // AND A RUNNING COUNT OF NON MISSING OBSERVATIONS*
    by company_id (bcal_date): gen int n_obs = sum(!missing(return))

    sort company_id bcal_date
    // NOW CALCULATE RUNNING STANDARD DEVIATIONS
    gen variance = (L1.sum_ret_sq - L3047.sum_ret_sq)/(L1.n_obs-L3047.n_obs)-((L1.sum_ret - L3047.sum_ret)/(L1.n_obs-L3047.n_obs))^2
    gen sd = sqrt(variance)


    However, I cannot seem to get the lag numbers (the L1 and L3047 operators) right. I am unsure which number to use here. I used the largest value of n_obs, which is 3047 for me, but it does not work. Does anyone have any advice? It would be much appreciated!

  • #2
    To get a panelwise SD, I would use egen with its sd() function. As you are really interested in the SD, there is no need to compute the variance first.

    To get running SD, I would tend to reach for rangestat from SSC or rolling.

    If 3047 is your panel length, the result is missing for L3047, because no observation has a value 3047 periods before it. Consider a panel of just 2 observations: in observation 2, L1 yields the value from observation 1, and L2 yields missing.
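
    As a minimal sketch of the two suggestions above: it assumes the variable names from the first post (company_id, bcal_date, return) and a hypothetical 20-day trailing window, since the thread does not state a window length. The names sd_panel and sd_roll are illustrative only.

    Code:
    // Whole-panel SD: one value per company, repeated on every row
    egen sd_panel = sd(return), by(company_id)

    // Running SD over the previous 20 values of bcal_date, excluding the current day
    // (rangestat is community-contributed: ssc install rangestat)
    rangestat (sd) sd_roll = return, interval(bcal_date -20 -1) by(company_id)

    Note that interval() works on values of bcal_date, not on observation counts. The window covers exactly 20 trading days only if consecutive trading days differ by 1 in bcal_date, as they would for a business-calendar date; that is an assumption about this dataset.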



    • #3
      Hi Nick,

      Thanks so much for your response.

      My plan after finding the standard deviation is to estimate a GARCH model. Would it be okay to just have the standard deviation to do this? I apologise if this comes across as a silly question; I am quite new to Stata.

      Also, would you have any suggestions for the code needed for the rolling or rangestat command?

      Thank you!



      • #4
        Hi Nick,

        I wanted to ask if you could explain further what you meant about 3047 being missing. Would the code need to be:

        Code:
        gen variance = (L1.sum_ret_sq-L2.sum_ret_sq)/(L1.n_obs-L2.n_obs)-((L1.sum_ret - L2.sum_ret)/(L1.n_obs-L2.n_obs))^2

        I tried the rangestat command for the SD, and it gave me slightly different numbers from the few that the above code produced, so is it potentially better to stick with this code?
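
        One likely source of the small differences, offered as a sketch rather than a diagnosis: the running-sum formula divides by n (a population variance), whereas sd in rangestat and egen divides by n - 1. Separately, the gap between the two lags sets the window length, so L1 and L2 leave a window of a single observation, whose variance is zero. The lines below assume a hypothetical 20-observation window, a numeric company_id, and the running totals from the first post. The names n_win, mean_w, var_pop and sd_sample are illustrative.

        Code:
        // Lag operators need the panel and time variables declared
        xtset company_id bcal_date

        // L1 minus L21 spans the 20 observations at t-20 through t-1
        gen n_win   = L1.n_obs - L21.n_obs
        gen mean_w  = (L1.sum_ret - L21.sum_ret) / n_win
        gen var_pop = (L1.sum_ret_sq - L21.sum_ret_sq) / n_win - mean_w^2

        // Rescale from the n denominator to n - 1 to match sd()
        gen sd_sample = sqrt(var_pop * n_win / (n_win - 1)) if n_win > 1

        With this approach the first 21 observations of each panel come out missing, because L21 has nothing to refer to there. rangestat instead uses whatever observations fall inside the interval, which is another place the two sets of results can differ.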


