Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Calculate rolling mean and SD at calendar day level - Panel data

    Hi All,
    We have data with repeated measures for the same individual. Each row includes an Id (static), admission date (static), glucose value (dynamic), BG date and time (dynamic). We would like to summarize these glucose data at the calendar day level by creating new variables for rolling BG mean and rolling SD. The data is rolling data. Each row considers the index for the next row (rolling summary data [mean and SD] up to and including the index value). Here is an example of the new variable that I am trying to crate.
    id Admission date Hospital day Rolling BG mean Rolling BG SD
    1 14dec2015 1
    1 14dec2015 2
    1 14dec2015 3
    1 14dec2015 4

    I would appreciate if you can suggest a code to achieve this. Collapse command could not generate rolling mean and SD

    STATA 18


    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input long id float(admt_d bg_value bg_result_dt)
    1 20436  85 1.7656978e+12
    1 20436 150 1.7657088e+12
    1 20436  90  1.765776e+12
    1 20436  93 1.7657785e+12
    1 20436  79  1.765817e+12
    1 20436  66  1.765836e+12
    1 20436 167  1.765839e+12
    1 20436  94  1.765867e+12
    1 20436 103 1.7658876e+12
    1 20436  91 1.7659047e+12
    1 20436 103  1.765921e+12
    1 20436 102 1.7659353e+12
    1 20436  99  1.766008e+12
    1 20436  89 1.7660422e+12
    1 20436  86 1.7660476e+12
    1 20436  73  1.766077e+12
    1 20436  77  1.766095e+12
    1 20436  70  1.766134e+12
    1 20436  70 1.7661367e+12
    1 20436  74 1.7661508e+12
    2 20439  76 1.7659607e+12
    2 20439 102 1.7659817e+12
    2 20439 109 1.7660048e+12
    2 20439  84 1.7660304e+12
    2 20439  87 1.7660356e+12
    2 20439 101 1.7660365e+12
    2 20439  88  1.766059e+12
    2 20439  90  1.766069e+12
    2 20439  80  1.766082e+12
    2 20439  92 1.7661053e+12
    2 20439  90 1.7661316e+12
    2 20439  97 1.7661346e+12
    2 20439  94 1.7661508e+12
    2 20439 121 1.7661648e+12
    2 20439  96  1.766179e+12
    2 20439  98   1.76622e+12
    2 20439  94 1.7662204e+12
    2 20439  96 1.7662326e+12
    2 20439  89 1.7662508e+12
    2 20439  98 1.7662673e+12
    2 20439  98  1.766295e+12
    2 20439  91  1.766301e+12
    2 20439  96   1.76632e+12
    3 20592 117  1.779192e+12
    3 20592  98  1.779254e+12
    3 20592  84 1.7793442e+12
    3 20592  95 1.7794323e+12
    3 20592  86 1.7795222e+12
    4 20425 158  1.764788e+12
    4 20425 113 1.7648286e+12
    4 20425 110 1.7649252e+12
    4 20425 141 1.7650086e+12
    4 20425 105 1.7650862e+12
    4 20425 100  1.765173e+12
    5 20450 201 1.7669354e+12
    5 20450 214 1.7669356e+12
    5 20450 196 1.7669595e+12
    5 20450 212 1.7669867e+12
    5 20450 183 1.7669936e+12
    5 20450 185 1.7670084e+12
    5 20450 281 1.7670293e+12
    5 20450 166  1.767044e+12
    5 20450 229  1.767081e+12
    5 20450 219 1.7670827e+12
    5 20450 177 1.7670928e+12
    end
    format %td admt_d
    format %tc bg_result_dt

  • #2
    Sorry, but I don't understand what kind of rolling mean you want to calculate. To specify a rolling statistic you have to state what the window is, that is, how do you identify the first and last observations to be included in the calculation. For example you might have a window of -3 to +2, meaning that for each observation you calculate the mean value for that observation as well as the three immediately preceding and the two immediately following. Or a window of -4 to -1 would mean the mean of the four observations immediately preceding. I cannot discern from what you wrote what window you want for your rolling means. I imagine you were trying to describe that when you wrote
    Each row considers the index for the next row (rolling summary data [mean and SD] up to and including the index value).
    but I just found that sentence confusing.

    Comment


    • #3
      I apologize for any confusion. To clarify, the first observation refers to the first blood glucose (BG) value on the admission's day 1, while the last observation would be the last BG value on a specific day. For instance, considering the data provided in my previous post, the individual with ID 1 exhibits 2 BG values on day 1 (85 and 150). The average BG value (with standard deviation) on day 1 is 117.5 (45.9).

      On day 2, there are 5 BG values. When calculating the average and standard deviation for day 2, I intend to include all values from day one up to the last value of day 2.

      The objective is to transform the data from the BG levels, as demonstrated in Table 1, into a daily level, incorporating rolling averages and standard deviations, as exemplified in Table 2.

      Table 1.
      d admt_d bg_value bg_result_dt
      1 14-Dec-15 85 12/14/2015 7:36
      1 14-Dec-15 150 12/14/2015 10:39
      1 14-Dec-15 90 12/15/2015 5:22
      1 14-Dec-15 93 12/15/2015 6:02
      1 14-Dec-15 79 12/15/2015 16:46
      1 14-Dec-15 66 12/15/2015 22:01
      1 14-Dec-15 167 12/15/2015 22:46
      1 14-Dec-15 94 12/16/2015 6:38
      1 14-Dec-15 103 12/16/2015 12:19
      1 14-Dec-15 91 12/16/2015 17:05
      1 14-Dec-15 103 12/16/2015 21:34
      1 14-Dec-15 102 12/17/2015 1:34
      1 14-Dec-15 99 12/17/2015 21:47
      1 14-Dec-15 89 12/18/2015 7:17
      1 14-Dec-15 86 12/18/2015 8:46
      1 14-Dec-15 73 12/18/2015 16:56
      1 14-Dec-15 77 12/18/2015 21:55
      1 14-Dec-15 70 12/19/2015 8:48
      1 14-Dec-15 70 12/19/2015 9:32
      1 14-Dec-15 74 12/19/2015 13:26

      Table 2.
      id admt_d hospital_day mean sd
      1 14-Dec-15 1 117.5 45.9
      1 14-Dec-15 2 104.2 38.3
      1 14-Dec-15 3 101.9 30
      1 14-Dec-15 4 101.6 27.4
      1 14-Dec-15 5 96.8 25.6
      1 14-Dec-15 6 93 25.3

      Comment


      • #4
        OK,now I understand what you want. What you are describing is not a rolling mean. It is a running mean, or it is sometimes also called a recursive rolling mean.

        Code:
        rangestat (mean) running_bg_mean = bg_value ///
            (sd) running_bg_sd = bg_value, by(id) interval(bg_result_dt . 0)
        Now, although you don't quite say so, it also seems you are primarily interested in these values as of the last observation on each date. If you wish to eliminate the others, you can follow the above with:
        Code:
        by id (bg_result_dt), sort: replace running_bg_mean = . if _n < _N
        by id (bg_result_dt): replace running_bg_sd = . if _n < _N
        -rangestat- is written by Robert Picard, Nick Cox, and Roberto Ferrer. It is available from SSC.

        Comment

        Working...
        X