
  • How to impute missing values by using the value recorded during the previous visit?




    Dear all,

    I am trying to impute missing values by using the value recorded during the previous visit. Editing by hand would take a lot of time because the dataset is large. Is there any way to do this easily and quickly?

    The data look like this:


    Code:
    clear
    input byte id str2 visit double glucose int(chol sbp) double Age
    1 "12" 163   . 125 60.8
    1 "28"   .   . 117 60.8
    1 "36" 215 171 115 60.8
    1 "40"   .   . 109 60.8
    1 "16"   .   . 123 60.8
    1 "48"  99 151 114 60.8
    1 "24"  95 168 118 60.8
    1 "72"  88 174 110 60.8
    1 "78"   .   . 119 60.8
    3 "12" 108 239 125 57.3
    3 "24" 123 126 107 57.3
    3 "16"   .   . 152 57.3
    4 "24"   .   .   . 72.5
    4 "12" 146 153 112 72.5
    4 "16"   .   . 125 72.5
    end 

    Thank you in advance.
    Best wishes,
    Oyun

  • #2
    You can do this:

    Code:
    clear
    input byte id str2 visit double glucose int(chol sbp) double Age
    1 "12" 163   . 125 60.8
    1 "28"   .   . 117 60.8
    1 "36" 215 171 115 60.8
    1 "40"   .   . 109 60.8
    1 "16"   .   . 123 60.8
    1 "48"  99 151 114 60.8
    1 "24"  95 168 118 60.8
    1 "72"  88 174 110 60.8
    1 "78"   .   . 119 60.8
    3 "12" 108 239 125 57.3
    3 "24" 123 126 107 57.3
    3 "16"   .   . 152 57.3
    4 "24"   .   .   . 72.5
    4 "12" 146 153 112 72.5
    4 "16"   .   . 125 72.5
    end 
    
    destring visit, replace
    isid id visit, sort
    
    foreach v of varlist glucose chol sbp {
        replace `v' = `v'[_n-1] if missing(`v')
    }
    This code assumes, and verifies, that id and visit number uniquely identify observations.

    Whether this is a reasonable way to impute missing values really depends on the circumstances, and I won't pass judgment on it here. But do consider whether the way these variables work in real life makes this last-value-carried-forward approach sensible.

    Minor point: why is visit a string variable? It really should be numeric so that sorting on visit number places the observations in correct chronological order.
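To illustrate the sorting point, here is a minimal sketch. The visit values below are hypothetical (a one-digit visit "6" does not occur in the posted data) and are chosen only to show how string order and numeric order can differ.

```stata
clear
input str2 visit
"6"
"12"
"24"
end

* As a string, visit sorts character by character: "12", "24", "6"
sort visit
list

* After conversion to numeric, the order is chronological: 6, 12, 24
destring visit, replace
sort visit
list
```

With the two-digit visit values in the posted data the two orders happen to agree, but that would stop being true as soon as a visit number with a different number of digits appeared.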

    Note also that this approach to imputation does not provide any imputed value to any of a series of missing observations at the start of an individual patient's records.
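A small sketch of that limitation, using a made-up two-patient dataset (the values are illustrative, not the poster's data). It applies the carry-forward within each id and then counts what is still missing.

```stata
clear
input byte id byte visit double glucose
1 12   .
1 16   .
1 24  95
1 28   .
2 12 108
2 16   .
end

* Verify that id and visit identify observations, and sort by them
isid id visit, sort

* Carry the previous visit's value forward within each id
by id: replace glucose = glucose[_n-1] if missing(glucose)

* The first two visits of id 1 have no earlier value to borrow,
* so they are still missing after the replacement
list, sepby(id)
count if missing(glucose)
```

Counting the remaining missing values after the replacement is a quick way to see how many observations the method could not fill.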

    Do bear in mind that if you are ultimately planning to do regression analyses with this data, this type of single imputation is deprecated as it is known to produce biased regression coefficients. If the data are missing at random, a multiple imputation approach is better. If the data are not missing at random, then things get even more complicated, with few satisfactory approaches.



    • #3
      Dear Prof. Schechter,

      Thank you for the clear answer and suggestions.
      Data are missing at random, so I think this approach will work for my dataset.



      • #4
        Clyde's general advice is excellent as always, but his code is problematic. I doubt that he intended information to be carried over from one identifier to the next, which his code does not rule out. So, at a minimum, you should insist on

        Code:
        foreach v of varlist glucose chol sbp {
            by id: replace `v' = `v'[_n-1] if missing(`v')
        }
        For more discussion, see https://www.stata.com/support/faqs/d...issing-values/ and https://www.statalist.org/forums/for...-interpolation
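As a rough illustration of the difference, the sketch below runs the replacement with and without the by id: prefix on a tiny made-up dataset. The variable names glucose_all and glucose_byid are hypothetical and exist only to keep the two results side by side.

```stata
clear
input byte id byte visit double glucose
1 12 163
1 16   .
3 12   .
3 16 108
end

isid id visit, sort

* Without by id:, the first visit of id 3 inherits 163 from id 1
generate double glucose_all = glucose
replace glucose_all = glucose_all[_n-1] if missing(glucose_all)

* With by id:, _n-1 is evaluated within each id, so nothing crosses identifiers
generate double glucose_byid = glucose
by id: replace glucose_byid = glucose_byid[_n-1] if missing(glucose_byid)

list, sepby(id)
```

Copying into a new variable first, rather than replacing in place, also keeps the original values available for checking which observations were filled in.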



        • #5
          Nick, thanks. Yes, I intended to write what you show in #4 but somehow omitted the crucial -by id:- prefix.



          • #6
            Thank you, Mr. Cox.
