  • Program/loop help for replacing values of ordinal variables

    Hello everyone,

    I am attempting to write a program/loop for a series of 4 ordinal variables.

    For each observation, there are 4 variables (call them medial_cortex (MC), lateral_cortex (LC), anterior_cortex (AC), posterior_cortex (PC)) which have a score of 1, 2, 3 or 4.

    However, there are observations in which one, two, or three of the score variables are missing data.

    E.g., medial_cortex = 2, lateral_cortex = 3, anterior_cortex = ., and posterior_cortex = 2

    For each observation in the data set, I would like to fill in the missing scores with the average of that observation's existing scores. (I would prefer to avoid a conversation about the validity of replacing missing data in this way; this approach has already been OK'd by my principal investigator.)

    The general formula would be: average = (sum of the values of existing data) / (number of variables with existing data)

    e.g. MC = 2, LC = ., AC = 4, PC = . ---> the loop/program would replace LC and PC with 3 ( (2+4)/2 = 3 ) --> Final set: MC = 2, LC = 3, AC = 4, PC = 3
    e.g. MC = 2, LC = 3, AC = 4, PC = . ---> the loop/program would replace PC with 3 ( (2+3+4)/3 = 3 ) --> Final set: MC = 2, LC = 3, AC = 4, PC = 3
    e.g. MC = ., LC = 3, AC = ., PC = . ---> the loop/program would replace MC, AC, and PC with 3 ( 3/1 = 3 ) --> Final set: MC = 3, LC = 3, AC = 3, PC = 3

    My code thus far is below (it is very simple, as I am pretty stuck on how to proceed). Please scroll to the bottom for example data.

    Thank you all in advance for your help

    Code:
    *******
    * medial_cortex lateral_cortex anterior_cortex posterior_cortex are ordinal variables
    * with values of 1, 2, 3, or 4
    
    
    local cortex_list medial_cortex lateral_cortex anterior_cortex posterior_cortex
    foreach n in `cortex_list' {
    replace `n' = mean `cortex_list' if medial_cortex == . | lateral_cortex == . | anterior_cortex == . | posterior_cortex == .
    }
    Alternatively, I tried writing out the replacements in long form, but I think some values are being changed/replaced incorrectly.

    Code:
    replace medial_cortex = (lateral_cortex + anterior_cortex + posterior_cortex)/3 if (medial_cortex == .) & (lateral_cortex != .) & (anterior_cortex !=.) & (posterior_cortex !=.)
    replace lateral_cortex = (medial_cortex + anterior_cortex + posterior_cortex)/3 if (lateral_cortex == .) & (medial_cortex != .) & (anterior_cortex !=.) & (posterior_cortex !=.)
    replace anterior_cortex = (lateral_cortex + medial_cortex + posterior_cortex)/3 if (anterior_cortex == .) & (lateral_cortex != .) & (medial_cortex !=.) & (posterior_cortex !=.)
    replace posterior_cortex = (lateral_cortex + anterior_cortex + medial_cortex)/3 if (posterior_cortex == .) & (lateral_cortex != .) & (anterior_cortex !=.) & (medial_cortex !=.)
    
    
    replace medial_cortex = (anterior_cortex + posterior_cortex)/2 if (medial_cortex == .) & (lateral_cortex == .) & (anterior_cortex !=.) & (posterior_cortex !=.)
    replace lateral_cortex = (anterior_cortex + posterior_cortex)/2 if (lateral_cortex == .) & (anterior_cortex !=.) & (posterior_cortex !=.)
    
    replace anterior_cortex = (medial_cortex + lateral_cortex)/2 if (anterior_cortex == .) & (posterior_cortex == .) & (medial_cortex !=.) & (lateral_cortex !=.)
    replace posterior_cortex = (medial_cortex + lateral_cortex)/2 if (posterior_cortex == .) & (medial_cortex !=.) & (lateral_cortex !=.)
    
    replace medial_cortex = posterior_cortex if medial_cortex == . & lateral_cortex == . & anterior_cortex == . & posterior_cortex !=.
    replace lateral_cortex = posterior_cortex if lateral_cortex == . & anterior_cortex == . & posterior_cortex !=.
    replace anterior_cortex = posterior_cortex if anterior_cortex == . & posterior_cortex !=.
    
    replace medial_cortex = anterior_cortex if medial_cortex == . & lateral_cortex == . & posterior_cortex == . & anterior_cortex !=.
    replace lateral_cortex = anterior_cortex if lateral_cortex == . & posterior_cortex == . & anterior_cortex !=.
    replace posterior_cortex = anterior_cortex if posterior_cortex == . & anterior_cortex !=.
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int participant_id byte(medial_cortex lateral_cortex anterior_cortex posterior_cortex) float time
    371 1 . 1 .  6
    217 2 2 2 2 12
    350 1 2 2 2  6
    217 2 3 2 3 26
    371 2 2 2 2 26
    371 1 1 2 1 12
    371 2 . 2 . 52
    407 . . . . 12
    407 3 . . 3 26
    217 . 3 . 3 52
    350 4 4 . . 52
    350 . . . . 26
    407 . . . 1  6
    350 2 3 . . 12
    217 . 1 . 2  6
    407 . . . 3 52
    end
    label values medial_cortex medial_cortex_
    label def medial_cortex_ 1 "Fracture line, no callus", modify
    label def medial_cortex_ 2 "Fracture line completely visible, callus present", modify
    label def medial_cortex_ 3 "Fracture line partially visible, callus present", modify
    label def medial_cortex_ 4 "No Fracture line, callus present", modify
    label values lateral_cortex lateral_cortex_
    label def lateral_cortex_ 1 "Fracture line, no callus", modify
    label def lateral_cortex_ 2 "Fracture line completely visible, callus present", modify
    label def lateral_cortex_ 3 "Fracture line partially visible, callus present", modify
    label def lateral_cortex_ 4 "No Fracture line, callus present", modify
    label values anterior_cortex anterior_cortex_
    label def anterior_cortex_ 1 "Fracture line, no callus", modify
    label def anterior_cortex_ 2 "Fracture line completely visible, callus present", modify
    label values posterior_cortex posterior_cortex_
    label def posterior_cortex_ 1 "Fracture line, no callus", modify
    label def posterior_cortex_ 2 "Fracture line completely visible, callus present", modify
    label def posterior_cortex_ 3 "Fracture line partially visible, callus present", modify

  • #2
    Hi Patrick, it looks like what you want is the rowmean() function of egen.
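
    To connect this to the formula in #1: rowmean() should give the same result as dividing the row total of the non-missing scores by the number of non-missing scores. A minimal sketch of that arithmetic, assuming the original variable names from #1 are still in place; the new variable names (total_score, n_scores, average) are made up for illustration:

    ```stata
    * Assumes the example data from #1 are in memory.
    local cortex medial_cortex lateral_cortex anterior_cortex posterior_cortex

    egen total_score = rowtotal(`cortex')      // sum of the scores, treating missing as 0
    egen n_scores    = rownonmiss(`cortex')    // how many of the four scores are non-missing
    gen  average     = total_score / n_scores  // 0/0 gives missing when all four are missing
    ```

    For MC = 2, LC = ., AC = 4, PC = . this gives total_score = 6 and n_scores = 2, so average = 3, matching the first example in #1.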

    Also, I found the data easier to work with without your value labels, so the scores below are displayed as numbers:
    Code:
    sort participant_id time
    
    * Shortening some variable names (to make them easier to list)
    rename medial_cortex mc
    rename lateral_cortex lc
    rename anterior_cortex ac
    rename posterior_cortex pc
    rename participant_id id
    order time, after(id)  // just places the time variable right after id
    
    egen avg_score = rowmean(mc lc ac pc)  // this is the rowmean you were looking for. It automatically ignores missing values.
    
    . list, sepby(id) noobs abbrev(12)
    
      +--------------------------------------------+
      |  id   time   mc   lc   ac   pc   avg_score |
      |--------------------------------------------|
      | 217      6    .    1    .    2         1.5 |
      | 217     12    2    2    2    2           2 |
      | 217     26    2    3    2    3         2.5 |
      | 217     52    .    3    .    3           3 |
      |--------------------------------------------|
      | 350      6    1    2    2    2        1.75 |
      | 350     12    2    3    .    .         2.5 |
      | 350     26    .    .    .    .           . |
      | 350     52    4    4    .    .           4 |
      |--------------------------------------------|
      | 371      6    1    .    1    .           1 |
      | 371     12    1    1    2    1        1.25 |
      | 371     26    2    2    2    2           2 |
      | 371     52    2    .    2    .           2 |
      |--------------------------------------------|
      | 407      6    .    .    .    1           1 |
      | 407     12    .    .    .    .           . |
      | 407     26    3    .    .    3           3 |
      | 407     52    .    .    .    3           3 |
      +--------------------------------------------+
    
    * So from here, you would want to replace mc & ac with 1.5 for that 1st observation
    * NOTE: you might want to create a variable that indicates which variables were originally missing (and hence will be replaced with the avg)
    foreach v of varlist mc lc ac pc {
        gen `v'_miss = (`v' ==.)
        replace `v' = avg_score if `v'==.
    }
    
    . list, sepby(id) noobs abbrev(12)
    
      +---------------------------------------------------------------------------------------+
      |  id   time    mc   lc    ac    pc   avg_score   mc_miss   lc_miss   ac_miss   pc_miss |
      |---------------------------------------------------------------------------------------|
      | 217      6   1.5    1   1.5     2         1.5         1         0         1         0 |
      | 217     12     2    2     2     2           2         0         0         0         0 |
      | 217     26     2    3     2     3         2.5         0         0         0         0 |
      | 217     52     3    3     3     3           3         1         0         1         0 |
      |---------------------------------------------------------------------------------------|
      | 350      6     1    2     2     2        1.75         0         0         0         0 |
      | 350     12     2    3   2.5   2.5         2.5         0         0         1         1 |
      | 350     26     .    .     .     .           .         1         1         1         1 |
      | 350     52     4    4     4     4           4         0         0         1         1 |
      |---------------------------------------------------------------------------------------|
      | 371      6     1    1     1     1           1         0         1         0         1 |
      | 371     12     1    1     2     1        1.25         0         0         0         0 |
      | 371     26     2    2     2     2           2         0         0         0         0 |
      | 371     52     2    2     2     2           2         0         1         0         1 |
      |---------------------------------------------------------------------------------------|
      | 407      6     1    1     1     1           1         1         1         1         0 |
      | 407     12     .    .     .     .           .         1         1         1         1 |
      | 407     26     3    3     3     3           3         0         1         1         0 |
      | 407     52     3    3     3     3           3         1         1         1         0 |
      +---------------------------------------------------------------------------------------+
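
    If you would rather keep the original variable names, the same idea can be written without the renames. This is only a sketch under that assumption, not a tested solution for your full data set; n_obs and n_after are hypothetical helper variables added so the result can be checked with assert:

    ```stata
    * Assumes the example data from #1 are in memory, with the original variable names.
    local cortex medial_cortex lateral_cortex anterior_cortex posterior_cortex

    egen double avg_score = rowmean(`cortex')   // missing only when all four scores are missing
    egen byte n_obs = rownonmiss(`cortex')      // number of observed scores before filling

    foreach v of local cortex {
        gen byte `v'_miss = missing(`v')        // 1 = score was originally missing
        replace `v' = avg_score if missing(`v') // replace promotes the byte variable if needed
    }

    * Checks: rows with no observed score stay empty; every other row is now complete.
    egen byte n_after = rownonmiss(`cortex')
    assert n_after == 0 if n_obs == 0
    assert n_after == 4 if n_obs > 0
    ```

    Note that filled-in values such as 1.5 or 2.5 have no value label, so they will display as plain numbers even if the labels are left attached.
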
    Last edited by David Benson; 20 Dec 2018, 16:12.



    • #3
      Hi David,

      This is excellent. That is a great suggestion on creating the variable to indicate which observations had missing data.

      I sincerely appreciate the help.

      Thank you.
