Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Replacing missing value with average of selected responses

    Hello, I am trying to replace a missing value with the average several other responses. I am running the code below which does not end up replacing the missing values in pt_bbq6_rc_mis for some reason. I realize this is a simple coding error somewhere but I would sure appreciate some assistance. Thanks!

    Jake

    Code:
    *
    capture drop pt_bbq6_rc_mis // replacing missing value with average or recoded response
    gen pt_bbq6_rc_mis = pt_bbq6_rc
    replace pt_bbq6_rc_mis = (pt_bbq1_rc + pt_bbq2_rc + pt_bbq3_rc + pt_bbq6_rc + ///
    pt_bbq8_rc + pt_bbq10_rc + pt_bbq12_rc + pt_bbq12_rc + pt_bbq14_rc ) / 8 if pt_bbq6_rc_mis == .
    order pt_bbq6_rc_mis, before(pt_bbq6_rc)

    the missing values on pt_bbq6_rc_mis are not being replaced. see daetex below for record_id 101 as an example.


    clear
    input int record_id float(pt_bbq1_rc pt_bbq2_rc pt_bbq3_rc pt_bbq4_rc pt_bbq5_rc pt_bbq6_rc_mis pt_bbq6_rc pt_bbq7_rc pt_bbq8_rc pt_bbq9_rc pt_bbq10_rc pt_bbq11_rc pt_bbq12_rc pt_bbq13_rc_mis pt_bbq13_rc pt_bbq14_rc)
    51 4 5 1 5 1 3 3 5 5 3 3 5 1 1 1 1
    52 5 5 5 5 2 3 3 5 5 2 5 3 5 2 2 3
    54 4 5 3 4 1 5 5 5 5 2 5 5 3 4 4 2
    55 5 3 2 5 2 2 2 5 5 2 4 5 1 2 2 2
    56 3 4 2 4 1 5 5 4 4 4 4 3 2 2 2 2
    58 5 4 3 4 3 2 2 3 3 3 2 3 3 3 3 2
    59 1 2 1 3 1 1 1 5 5 1 3 3 3 2 2 3
    61 4 5 1 4 2 2 2 5 3 2 5 5 4 3 3 3
    63 5 5 5 5 1 3 3 5 5 3 3 4 3 2 2 3
    67 5 3 5 5 3 2 2 5 5 3 3 5 3 3 3 3
    69 4 4 4 4 2 3 3 4 5 3 3 4 3 4 4 3
    71 4 5 4 4 2 2 2 3 5 3 4 4 3 3 3 4
    72 3 3 2 2 3 2 2 5 5 2 2 5 2 3 3 2
    73 3 1 1 3 1 1 1 2 1 4 1 2 1 2 2 1
    74 2 4 2 3 1 2 2 5 4 2 4 5 2 5 5 2
    75 4 4 3 4 2 2 2 5 4 3 4 4 4 4 4 3
    76 2 5 2 4 1 4 4 5 5 2 3 5 4 3 3 4
    77 4 5 3 4 1 2 2 4 4 3 4 3 4 3 3 3
    79 3 2 2 5 1 2 2 5 3 3 3 5 2 3 3 1
    80 1 5 1 3 2 3 3 5 3 2 5 5 1 4 4 1
    82 5 3 1 5 3 3 3 1 1 3 1 1 3 3 3 3
    84 5 5 5 5 2 3 3 5 5 3 5 5 5 2 2 5
    85 5 5 3 4 1 4 4 5 5 3 5 5 3 5 5 4
    86 3 4 3 3 2 2 2 5 5 3 4 4 1 2 2 3
    87 5 5 2 3 1 5 5 5 5 2 5 5 2 2 2 2
    88 4 4 3 5 3 2 2 4 3 5 4 4 4 3 3 2
    89 3 4 4 4 2 2 2 5 4 4 4 2 1 3 3 1
    90 4 3 2 4 2 3 3 5 5 3 4 5 3 2 2 2
    91 5 5 3 5 1 3 3 3 5 3 5 5 4 5 5 3
    92 5 2 3 4 2 2 2 3 3 2 3 5 3 4 4 2
    94 5 5 5 5 1 3 3 5 5 3 5 5 5 3 3 3
    95 4 3 2 4 1 3 3 4 5 2 4 4 4 5 5 3
    96 5 5 5 5 1 4 4 5 5 3 5 5 5 5 5 5
    97 5 4 4 4 1 4 4 5 4 2 4 4 4 3 3 4
    98 3 4 1 4 2 1 1 4 5 1 3 5 2 2 2 1
    100 4 4 2 4 2 2 2 4 3 3 4 4 2 2 2 2
    101 5 5 5 5 1 . . 5 5 2 5 5 5 3 3 4
    102 4 3 2 4 2 4 4 5 3 2 5 3 1 3 3 2
    103 5 4 2 5 3 1 1 2 4 2 2 5 4 2 2 4
    104 5 5 5 5 1 3 3 5 4 4 4 5 4 5 5 4
    105 5 4 3 5 2 4 4 3 3 3 2 5 2 3.5 . 3
    107 5 3 4 4 2 2 2 3 3 3 4 2 3 2 2 2
    108 4 3 2 5 2 3 3 4 2 3 3 2 1 3 3 1
    109 4 5 2 5 1 2 2 5 5 1 5 5 2 2 2 2
    112 4 4 4 4 2 2 2 3 2 3 3 4 4 2 2 3
    113 4 1 5 5 2 1 1 4 2 4 4 5 4 4 4 3
    114 2 2 2 2 4 3 3 5 5 3 4 2 2 4 4 2
    116 4 3 4 4 2 4 4 3 4 3 3 4 3 2 2 2
    end
    [/CODE]

  • #2
    There are a few things about your goals that I don't understand, but that said, I'll answer your question as you asked it. Your replace command is being foiled by the missing values in row 101. Using the "varname1 + varname2 + varnamen" syntax, Stata will refuse to provide a summation across the range of varnames for any row for which any of the varnames is missing. You can use the egen...rowmean syntax to avoid this problem. Consult help egen for a primer.

    Comment


    • #3
      Actually, it is being replaced. By another missing value!

      This happens because your sum in the -replace- command includes mention of the variable pt_bbq6_rc, and in the observation you highlighted, that variable has a missing value. So the entire sum evaluates to missing value. Missing value + anything = missing value in Stata.

      The solution to this problem is to use -egen- with the -rowmean()- function. See -help egen-. This will also avoid another possible error in your code: I think you miscounted the number of terms to divide by and would get the wrong average anyway.

      And there is yet another error in that command, I think. You add pt_bbq12_rc twice. If that isn't a mistake and you really want to count it twice, it would be simpler to just use 2*pt_bbq12_rc as a single term.

      Comment


      • #4
        Thank you very much for the egen suggestion. That is what I needed.

        Comment

        Working...
        X