Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Fill in the missing value

    Hello,
    I want to ask about how to fill in the missing values in my variables without having to do it one by one. My data should look like this.
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input double pid07 str2 cd01type double cd01 str8 hhid07 str10 pidlink
    1 "A" 3 "0012200" "001220001"
    1 "B" 3 "0012200" "001220001"
    1 "C" 3 "0012200" "001220001"
    1 "D" 3 "0012200" "001220001"
    1 "E" 3 "0012200" "001220001"
    1 "F" 3 "0012200" "001220001"
    1 "G" 3 "0012200" "001220001"
    1 "H" 3 "0012200" "001220001"
    1 "I" 3 "0012200" "001220001"
    2 "A" 3 "0012200" "001220002"
    2 "B" 3 "0012200" "001220002"
    2 "C" 3 "0012200" "001220002"
    2 "D" 3 "0012200" "001220002"
    2 "E" 3 "0012200" "001220002"
    2 "F" 3 "0012200" "001220002"
    2 "G" 3 "0012200" "001220002"
    2 "H" 3 "0012200" "001220002"
    2 "I" 3 "0012200" "001220002"
    6 "A" 3 "0012200" "001220006"
    6 "B" 3 "0012200" "001220006"
    6 "C" 3 "0012200" "001220006"
    6 "D" 3 "0012200" "001220006"
    6 "E" 3 "0012200" "001220006"
    6 "F" 3 "0012200" "001220006"
    6 "G" 3 "0012200" "001220006"
    6 "H" 3 "0012200" "001220006"
    6 "I" 3 "0012200" "001220006"
    7 "A" 3 "0012200" "001220007"
    7 "B" 3 "0012200" "001220007"
    7 "C" 3 "0012200" "001220007"
    7 "D" 3 "0012200" "001220007"
    7 "E" 3 "0012200" "001220007"
    7 "F" 3 "0012200" "001220007"
    7 "G" 3 "0012200" "001220007"
    7 "H" 3 "0012200" "001220007"
    7 "I" 3 "0012200" "001220007"
    9 "A" 3 "0012200" "001220009"
    9 "B" 3 "0012200" "001220009"
    9 "C" 3 "0012200" "001220009"
    9 "D" 3 "0012200" "001220009"
    9 "E" 3 "0012200" "001220009"
    9 "F" 3 "0012200" "001220009"
    9 "G" 3 "0012200" "001220009"
    9 "H" 3 "0012200" "001220009"
    9 "I" 3 "0012200" "001220009"
    1 "A" 3 "0012241" "001220003"
    1 "B" 3 "0012241" "001220003"
    1 "C" 3 "0012241" "001220003"
    1 "D" 3 "0012241" "001220003"
    1 "E" 3 "0012241" "001220003"
    1 "F" 3 "0012241" "001220003"
    1 "G" 3 "0012241" "001220003"
    1 "H" 3 "0012241" "001220003"
    1 "I" 3 "0012241" "001220003"
    1 "A" 3 "0012242" "001220008"
    1 "B" 3 "0012242" "001220008"
    1 "C" 3 "0012242" "001220008"
    1 "D" 3 "0012242" "001220008"
    1 "E" 3 "0012242" "001220008"
    1 "F" 3 "0012242" "001220008"
    1 "G" 3 "0012242" "001220008"
    1 "H" 3 "0012242" "001220008"
    1 "I" 3 "0012242" "001220008"
    3 "A" 3 "0012500" "001250003"
    3 "B" 3 "0012500" "001250003"
    3 "C" 1 "0012500" "001250003"
    3 "D" 1 "0012500" "001250003"
    3 "E" 3 "0012500" "001250003"
    3 "F" 3 "0012500" "001250003"
    3 "G" 3 "0012500" "001250003"
    3 "H" 3 "0012500" "001250003"
    3 "I" 3 "0012500" "001250003"
    8 "A" 3 "0012500" "001250008"
    8 "B" 3 "0012500" "001250008"
    8 "C" 3 "0012500" "001250008"
    8 "D" 3 "0012500" "001250008"
    8 "E" 3 "0012500" "001250008"
    8 "F" 3 "0012500" "001250008"
    8 "G" 3 "0012500" "001250008"
    8 "H" 3 "0012500" "001250008"
    8 "I" 3 "0012500" "001250008"
    1 "A" 3 "0012900" "001290001"
    1 "B" 3 "0012900" "001290001"
    1 "C" 3 "0012900" "001290001"
    1 "D" 3 "0012900" "001290001"
    1 "E" 3 "0012900" "001290001"
    1 "F" 3 "0012900" "001290001"
    1 "G" 3 "0012900" "001290001"
    1 "H" 3 "0012900" "001290001"
    1 "I" 3 "0012900" "001290001"
    2 "A" 3 "0012900" "001290002"
    2 "B" 3 "0012900" "001290002"
    2 "C" 3 "0012900" "001290002"
    2 "D" 3 "0012900" "001290002"
    2 "E" 3 "0012900" "001290002"
    2 "F" 3 "0012900" "001290002"
    2 "G" 3 "0012900" "001290002"
    2 "H" 3 "0012900" "001290002"
    2 "I" 3 "0012900" "001290002"
    1 "A" 3 "0020100" "002010001"
    end
    But what I get is like this
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input double pid07 str10 pidlink str2 cd01type double cd01 str8 hhid07
    1 "001220001" "A" 3 "0012200"
    1 "001220001" ""  3 "0012200"
    1 "001220001" ""  3 "0012200"
    1 "001220001" ""  3 "0012200"
    1 "001220001" ""  3 "0012200"
    1 "001220001" ""  3 "0012200"
    1 "001220001" ""  3 "0012200"
    1 "001220001" ""  3 "0012200"
    1 "001220001" ""  3 "0012200"
    2 "001220002" "A" 3 "0012200"
    2 "001220002" ""  3 "0012200"
    2 "001220002" ""  3 "0012200"
    2 "001220002" ""  3 "0012200"
    2 "001220002" ""  3 "0012200"
    2 "001220002" ""  3 "0012200"
    2 "001220002" ""  3 "0012200"
    2 "001220002" ""  3 "0012200"
    2 "001220002" ""  3 "0012200"
    6 "001220006" "A" 3 "0012200"
    6 "001220006" ""  3 "0012200"
    6 "001220006" ""  3 "0012200"
    6 "001220006" ""  3 "0012200"
    6 "001220006" ""  3 "0012200"
    6 "001220006" ""  3 "0012200"
    6 "001220006" ""  3 "0012200"
    6 "001220006" ""  3 "0012200"
    6 "001220006" ""  3 "0012200"
    7 "001220007" "A" 3 "0012200"
    7 "001220007" ""  3 "0012200"
    7 "001220007" ""  3 "0012200"
    7 "001220007" ""  3 "0012200"
    7 "001220007" ""  3 "0012200"
    7 "001220007" ""  3 "0012200"
    7 "001220007" ""  3 "0012200"
    7 "001220007" ""  3 "0012200"
    7 "001220007" ""  3 "0012200"
    9 "001220009" "A" 3 "0012200"
    9 "001220009" ""  3 "0012200"
    9 "001220009" ""  3 "0012200"
    9 "001220009" ""  3 "0012200"
    9 "001220009" ""  3 "0012200"
    9 "001220009" ""  3 "0012200"
    9 "001220009" ""  3 "0012200"
    9 "001220009" ""  3 "0012200"
    9 "001220009" ""  3 "0012200"
    1 "001220003" "A" 3 "0012241"
    1 "001220003" ""  3 "0012241"
    1 "001220003" ""  3 "0012241"
    1 "001220003" ""  3 "0012241"
    1 "001220003" ""  3 "0012241"
    1 "001220003" ""  3 "0012241"
    1 "001220003" ""  3 "0012241"
    1 "001220003" ""  3 "0012241"
    1 "001220003" ""  3 "0012241"
    1 "001220008" "A" 3 "0012242"
    1 "001220008" ""  3 "0012242"
    1 "001220008" ""  3 "0012242"
    1 "001220008" ""  3 "0012242"
    1 "001220008" ""  3 "0012242"
    1 "001220008" ""  3 "0012242"
    1 "001220008" ""  3 "0012242"
    1 "001220008" ""  3 "0012242"
    1 "001220008" ""  3 "0012242"
    3 "001250003" "A" 3 "0012500"
    3 "001250003" ""  3 "0012500"
    3 "001250003" ""  1 "0012500"
    3 "001250003" "D" 1 "0012500"
    3 "001250003" "E" 3 "0012500"
    3 "001250003" ""  3 "0012500"
    3 "001250003" ""  3 "0012500"
    3 "001250003" ""  3 "0012500"
    3 "001250003" ""  3 "0012500"
    8 "001250008" "A" 3 "0012500"
    8 "001250008" ""  3 "0012500"
    8 "001250008" ""  3 "0012500"
    8 "001250008" ""  3 "0012500"
    8 "001250008" ""  3 "0012500"
    8 "001250008" ""  3 "0012500"
    8 "001250008" ""  3 "0012500"
    8 "001250008" ""  3 "0012500"
    8 "001250008" ""  3 "0012500"
    1 "001290001" "A" 3 "0012900"
    1 "001290001" ""  3 "0012900"
    1 "001290001" ""  3 "0012900"
    1 "001290001" ""  3 "0012900"
    1 "001290001" ""  3 "0012900"
    1 "001290001" ""  3 "0012900"
    1 "001290001" ""  3 "0012900"
    1 "001290001" ""  3 "0012900"
    1 "001290001" ""  3 "0012900"
    2 "001290002" "A" 3 "0012900"
    2 "001290002" ""  3 "0012900"
    2 "001290002" ""  3 "0012900"
    2 "001290002" ""  3 "0012900"
    2 "001290002" ""  3 "0012900"
    2 "001290002" ""  3 "0012900"
    2 "001290002" ""  3 "0012900"
    2 "001290002" ""  3 "0012900"
    2 "001290002" ""  3 "0012900"
    1 "002010001" "A" 3 "0020100"
    end
    What is the command to fill in cd01type which is the missing value? Thanks in advance!

  • #2
    Code:
    bys pidlink: replace cd01type = char(64+_n)

    Comment


    • #3
      Originally posted by Fei Wang View Post
      Code:
      bys pidlink: replace cd01type = char(64+_n)
      What is the meaning of 64+_n?
      Thank you!

      Comment


      • #4
        char(65), char(66), ..., represent strings "A", "B", ... "_n" is the line number. char(64+_n) means assigning char(65), char(66), ... to lines 1, 2, ...

        Comment


        • #5
          Let me add the following, not relevant to this example, but important for others attempting something similar who read this.

          The example data is unusual in that for each distinct value of pidlink, all the other variables except cd01type are identical in all the observations, and cd01type is to be replaced.

          This is important because the sorting done with
          Code:
          bys pidlink:
          will have ties broken randomly, as the output of help sort tells us. That doesn't make a difference with the data in post #1 and the code in post #2.

          In other cases, where the order is important - which observation becomes A and which becomes B and which becomes C, in this example,
          Code:
          sort pidlink, stable
          by pidlink: replace cd01type = char(64+_n)
          would ensure that within each distinct value of pidlink the observations remain in their original order.

          A new user of Stata might also wonder why sorting is required at all. The answer is that Stata won't take your word for it that the data is in a suitable order for by processing: either it has to be previously sorted, which Stata will remember, or it has to be sorted as part of the by command, as bysort does.

          Comment


          • #6
            Thank you William and Fei Wang!

            Comment

            Working...
            X