Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Replacing missing values by ID

    Hi,

    Is there any way to replace missing values in "response" variable with non missing values in the same variable by ID ?

    In the table below you can notice some of the values in the "response" variable are missing by ID. I need replace the missing values in the "response" variable with the rest of the non-missing values in same variable (please not that non missing values are same within "response" variable by ID). Is there any way to modify the "response" variable as "Desired_response" vairable?
    ID response Desired_response
    1 2 2
    1 2 2
    1 . 2
    1 . 2
    2 1 1
    2 . 1
    2 1 1

    Thanks.


  • #2

    Abbas:
    technically speaking you can do what you're after with the following code.
    Code:
    . set obs 3
    number of observations (_N) was 0, now 3
    
    . g id=1
    
    . g response=2 in 1/2
    (1 missing value generated)
    
    . expand 2
    (3 observations created)
    
    . replace id=2 in 4/6
    (3 real changes made)
    
    . bysort id: replace response=response[_n-1] if response==.
    (2 real changes made)
    However (and this is the substantive part of the matter) the code above allows the so called last observed carried forward (LOCF) approach for replacing missing values.
    As you can read in the literature or web sites concerning missing values management (e.g.: http://www.missingdata.org.uk/), this is not the way to go.
    Hence, unless you're 100% sure that the response variable should report the same values per id, I would strongly warn you about this method.
    Kind regards,
    Carlo
    (Stata 19.0)

    Comment


    • #3
      See also

      http://www.stata.com/support/faqs/da...issing-values/

      http://www.statalist.org/forums/foru...-interpolation

      Comment


      • #4
        In addition to Carlo's post :
        Code:
        bysort id: replace response=response[_n-1] if response==.
        Would not replace a missing value located at the first observation of id.
        This case is not shown in your example, but since we don't know how the data is structured, it could happen.
        To know whether you're facing this situation you could type

        Code:
        by id: count if response[1]==.
        If it returns a non null number, you should modify Carlo's code to :
        Code:
        bysort id (response): replace response=response[_n-1] if response==.
        Aside this detail, I share the Carlo's warning : run this code only you're sure that response should be invariant within id. Otherwise, do not make up missing data.

        Comment


        • #5
          Charlie;
          good catch! Thanks.
          Kind regards,
          Carlo
          (Stata 19.0)

          Comment


          • #6
            Hi,

            Thanks Charlie and Carlo for the help.


            Just a comment - Carlo's code is useful and very applicable if you sort your response variable by id.

            Comment


            • #7
              Carlo and Charlie give excellent advice. There is no negativity in underlining that if you just click on the links in #3 you will also see much longer discussions of all these points, including code solutions and pointers to the same.

              Comment


              • #8
                I have a somewhat similar problem

                Code:
                * Example generated by -dataex-. To install: ssc install dataex
                clear
                input int id_hh float time byte(b2_q3 b2_q4)
                101 2 2 9
                101 3 . .
                102 2 2 9
                102 3 . .
                104 3 . .
                104 2 1 9
                106 3 . .
                106 2 1 9
                108 2 2 3
                108 3 . .
                109 2 2 3
                109 3 . .
                110 3 . .
                110 2 2 3
                111 2 2 3
                111 3 . .
                112 2 2 3
                112 3 . .
                113 3 . .
                113 2 2 3
                114 3 . .
                114 2 1 9
                116 3 . .
                116 2 1 9
                117 3 . .
                117 2 1 9
                118 2 2 3
                118 3 . .
                119 3 . .
                119 2 2 9
                120 2 2 3
                120 3 . .
                121 3 . .
                121 2 2 3
                122 3 . .
                122 2 2 9
                124 3 . .
                124 2 2 9
                125 3 . .
                125 2 1 9
                127 3 . .
                127 2 2 3
                128 2 2 9
                128 3 . .
                129 2 2 9
                129 3 . .
                130 2 2 3
                130 3 . .
                131 2 2 3
                131 3 . .
                133 2 2 4
                133 3 . .
                134 2 2 3
                134 3 . .
                136 3 . .
                136 2 2 3
                137 2 2 9
                137 3 . .
                138 2 2 3
                138 3 . .
                139 3 . .
                139 2 2 3
                140 3 . .
                140 2 2 9
                141 3 . .
                141 2 2 9
                202 2 2 5
                202 3 . .
                203 3 . .
                203 2 2 9
                204 2 2 9
                204 3 . .
                205 2 2 9
                205 3 . .
                206 3 . .
                206 2 2 9
                208 3 . .
                208 2 2 3
                209 2 2 5
                209 3 . .
                210 3 . .
                210 2 2 9
                211 2 2 3
                211 3 . .
                212 3 . .
                212 2 1 3
                214 2 2 9
                214 3 . .
                216 2 2 9
                216 3 . .
                217 2 1 9
                217 3 . .
                218 3 . .
                218 2 1 9
                219 3 . .
                219 2 2 5
                220 3 . .
                220 2 2 9
                222 3 . .
                222 2 1 9
                end
                label values b2_q3 Religion
                label def Religion 1 "Hindu", modify
                label def Religion 2 "Muslim", modify
                label values b2_q4 Social_Group
                label def Social_Group 3 "OBCs", modify
                label def Social_Group 9 "Others", modify
                I used the following thread

                Code:
                gsort id_hh time
                quietly by id_hh : replace b2_q3 = b2_q3[_n-1] if b2_q3 >=.

                The above code works. I was wondering how can I loop this across several variables say b2_q3, b2_q4, ... etc. I used the following and it said variable b2_q3 already defined. What am I doing wrong?

                Code:
                foreach var of newlist b2_q3 b2_q4 b2_q5 {
                quietly by id_hh : replace `var'  = `var'[_n-1] if `var'>=.
                }

                Comment


                • #9
                  Whatever you want to replace must be an existing variable. So, foreach stops as soon as you claim that b2_q3 is a new variable name.


                  Code:
                    
                   foreach var in b2_q3 b2_q4 b2_q5 {


                  should be fine.

                  Comment


                  • #10
                    Thank you Nick for pointing it out. Thanks to you, the code works fine.

                    Comment

                    Working...
                    X