  • Replication of hh information

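    hh  q1_a  q1_b  q1_c  q1_d  q11ai  q11aii  q11bi  childn  child_age
    1   .     .     .     .     .      .       .      4       12
    1   4     5     2     2     1      6       0      1       38
    1   .     .     .     .     .      .       .      2       34
    1   .     .     .     .     .      .       .      3       28
    2   4     3     0     2     .      .       .      1       44
    2   .     .     .     .     .      .       .      2       27
    3   4     6     2     1     1      7       0      1       40
    3   .     .     .     .     .      .       .      2       23
    3   .     .     .     .     .      .       .      3       41
    4   2     3     1     1     .      .       .      1       56
    4   .     .     .     .     .      .       .      2       35
    4   .     .     .     .     .      .       .      3       39
    4   .     .     .     .     .      .       .      1       20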
    Dear all,

    I have a database containing both household (hh) and individual (child) information. The example above reproduces its structure. I would like to replicate the household information for each child belonging to that household. The problem is that the household information is not always entered in the first row (for instance, for hh 1 it is in the second row). Could you please give me some hints on how to proceed?

    Thanks

    Federica

  • #2
    Try this:

    Code:
    foreach v of varlist q* {
        by hh (`v'), sort: assert `v' == `v'[1] | missing(`v')
        by hh (`v'): replace `v' = `v'[1]
    }
    The assert statement verifies that there are no inconsistent values of any of the q variables within any household. If that test is passed, the nonmissing value (which sorts to the top) is then imputed to all the other observations of the household. Note that this only works if all the variables involved are numeric. If you have a string variable to deal with, it is different:

    Code:
    foreach v of varlist my_string_variables {
        by hh (`v'), sort: assert `v' == `v'[_N] | missing(`v')
        by hh (`v'): replace `v' = `v'[_N]
    }
    This code for strings just replaces 1 by _N throughout. That is because for string variables, missing sorts first, whereas for numerics, missing sorts last.
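To see both loops at work, here is a minimal sketch on toy data modeled on the first two households of the example in #1. It is an illustration under assumptions, not part of the original answer: the string variable `village` and its values are hypothetical, added only to exercise the string case, and only two of the q variables are included.

```stata
clear
input hh q1_a q1_b str8 village childn child_age
1 . . ""      4 12
1 4 5 "north" 1 38
1 . . ""      2 34
1 . . ""      3 28
2 4 3 "south" 1 44
2 . . ""      2 27
end

// numeric variables: missing sorts last, so the nonmissing value is in position 1
foreach v of varlist q* {
    by hh (`v'), sort: assert `v' == `v'[1] | missing(`v')
    by hh (`v'): replace `v' = `v'[1]
}

// string variables: the empty string sorts first, so the nonmissing value is in position _N
foreach v of varlist village {
    by hh (`v'), sort: assert `v' == `v'[_N] | missing(`v')
    by hh (`v'): replace `v' = `v'[_N]
}

sort hh childn
list, sepby(hh)
```

After the loops, every child row should carry its household's values, while childn and child_age are left untouched because they are not in either varlist.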

    Do read the sections in the manuals about -by- and the behavior of 1, _n, and _N when the -by- prefix is used. This is bread-and-butter Stata that you will need to use all the time. Your effort learning it will be amply repaid.
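As a quick illustration of what the manuals describe, the sketch below shows how `_n`, `_N`, and the subscripts `[1]` and `[_N]` behave under `by`. The data are the child rows of households 1 and 2 from #1; the new variable names (`age_rank`, `hh_size`, `youngest_age`, `oldest_age`) are made up for this example.

```stata
clear
input hh childn child_age
1 4 12
1 1 38
1 2 34
1 3 28
2 1 44
2 2 27
end

// under -by-, _n restarts at 1 in each group and _N is the group size
by hh (child_age), sort: generate age_rank = _n
by hh (child_age): generate hh_size = _N

// subscripts are also interpreted within the group
by hh (child_age): generate youngest_age = child_age[1]
by hh (child_age): generate oldest_age = child_age[_N]

list, sepby(hh)
```

Because the data are sorted by child_age within hh, position 1 holds the youngest child and position _N the oldest in each household.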



    • #3
      Clyde gives excellent advice as always. In addition, here's a one-stop shop tutorial:

      Cox, N. J. 2002. Speaking Stata: How to move step by: step. Stata Journal 2(1): 86-102 (SJ-2-1, pr0004; no commands).
      Explains the use of the by varlist: construct to tackle a variety of problems with group structure, ranging from simple calculations for each of several groups to more advanced manipulations that use the built-in _n and _N.
      .pdf freely available at http://www.stata-journal.com/sjpdf.h...iclenum=pr0004



      • #4
        Thank you both!!! I will try it and let you know.

        Best

        Federica



        • #5
          Dear Clyde,
          When I run your command I get the following message:
          Code:
          . foreach v of varlist q1 q1a-q1d q2 {
            2.     by q02 (`v'), sort: assert `v' == `v'[1] | missing(`v')
            3.     by q02 (`v'): replace `v' = `v'[1]
            4. }
          486 contradictions in 855 observations
          assertion is false
          I checked the data, and for each hh with more than one child I only have information in one row, as in my example. Do you have any suggestions for solving this problem?

          Thanks
          Federica



          • #6
            I have never known Stata to be wrong about these things. My advice is to check your data again. This code will give you a listing of all the problem cases:

            Code:
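            foreach v of varlist q1 q1a-q1d q2 {
                // flag nonmissing values that differ from the first value in the household
                // (done with generate, since explicit subscripts are discouraged inside egen)
                by q02 (`v'), sort: generate byte bad_`v' = `v' != `v'[1] & !missing(`v')
                // spread the flag to every observation of an affected household
                by q02: egen byte problem_`v' = max(bad_`v')
                list q02 `v' if problem_`v', sepby(q02) abbrev(16)
            }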
            By the way, I suggest you rename your variables to something that has mnemonic value. If you have to go back to this data a year from now, will you remember what q1a is? For that matter, even now, if you mistakenly did something with q1d that should have been done with q1c instead, would you be able to spot that error in your code?
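A minimal sketch of what such renaming could look like follows. The new names and labels are purely hypothetical, since the thread never says what q1_a or q1_b actually measure; substitute the real meanings from the questionnaire.

```stata
clear
input hh q1_a q1_b
1 4 5
2 4 3
end

// hypothetical meanings, assumed only for illustration
rename q1_a hh_members
rename q1_b hh_rooms
label variable hh_members "Household members (assumed meaning of q1_a)"
label variable hh_rooms "Rooms in dwelling (assumed meaning of q1_b)"

describe
```

Keeping the original question code in the variable label preserves the link back to the questionnaire while the name itself says what the variable holds.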





            • #7
              Dear Clyde, sorry I forgot to reply. Thanks for your help, it was very useful! Federica
