  • Compressing multiple responses of a group ID into a single row by another group ID

    Dear Stata Users,
    I am new to Stata, and my data look like the example below. hhid14 and pid14 are the household ID and person ID, respectively. The pk* variables are types of decisions made in the household, recorded from a multiple-choice survey question. A means the head of household makes the pk* decision, and B means the spouse makes it. X and Y mean the household does not need to make that particular pk decision.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str8 hhid14 double pid14 str7(pka1 pka2 pkb) str5 pkc str4 pkd str5 pke str6 pkf str9 pkg str7 pkh str4(pki pkj) str6(pkk pkl) str5(pkm pkn pko) str6 pkp str4 pkq
    "0010600" 1 ""   ""  ""  ""  ""  ""  ""  ""  ""  ""  ""   ""   ""  ""  ""  ""  ""  "Y"
    "0010600" 1 ""   ""  ""  ""  ""  ""  ""  ""  ""  ""  ""   ""   ""  ""  ""  ""  "A" "" 
    "0010600" 1 ""   ""  ""  ""  ""  ""  ""  ""  ""  ""  ""   ""   ""  ""  ""  "A" ""  "" 
    "0010600" 1 ""   ""  ""  ""  ""  ""  ""  ""  ""  ""  ""   ""   ""  ""  "A" ""  ""  "" 
    "0010600" 1 ""   ""  ""  ""  ""  ""  ""  ""  ""  ""  ""   ""   ""  "X" ""  ""  ""  "" 
    "0010600" 1 ""   ""  ""  ""  ""  ""  ""  ""  ""  ""  ""   ""   "X" ""  ""  ""  ""  "" 
    "0010600" 1 ""   ""  ""  ""  ""  ""  ""  ""  ""  ""  ""   "AB" ""  ""  ""  ""  ""  "" 
    "0010600" 1 ""   ""  ""  ""  ""  ""  ""  ""  ""  ""  "AB" ""   ""  ""  ""  ""  ""  "" 
    "0010600" 1 ""   ""  ""  ""  ""  ""  ""  ""  ""  "A" ""   ""   ""  ""  ""  ""  ""  "" 
    "0010600" 1 ""   ""  ""  ""  ""  ""  ""  ""  "B" ""  ""   ""   ""  ""  ""  ""  ""  "" 
    "0010600" 1 ""   ""  ""  ""  ""  ""  ""  "B" ""  ""  ""   ""   ""  ""  ""  ""  ""  "" 
    "0010600" 1 ""   ""  ""  ""  ""  ""  "B" ""  ""  ""  ""   ""   ""  ""  ""  ""  ""  "" 
    "0010600" 1 ""   ""  ""  ""  ""  "B" ""  ""  ""  ""  ""   ""   ""  ""  ""  ""  ""  "" 
    "0010600" 1 ""   ""  ""  ""  "B" ""  ""  ""  ""  ""  ""   ""   ""  ""  ""  ""  ""  "" 
    "0010600" 1 ""   ""  ""  "B" ""  ""  ""  ""  ""  ""  ""   ""   ""  ""  ""  ""  ""  "" 
    "0010600" 1 ""   ""  "B" ""  ""  ""  ""  ""  ""  ""  ""   ""   ""  ""  ""  ""  ""  "" 
    "0010600" 1 ""   "B" ""  ""  ""  ""  ""  ""  ""  ""  ""   ""   ""  ""  ""  ""  ""  "" 
    "0010600" 1 "AB" ""  ""  ""  ""  ""  ""  ""  ""  ""  ""   ""   ""  ""  ""  ""  ""  "" 
    end
    I want to compress the observations for each pid14 within each hhid14 into a single row, so that my data look like the example below. How should I approach this?
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str8 hhid14 double pid14 str7(pka1 pka2 pkb) str5 pkc str4 pkd str5 pke str6 pkf str9 pkg str7 pkh str4(pki pkj) str6(pkk pkl) str5(pkm pkn pko) str6 pkp str4 pkq
    "0010600" 1 "AB"   "B"  "B"  "B"  "B"  "B"  "B"  "B"  "B"  "A"  "AB"   "AB"   "X"  "X"  "A"  "A"  "A"  "Y"
    end
    Thank you

  • #2
    Welcome to Statalist.

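    The collapse command with the first or last non-missing value statistic (firstnm or lastnm) should work. Because each pk* variable has only one non-empty value per hhid14-pid14 pair in your example, either statistic gives the same row: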

    Code:
    collapse (firstnm) pk*, by(hhid14 pid14)
    Results:

    Code:
         +-------------------------------------------------------------------------------------------------------------------------------+
         |  hhid14   pid14   pka1   pka2   pkb   pkc   pkd   pke   pkf   pkg   pkh   pki   pkj   pkk   pkl   pkm   pkn   pko   pkp   pkq |
         |-------------------------------------------------------------------------------------------------------------------------------|
      1. | 0010600       1     AB      B     B     B     B     B     B     B     B     A    AB    AB     X     X     A     A     A     Y |
         +-------------------------------------------------------------------------------------------------------------------------------+
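Note that (firstnm) keeps only the first non-missing value in each group, so if a person ever had two different non-empty answers for the same pk* variable, the other one would be dropped without any warning. A minimal sketch of a pre-collapse check follows, assuming the only concern is more than one non-empty response per hhid14-pid14 pair. It runs on two rows copied from the question's dataex example (other pk* variables omitted for brevity), and the temporary variable name nresp is illustrative only.

```stata
* Two rows copied from the question's example (only pka1 and pkb kept)
clear
input str8 hhid14 double pid14 str2(pka1 pkb)
"0010600" 1 "AB" ""
"0010600" 1 ""   "B"
end

* Check: each pk* variable should have at most one non-empty response per
* hhid14-pid14 pair, because (firstnm) silently keeps only the first one
foreach v of varlist pk* {
    tempvar nresp
    egen `nresp' = total(!missing(`v')), by(hhid14 pid14)
    assert `nresp' <= 1
    drop `nresp'
}

* Collapse to one row per hhid14-pid14, keeping the non-missing response
collapse (firstnm) pk*, by(hhid14 pid14)
list
```

If the assert fails, inspect the offending variable within that group before choosing between firstnm and lastnm.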


    • #3
      Ken Chui
      Thank you for the answer; it works the way I want it to.
