Compressing multiple responses of a group id into a single row by another group id

Guest

20 Jun 2022, 05:53

Dear Stata Users,
I am new to Stata, and my data look like the example below. hhid14 and pid14 are the household id and person id, respectively. The pk* variables record who makes each type of decision in the household; they come from a multiple-choice survey question. A means the head of household decides pk*, and B means the spouse decides pk*. X and Y are treated as meaning that the household does not need that particular pk decision.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str8 hhid14 double pid14 str7(pka1 pka2 pkb) str5 pkc str4 pkd str5 pke str6 pkf str9 pkg str7 pkh str4(pki pkj) str6(pkk pkl) str5(pkm pkn pko) str6 pkp str4 pkq
"0010600" 1 ""   ""  ""  ""  ""  ""  ""  ""  ""  ""  ""   ""   ""  ""  ""  ""  ""  "Y"
"0010600" 1 ""   ""  ""  ""  ""  ""  ""  ""  ""  ""  ""   ""   ""  ""  ""  ""  "A" "" 
"0010600" 1 ""   ""  ""  ""  ""  ""  ""  ""  ""  ""  ""   ""   ""  ""  ""  "A" ""  "" 
"0010600" 1 ""   ""  ""  ""  ""  ""  ""  ""  ""  ""  ""   ""   ""  ""  "A" ""  ""  "" 
"0010600" 1 ""   ""  ""  ""  ""  ""  ""  ""  ""  ""  ""   ""   ""  "X" ""  ""  ""  "" 
"0010600" 1 ""   ""  ""  ""  ""  ""  ""  ""  ""  ""  ""   ""   "X" ""  ""  ""  ""  "" 
"0010600" 1 ""   ""  ""  ""  ""  ""  ""  ""  ""  ""  ""   "AB" ""  ""  ""  ""  ""  "" 
"0010600" 1 ""   ""  ""  ""  ""  ""  ""  ""  ""  ""  "AB" ""   ""  ""  ""  ""  ""  "" 
"0010600" 1 ""   ""  ""  ""  ""  ""  ""  ""  ""  "A" ""   ""   ""  ""  ""  ""  ""  "" 
"0010600" 1 ""   ""  ""  ""  ""  ""  ""  ""  "B" ""  ""   ""   ""  ""  ""  ""  ""  "" 
"0010600" 1 ""   ""  ""  ""  ""  ""  ""  "B" ""  ""  ""   ""   ""  ""  ""  ""  ""  "" 
"0010600" 1 ""   ""  ""  ""  ""  ""  "B" ""  ""  ""  ""   ""   ""  ""  ""  ""  ""  "" 
"0010600" 1 ""   ""  ""  ""  ""  "B" ""  ""  ""  ""  ""   ""   ""  ""  ""  ""  ""  "" 
"0010600" 1 ""   ""  ""  ""  "B" ""  ""  ""  ""  ""  ""   ""   ""  ""  ""  ""  ""  "" 
"0010600" 1 ""   ""  ""  "B" ""  ""  ""  ""  ""  ""  ""   ""   ""  ""  ""  ""  ""  "" 
"0010600" 1 ""   ""  "B" ""  ""  ""  ""  ""  ""  ""  ""   ""   ""  ""  ""  ""  ""  "" 
"0010600" 1 ""   "B" ""  ""  ""  ""  ""  ""  ""  ""  ""   ""   ""  ""  ""  ""  ""  "" 
"0010600" 1 "AB" ""  ""  ""  ""  ""  ""  ""  ""  ""  ""   ""   ""  ""  ""  ""  ""  "" 
end

I want to compress the observations for each pid14 within each hhid14 into a single row, so that my data look like the example below. How should I approach this?

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str8 hhid14 double pid14 str7(pka1 pka2 pkb) str5 pkc str4 pkd str5 pke str6 pkf str9 pkg str7 pkh str4(pki pkj) str6(pkk pkl) str5(pkm pkn pko) str6 pkp str4 pkq
"0010600" 1 "AB"   "B"  "B"  "B"  "B"  "B"  "B"  "B"  "B"  "A"  "AB"   "AB"   "X"  "X"  "A"  "A"  "A"  "Y"
end

Thank you

Ken Chui

Join Date: Aug 2014
Posts: 1063

20 Jun 2022, 06:08

Welcome to Statalist.

The collapse command, using either the first or the last nonmissing value within each group (firstnm or lastnm), should work:

Code:

collapse (firstnm) pk*, by(hhid14 pid14)

Results:

Code:

     +-------------------------------------------------------------------------------------------------------------------------------+
     |  hhid14   pid14   pka1   pka2   pkb   pkc   pkd   pke   pkf   pkg   pkh   pki   pkj   pkk   pkl   pkm   pkn   pko   pkp   pkq |
     |-------------------------------------------------------------------------------------------------------------------------------|
  1. | 0010600       1     AB      B     B     B     B     B     B     B     B     A    AB    AB     X     X     A     A     A     Y |
     +-------------------------------------------------------------------------------------------------------------------------------+
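
One caveat, offered as a sketch rather than as part of the answer above: (firstnm) keeps only the first nonmissing value of each variable within a group, so if a person had two nonmissing answers for the same pk* variable, the later one would be dropped without warning. In the example data each pk* variable has exactly one nonmissing value per hhid14 and pid14, so nothing is lost. Assuming the full dataset is meant to follow the same pattern, a check along these lines could be run before the collapse. The variable name n_resp is made up for illustration.

```stata
* Run this BEFORE collapse, on the long-format data.
* n_resp is a hypothetical helper variable, created and dropped in each pass.
foreach v of varlist pk* {
    * count nonmissing (nonempty) responses per person for this variable
    bysort hhid14 pid14: egen n_resp = total(!missing(`v'))
    * stop with an error if any person has more than one response
    assert n_resp <= 1
    drop n_resp
}
```

If an assert fails, the loop stops with n_resp still in the dataset, so the offending rows can be listed (for example with list hhid14 pid14 if n_resp > 1) before deciding how to combine them.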

Guest
#3

20 Jun 2022, 07:42

Ken Chui
Thank you for the answer; it works the way I want it to.