Hi all,
I apologize if the title is unclear. I have the data structure shown in the listing below.
In this example, children (kidid) are nested within respondents (hhidpn). At this point, the only relevant variables are the respondent's own variables, which are attached to every child row, and the aggregate child-level characteristics I have already created. I now want to drop the child rows and reduce this to a respondent-level file. The problem is that, in many cases, not all children are present in the same waves. This happens because of deaths, but also because of the age cutoffs I am using. In the example, child 1 has waves 8-10, whereas the later children have waves 8-12. If I kept only the first child's rows, I would lose the data from the extra waves.
In the past, I have used collapse (firstnm) for this. Unfortunately, I have a large dataset, and the collapse takes a very long time to run, on top of the decoding/encoding I do to preserve missing data. Is collapse still my best option, or is there a quicker/simpler way?
Code:
        +-----------------------------------------------------------------------------------------------------------+
        |   hhidpn        kidid   wave    riwstat   kabyea~k   keduc_~k   keduc_~4   valid_~d   indica~r   total_~r |
        |-----------------------------------------------------------------------------------------------------------|
166124. | 74892040   0748920101      8   4.nr,ali       1966         16         16          6          1          3 |
166125. | 74892040   0748920101      9   4.nr,ali       1966         16         16          6          1          3 |
166126. | 74892040   0748920101     10   1.resp,a       1966         16         16          6          1          3 |
        |-----------------------------------------------------------------------------------------------------------|
166127. | 74892040   0748920102      8   4.nr,ali       1968         12         12          6          1          5 |
166128. | 74892040   0748920102      9   4.nr,ali       1968         12         12          6          1          5 |
166129. | 74892040   0748920102     10   1.resp,a       1968         12         12          6          1          5 |
166130. | 74892040   0748920102     11   1.resp,a       1968         12         12          6          1          5 |
166131. | 74892040   0748920102     12   1.resp,a       1968         12         12          6          1          5 |
        |-----------------------------------------------------------------------------------------------------------|
166132. | 74892040   0748920103      8   4.nr,ali       1970         16         16          6          1          5 |
166133. | 74892040   0748920103      9   4.nr,ali       1970         16         16          6          1          5 |
166134. | 74892040   0748920103     10   1.resp,a       1970         16         16          6          1          5 |
166135. | 74892040   0748920103     11   1.resp,a       1970         16         16          6          1          5 |
166136. | 74892040   0748920103     12   1.resp,a       1970         16         16          6          1          5 |
        |-----------------------------------------------------------------------------------------------------------|
166137. | 74892040   0748920104      8   4.nr,ali       1971         11         11          6          1          5 |
166138. | 74892040   0748920104      9   4.nr,ali       1971         11         11          6          1          5 |
166139. | 74892040   0748920104     10   1.resp,a       1971         11         11          6          1          5 |
166140. | 74892040   0748920104     11   1.resp,a       1971         11         11          6          1          5 |
166141. | 74892040   0748920104     12   1.resp,a       1971         11         11          6          1          5 |
        |-----------------------------------------------------------------------------------------------------------|
166142. | 74892040   0748920105      8   4.nr,ali       1973         16         16          6          1          5 |
166143. | 74892040   0748920105      9   4.nr,ali       1973         16         16          6          1          5 |
166144. | 74892040   0748920105     10   1.resp,a       1973         16         16          6          1          5 |
166145. | 74892040   0748920105     11   1.resp,a       1973         16         16          6          1          5 |
166146. | 74892040   0748920105     12   1.resp,a       1973         16         16          6          1          5 |
        |-----------------------------------------------------------------------------------------------------------|
166147. | 74892040   0748920106      8   4.nr,ali       1976          9          9          6          1          5 |
166148. | 74892040   0748920106      9   4.nr,ali       1976          9          9          6          1          5 |
166149. | 74892040   0748920106     10   1.resp,a       1976          9          9          6          1          5 |
166150. | 74892040   0748920106     11   1.resp,a       1976          9          9          6          1          5 |
166151. | 74892040   0748920106     12   1.resp,a       1976          9          9          6          1          5 |
        +-----------------------------------------------------------------------------------------------------------+
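For reference, one possible alternative to collapse (firstnm) is to keep a single row per respondent-wave rather than aggregating. This is only a minimal sketch, not a tested solution: it assumes the target file has one row per hhidpn and wave, and that every variable I want to keep is constant across child rows within a respondent-wave. The assert line is an optional check of that assumption for one variable (riwstat, taken from the listing); child-specific variables would still need to be dropped by hand.

```stata
* Sketch only: assumes each retained variable is constant within hhidpn-wave.

* Optional check of that assumption for one respondent-level variable;
* this stops with an error if riwstat differs across children in any group.
bysort hhidpn wave (kidid): assert riwstat == riwstat[1]

* Keep one row per respondent-wave. Rows are ordered by kidid within each
* group, so the lowest kidid present in that wave is the row retained.
* Waves in which the first child is absent are therefore still kept.
bysort hhidpn wave (kidid): keep if _n == 1

* kidid no longer identifies anything in a respondent-level file.
drop kidid

* Confirm that hhidpn and wave now uniquely identify the rows.
isid hhidpn wave
```

Because rows are kept rather than aggregated, the values in the retained rows are left exactly as they were. If a variable can be missing in some child rows but present in others, this would not reproduce firstnm, since the first row by kidid may hold the missing value.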