Hello Statalisters,
I'm trying to summarize some statistics by household (NUM_HOG). Specifically, I would like variables counting how many children each household has at each age from 0 (age_0) through 5 years (age_5), plus a simple binary indicator of whether the household has at least one child under 5.
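I wrote the following code to create the age_* variables and the under5 variable (PPA03 is each person's age in years):
Code:
forval i = 0/5 {
    // number of household members aged i; egen fills it only on rows where PPA03==i, missing elsewhere
    bysort NUM_HOG: egen age_`i' = count(PPA03) if PPA03==`i'
    replace age_`i' =
}
bys NUM_HOG: gen under5 = 0
// flags a row only if one of the age_* variables is nonmissing on that same row
replace under5=1 if age_0 !=. | age_1 !=. | age_2 !=. | age_3 !=. | age_4 !=. | age_5 !=.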
This resulted in the dataset excerpted below. In it, for NUM_HOG==13682 the under5 variable is 0 for some members and 1 for others, depending on each individual's own age, and the age_* variables are mostly missing even when someone in the household is, for example, 1 year old.
Code:
input double(NUM_HOG PPA02 PPA03) float(ame hh_ame age_0 age_1 age_2 age_3 age_4 age_5 under5)
13680 1 71 .76 4.02 . . . . . . 0
13680 2 65 .65 4.02 . . . . . . 0
13680 1 28 1 4.02 . . . . . . 0
13680 1 16 .96 4.02 . . . . . . 0
13680 2 13 .65 4.02 . . . . . . 0
13681 1 42 .95 3.06 . . . . . . 0
13681 2 25 .74 3.06 . . . . . . 0
13681 1 7 .56 3.06 . . . . . . 0
13681 2 4 .44 3.06 . . . . 1 . 1
13681 2 2 .37 3.06 . . 1 . . . 1
13682 1 34 .95 2.98 . . . . . . 0
13682 2 24 .74 2.98 . . . . . . 0
13682 1 3 .37 2.98 . . . 1 . . 1
13682 2 1 .27 2.98 . 1 . . . . 1
13682 2 69 .65 2.98 . . . . . . 0
end
Question: I would like one observation per household with these summary statistics so that I can merge the result with another part of this national survey. As the data currently stand, if I collapse by under5 using mean or count, I will not correctly capture the number of children in each household. I think I need to adjust my code so that the age_* and under5 variables take the same value for every member of a household (NUM_HOG), the way hh_ame currently does.
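A minimal sketch of the household-level structure described above, assuming PPA03 is age in years and NUM_HOG identifies households, and following the original code in counting ages 0 through 5. It reuses the five original variables from the dataex excerpt. The idea: egen's total() sums a 0/1 indicator over all members of a household, so every member gets the same count (0 rather than missing when nobody has that age), and collapse then keeps one row per household. The names n_under5 and the choice of (max) in collapse are illustrative, not from the original post.

```stata
* Sketch: household-level counts of children aged 0-5, one row per household
clear
input double(NUM_HOG PPA02 PPA03) float(ame hh_ame)
13680 1 71 .76 4.02
13680 2 65 .65 4.02
13680 1 28 1 4.02
13680 1 16 .96 4.02
13680 2 13 .65 4.02
13681 1 42 .95 3.06
13681 2 25 .74 3.06
13681 1 7 .56 3.06
13681 2 4 .44 3.06
13681 2 2 .37 3.06
13682 1 34 .95 2.98
13682 2 24 .74 2.98
13682 1 3 .37 2.98
13682 2 1 .27 2.98
13682 2 69 .65 2.98
end

* (PPA03 == i) is 1 or 0 for each person; total() adds it up over the whole
* household, so every member of NUM_HOG gets the same count
forvalues i = 0/5 {
    bysort NUM_HOG: egen age_`i' = total(PPA03 == `i')
}

* hypothetical helper: number of children aged 0-5, plus a 0/1 flag
egen n_under5 = rowtotal(age_0-age_5)
generate byte under5 = n_under5 > 0

* every variable here is constant within NUM_HOG, so (max) just picks that value;
* the result has one observation per household, ready to merge on NUM_HOG
collapse (max) age_0-age_5 n_under5 under5 hh_ame, by(NUM_HOG)
list
```

Because the counts are built from the full household before collapsing, the per-person missing values that make a later collapse by under5 unreliable never appear.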
I reviewed this post, which was helpful, but couldn't quite figure out how to apply it to my problem. I hope this is enough information to answer my question; please let me know if anything needs clarification.
Thank you in advance!