  • Aggregate individual-level data into household-level data

    Hi. I am working with a survey that gathers information at the individual level. The survey asks all the members of the household questions about the location of the household (rural/urban), whether a household member has migrated, the size of the household, whether the household receives remittances from abroad, and the amount of remittances received. I would like to aggregate the individual-level data from the survey into household-level data, in order to produce summary statistics for the households. I have the household ID of each member. Since I am new to Stata, I am not quite sure how this can be done, and I would really appreciate it if someone could help me with it.

    Here is what my data look like:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str21 hhid float(area migrant) byte hhsize float(totalintremit amounttotalremit)
    "18120100041"      1 0  5 0     0
    "18120100051"      1 0  5 0     0
    "18120100051"      1 0  5 0     0
    "18120100051"      1 0  5 0     0
    "18120100061"      1 0 11 0     0
    "18120100061"      1 0 11 0     0
    "18120100061"      1 0 11 0     0
    "18120100061"      1 0 11 0     0
    "18120100061"      1 0 11 0     0
    "18120100061"      1 0 11 0     0
    "18120100061"      1 0 11 0     0
    "18120100081"      1 0  4 0     0
    "18120100101"      1 0  6 0     0
    "18120100101"      1 0  6 0     0
    "18120100101"      1 0  6 0     0
    "18120100101"      1 0  6 0     0
    "18120100111"      1 0  5 0     0
    "18120100111"      1 0  5 0     0
    "18120100121"      1 0  4 0     0
    "18120100131"      1 0  6 0     0
    "18120100141"      1 0  5 0     0
    "18120100141"      1 0  5 0     0
    "18120100151"      1 0  4 0     0
    "18120100171"      1 0  3 0     0
    "18120100171_1805" 1 0  4 0     0
    "18120100191"      1 0  7 0     0
    "18120100191"      1 0  7 0     0
    "18120100241"      1 0  6 0     0
    "18120100241"      1 0  6 0     0
    "18120100241"      1 0  6 0     0
    "18120100241_1807" 1 0  4 0     0
    "18120100241_1807" 1 0  4 0     0
    "18120100261"      1 0  4 0     0
    "18120100301"      1 0  6 0     0
    "18120100301"      1 0  6 0     0
    "18120100301"      1 0  6 0     0
    "18120100311"      1 0  7 0     0
    "18120100311"      1 0  7 0     0
    "18120100311"      1 0  7 0     0
    "18120100351"      1 0  5 0     0
    "18120100351"      1 0  5 0     0
    "18120100351"      1 0  5 0     0
    "18120100361"      1 0  4 0     0
    "18120100401"      1 0  5 0     0
    "18120100401"      1 0  5 0     0
    "18120100411"      1 0  6 0     0
    "18120100411"      1 0  6 0     0
    "18120100411"      1 0  6 0     0
    "18120100441"      1 0  5 0     0
    "18120100441"      1 0  5 0     0
    "18120100451"      1 0  5 0     0
    "18120100451"      1 0  5 0     0
    "18120100571"      1 0  5 0     0
    "18120100571"      1 0  5 0     0
    "18120100571"      1 0  5 0     0
    "18120100581"      1 0  4 0     0
    "18120100621"      1 0  7 0     0
    "18120100621"      1 0  7 0     0
    "18120100751"      1 0  7 0     0
    "18120100751"      1 0  7 0     0
    "18120100751"      1 0  7 0     0
    "18120100751"      1 0  7 0     0
    "18120100761"      1 0  5 0     0
    "18120100761"      1 0  5 0     0
    "18120100771"      1 0  4 0     0
    "18120100771"      1 0  4 0     0
    "18120100781"      1 0  5 0     0
    "18120100931"      1 0  6 0     0
    "18120100931"      1 0  6 0     0
    "18120100931"      1 0  6 0     0
    "18120101001"      1 0  5 0     0
    "18120101021"      1 0  4 0     0
    "18120101071"      1 0  4 0     0
    "18120101071"      1 0  4 0     0
    "18120101081"      1 0  5 0     0
    "18120101081"      1 0  5 0     0
    "18120101081"      1 0  5 0     0
    "18120101161"      1 0  5 0     0
    "18120101171_1803" 1 0  3 0     0
    "18120101221"      1 0  5 0     0
    "18120101221"      1 0  5 0     0
    "18120101221"      1 0  5 0     0
    "18120101241"      1 0  5 0     0
    "18120101241"      1 0  5 0     0
    "18120101241"      1 0  5 0     0
    "18120101251"      1 0  3 0     0
    "18120101261"      1 0  4 0     0
    "18120101291"      1 0  5 0     0
    "18120101291"      1 0  5 0     0
    "18120101291"      1 0  5 0     0
    "18120101311"      1 0  6 0     0
    "18120101311"      1 0  6 0     0
    "18120101311"      1 0  6 0     0
    "18120101421"      1 0  8 0     0
    "18120101421"      1 0  8 0     0
    "18120101421"      1 0  8 0     0
    "18120101431"      1 0  4 0     0
    "18120101611"      1 0  4 0     0
    "18120101621"      1 0  5 0     0
    "18120101651"      1 0  5 1 12000
    end
    Listed 100 out of 14856 observations


  • #2
    It appears that these variables do not vary within hhid. If that remains true in the rest of your data, one solution is to "tag" one observation in each hhid and then ask for statistics "if tag". Here is an example:
    Code:
    * tag exactly one observation per hhid (tag==1 for that row, 0 for the rest)
    egen byte tag = tag(hhid)
    * household-level tabulation of household size
    tabulate hhsize if tag
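    The same tag can drive any household-level statistic. A minimal sketch along the same lines (variable names taken from the dataex excerpt above):
    Code:
    * count the households represented in the data
    count if tag

    * household-level summaries of the other household-constant variables
    summarize hhsize amounttotalremit if tag
    tabulate area if tag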

    • #3
      Thank you so much! I tried the code you suggested and it worked quite well. I just have one more question: what if I want to generate the variables at the household level, rather than just produce summary statistics for them? Thanks again.
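      For anyone reading along, a minimal sketch of one way to do this, assuming (as in #2) that these variables are constant within hhid: either keep only the tagged observations, or collapse the data to one row per household.
      Code:
      * option 1: keep only the tagged observation in each household
      keep if tag

      * option 2 (run on the full individual-level data instead of option 1):
      * collapse to one row per hhid, taking the first value of each
      * household-constant variable
      collapse (first) area migrant hhsize totalintremit amounttotalremit, by(hhid)
      Either way, the result is a household-level dataset whose variables can be summarized or used in analysis directly.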
