  • Loop over households and individuals

    Hi,

    I want to create a variable that gives me the number of children aged 17 and younger (children17) for each household.

    For that I have these vars:

    hhid -> household id
    pid_1 - pid_8 -> id of the individual in the household between 1 and 8 (so a person can have no. 1, 2,3...or 8 referring to a certain household id)
    age_1 - age_8 -> age of each individual (no.1 - no.8) in a household
    relrefp -> the relationship to the reference person of the hh: each child has relrefp==3 (daughter/son)

    I tried to create such a variable with a loop in several ways, but I cannot find the correct solution.

    forv i = 1(1)8 {
        gen children17=`i' if age_`i'<=17 & relrefp_`i'==3
    }

    This is probably still not that close to the solution, as Stata tells me that "variable children17 already defined", though it's not in the dataset before starting the loop.

    Perhaps someone with much more experience in Stata loops can give me the right hints?

  • #2
    Like most things in Stata data management and analysis, this is much easier to do in long layout than in wide. So first -reshape- the data to long, and then it's a one-liner. You can go back to wide again if there is a really good reason to do so, but there aren't many things in Stata that are best done in wide layout, so think twice before doing that.

    Code:
    reshape long pid age relrefp, i(hhid) j(_j) string
    by hhid, sort: egen children17 = total(age <= 17 & relrefp == 3)
    
    //    AND IF THERE IS A COMPELLING REASON TO GO BACK TO WIDE DATA
    //   BUT PROBABLY BETTER TO SKIP THIS
    reshape wide
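
    As a minimal sketch of how the long-layout approach behaves, the block below runs the same two commands on a made-up dataset with three members per household instead of eight. The data values are invented for illustration and are not from the original post, and the sketch assumes the relationship variables are named relrefp_1, relrefp_2, ... as in the loop in #1.

    ```stata
    clear
    * Toy data (hypothetical values): two households, up to three members each
    input hhid pid_1 pid_2 pid_3 age_1 age_2 age_3 relrefp_1 relrefp_2 relrefp_3
    1 1 2 3 45 16 10 1 3 3
    2 1 2 . 30 19 . 1 3 .
    end

    * With the string option, the suffixes "_1", "_2", "_3" are stored in _j
    reshape long pid age relrefp, i(hhid) j(_j) string

    * Count sons/daughters aged 17 or younger within each household
    by hhid, sort: egen children17 = total(age <= 17 & relrefp == 3)

    list, sepby(hhid)
    ```

    Here household 1 gets children17 = 2 and household 2 gets children17 = 0. The empty third slot in household 2 does not count, because Stata treats a missing age as larger than any number, so age <= 17 is false for it.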

    In the future, when showing data examples, please use the -dataex- command to do so. If you are running version 15.1 or a fully updated version 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.



    When asking for help with code, always show example data. When showing example data, always use -dataex-.



    • #3
      Clyde is right that reshaping long is usually more useful than keeping the data wide. And Clyde's answer is the best way to go if you're going to ultimately use the data in long format. If, for whatever reason, you need to keep the data wide, a loop might make more sense because reshaping twice probably takes longer than looping.

      Your current loop fails because you're generating a variable inside the loop. The first pass is fine because the variable does not yet exist, but on the second pass you're asking Stata to create a variable that it already created in the previous iteration, so it stops with an error. Even if you created the variable before the loop and used replace inside it, your current code still would not count children aged 17 or younger. It would just give you the index number of the last such child encountered. Imagine an observation with 7 people in the household and only one child, where the child's information is in pid_7, age_7, and relrefp_7. In that case your loop would say there were 7 children in the household, which is not the case.

      Try something like this if you need a loop for wide data.

      Code:
      gen children17=0
      forv i = 1/8 {
          replace children17=children17+1 if age_`i'<=17 & relrefp_`i'==3
      }
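
      To make the difference between storing the index and counting concrete, here is a small sketch on made-up data with three members per household. The values and the variable name lastidx are hypothetical and only for illustration.

      ```stata
      clear
      * Toy data (hypothetical values): two households, three members each
      input hhid age_1 age_2 age_3 relrefp_1 relrefp_2 relrefp_3
      1 40 38 5 1 2 3
      2 50 16 12 1 3 3
      end

      * Index version: each match overwrites the value with the member's position
      gen lastidx = .
      forvalues i = 1/3 {
          replace lastidx = `i' if age_`i' <= 17 & relrefp_`i' == 3
      }

      * Counter version: each match adds 1 to the running total
      gen children17 = 0
      forvalues i = 1/3 {
          replace children17 = children17 + 1 if age_`i' <= 17 & relrefp_`i' == 3
      }

      list hhid lastidx children17
      ```

      Household 1 has a single child stored in position 3, so lastidx is 3 while children17 is 1. Household 2 has two children, so lastidx is again 3 while children17 is 2.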



      • #4
        Dear Clyde and Sarah,

        thank you for your very helpful advice!!

        I installed dataex and will try it for future posts here. I also gave the data format a second thought and tried the reshape command, but without success. I am working with mi data where almost all variables are mi vars. Reshaping my very large dataset took quite a long time and in the end I always got an error message. The combination of hh and individual data, and how I am already working with it, seems to make wide format more suitable for me.

        The loop from you, Sarah, worked very well. I also used it for other age-related variables and I was so glad it finally worked! Thanks a lot!



        • #5
          Well, I'm glad you got the job done with Sarah's loop.

          As you have discovered, -reshape- should not be used with -mi- data. Had you said it was -mi- data I would have pointed you, instead, to -mi reshape-, a separate command, that would have done the job in a way analogous to what was shown in #2.



          • #6
            Hi Lydia,

            There are several posts here of people using household composition data (and based on your description, it may be the same dataset). See here, here, here, and here.

            If you need help using dataex, I created a Youtube video here



            • #7
              Thank you Clyde, yes, I should have mentioned that it's mi data and also that of course I tried mi reshape instead of reshape.

              Thank you for the information with the links, David. Perhaps they are really using the same dataset, given the same structure of the family relation variable, for example. I will have a look at your video before posting another question.
