  • Using loops for combining multiple event dates and ages into one variable

    Dear all,

    I am using a dataset that contains information on events (from 1972 to 2000) and on each respondent's date of birth (gebdat). I created an age variable using gen age=(td(1jan2017)-gebdat)/365.25, so age is the respondent's age on 1 January 2017.

    It is organized as follows:

    ID 1972 1973 1974 AGE
    1 1 2 4 44.000
    2 0 0 2 33.199
    3 1 2 2 22.677
    4 12 4 0 55.998

    So, person 1 participated in one event in 1972, in two events in 1973, etc.
    However, I want to create new variables in which the events are indexed by age instead of by year. So, I want to end up with:

    ID event_age12 event_age13 event_age14 event_age15
    1
    2
    3
    4

    So, if a person has a value of 2 for 1973 and his/her DOB is between 1 Jan 1961 and 31 Dec 1961 (i.e., age 56 in 2017), he/she was 12 in 1973 and should get a value of 2 in event_age12.

    I tried to generate variables using:

    * Stata variable names cannot start with a digit, so e1972 stands in for the 1972 column
    gen age_12 = .
    tab e1972
    replace age_12 = 1 if (e1972==1)
    replace age_12 = 2 if (e1972==2)
    replace age_12 = 0 if (age>=57) & (e1972>=1)
    replace age_12 = 0 if (age<56) & (e1972>=1)

    But if I do it like this, I have to repeat the same steps for every age from 12 to 70 AND for every event year, so I tried to use loops.
    However, I cannot figure out how to combine age and event year in one loop without turning the event variable into a dummy.

    For example, I tried something like this:
    foreach eventyear of varlist 'eventyear' { generate ...

    (eventyear as cutpoints?)

    As you can see, I don't know how to solve this. Should I use a loop inside a loop?
    I would very much appreciate it if you could help me figure this out; every step in the right direction is welcome!

    Thanks a lot in advance!

    Best, Lara
    Last edited by Lara Rosa; 20 Dec 2016, 07:50.

  • #2
    Lara,

    If I understand your problem correctly, you do indeed need a loop within a loop: one loop to go through all of the years and one loop to go through all of the ages. I would suggest not using the age in 2017 (mainly because it's confusing), but instead calculating each respondent's age in every event year directly from the birth date. Here's a suggestion:

    Code:
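    forvalues age=12/15 {
      * one new variable per age
      gen event_age`age'=.
      forvalues yr=1972/1974 {
        * copy each year's event count to the age the person had on 1 January of that year
        replace event_age`age'=e`yr' if int((td(1jan`yr')-gebdat)/365.25)==`age'
      }
    }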
    This assumes that the variables for the number of events in each year are called e1972, e1973, e1974, etc.

    Note that this calculates the age as of January 1st of each year. You may need to adjust the if qualifier in the replace statement to match your requirements, e.g., use round() instead of int(), or calculate the age as of the middle of the year.

    There may be a better way to do this that I am not thinking of, but hopefully this will work for starters.

    Regards,
    Joe



    • #3
      Actually, if I understand what you want, you can do this without any loops at all.

      But first, there is a problem with your example data. (Aside from the fact that you didn't use -dataex-: please see FAQ #12 and use -dataex- for all data examples going forward.) Based on the ages you give for 2017, most of your respondents were not yet born in the years shown in your example! So I have modified your example data so that the people were actually alive in the years when they participated in events.

      This is a double -reshape- problem:

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input float(id events1972 events1973 events1974 age2017)
      1  1 2 4     54
      2  0 0 2 63.199
      3  1 2 2 52.677
      4 12 4 0 55.998
      end
      
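      * one observation per person and event year
      reshape long events, i(id) j(year)
      * whole-year age in each event year, derived from the age in 2017
      gen current_age = floor(age2017 - (2017-year))
      
      * back to one observation per person, one variable per age
      drop year
      reshape wide events, i(id) j(current_age)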
      That said, whatever additional analysis of this data you plan is likely to be easier if you leave the data in long layout. So I recommend that you skip the -drop year- and -reshape wide ...- commands and just work with the data from there. But if you really do need the data in wide layout, that is how you would get it.

      Note also that in order to group the events by age, the age has to be converted to an integer using round() or int() (as Joe Canner noted in his response), or using floor() [which is equivalent to int() for non-negative numbers].
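      To see how the choice of function affects which age bucket an observation lands in, you can try the three functions on a single fractional age in a do-file. The value 12.7 is just an illustrative number, not taken from the data:

```stata
* A fractional age of 12.7 years: int() and floor() give 12, round() gives 13
display int(12.7)     // truncates toward zero: 12
display floor(12.7)   // rounds down: 12
display round(12.7)   // nearest integer: 13

* int() and floor() only disagree for negative numbers
display int(-2.7)     // -2
display floor(-2.7)   // -3
```

      Since ages are never negative, int() and floor() are interchangeable here; the real decision is between truncating and rounding.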



      • #4
        Thank you very much! The loop-within-a-loop code worked! However, I now have a new problem that is probably easy to fix:

        The loop does create the new variables event_age12, event_age13, event_age14, etc., but the values that should be in event_age13 end up in event_age12, and the values that should be in event_age14 end up in event_age13. So all values need to shift one variable to the right. I tried working with [N=1], but it isn't working. Any ideas?

        Suggestions would be very much appreciated!

        Best, Lara



        • #5
          I think I fixed it. At least, the numbers are as expected! I used:

          forvalues age=12/15 {
            gen event_age`age'=.
            forvalues yr=1972/1974 {
              * age on 1 January of the event year is usually one less than the age reached during that year
              replace event_age`age'=e`yr' if int((td(1jan`yr')-gebdat)/365.25)==`age'-1
            }
          }
          Thank you so much!
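          An alternative that may be easier to reason about is sketched below. It assumes that "age in an event year" should simply mean event year minus birth year, which matches the 1 Jan to 31 Dec 1961 example in the question. Instead of computing a fractional age, it compares against year(gebdat). The birth dates are made up for illustration, and the e1972-e1974 names follow Joe's assumption:

```stata
* Hypothetical example data: e1972-e1974 hold event counts,
* dob is an invented birth date that is converted to a %td date
clear
input id e1972 e1973 e1974 str9 dob
1  1 2 4 "15jun1961"
2  0 0 2 "01jan1960"
3  1 2 2 "31dec1959"
4 12 4 0 "20mar1962"
end
gen gebdat = date(dob, "DMY")
format gebdat %td

* Age in the event year = event year minus birth year, so anyone
* born in 1961 counts as age 12 for the 1973 events
forvalues age = 12/15 {
    gen event_age`age' = .
    forvalues yr = 1972/1974 {
        replace event_age`age' = e`yr' if `yr' - year(gebdat) == `age'
    }
}
list id event_age*
```

          Because this compares whole calendar years, it avoids the 365.25-day approximation. It can differ from the `age'-1 version for someone born on 1 January, because the fractional calculation already counts that person as a full year older on that date.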
