
  • generate binary variable per dyad

    Hi,
    I've got this example dataset where the dyads can be followed through census years. I would like to generate a variable "oldest" (0,1) to identify the oldest member of a dyad for each year they appear. The dataset comprises many dyads.
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long dyad int census long ego byte ego_age
    27266 4  617847  3
    27266 5  617847 13
    27266 5 1001460 33
    27266 4 1001460 23
    end
    label values census census
    label def census 4 "1891", modify
    label def census 5 "1901", modify
    I attempted something like this:
    Code:
    bysort census dyad (ego_age): egen oldest = max(ego_age)
    (and many other attempts), but it's not working.

    Thanks for your help

  • #2
    Code:
    *CONFIRM AGE IS NEVER MISSING
    assert !missing(ego_age)
    *WANTED
     bys dyad census (ego_age): g wanted=_n==_N
    The rationale is that, after sorting on ego_age, the largest value comes last within each group defined by dyad and census year, so _n==_N flags the oldest member.



    • #3
      Code:
      by census dyad (ego_age), sort: gen byte oldest = (ego_age == ego_age[_N])
      By the way, if both members of the dyad are the same age, this code will flag them both as "oldest." If you do not want that behavior, you have to specify how you would break those ties.
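      Before deciding how to break ties, it can help to see how often they actually occur. A minimal sketch, assuming the variable names from #1: it counts how many members are flagged as oldest in each dyad-year. Dyad 99999 and its two members below are hypothetical, added only to show a tie.

```stata
clear
input long dyad int census long ego byte ego_age
27266 4  617847  3
27266 4 1001460 23
99999 4  111111 40
99999 4  222222 40
end

* flag every member whose age equals the maximum in the dyad-year
by census dyad (ego_age), sort: gen byte oldest = (ego_age == ego_age[_N])

* number of members flagged as oldest in each dyad-year
by census dyad: egen byte n_oldest = total(oldest)

* dyad-years with n_oldest > 1 are ties that still need a rule
tab n_oldest
list dyad census ego ego_age if n_oldest > 1, sepby(dyad)
```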

      There are a couple of potential problems with the data that could sabotage this approach. If there are people for whom ego_age is missing, this code will fail. If you encounter this problem, post back and I will show slightly more complicated code that resolves this problem.
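      The reason is that Stata sorts missing values after all numbers, so with a missing ego_age the last observation in the group is the missing one, ego_age[_N] is missing, and the wrong member gets flagged. One possible workaround (a sketch of my own, not necessarily the code referred to above) is to sort on a hypothetical indicator has_age first, so that known ages come last, and then blank out the flag for members with missing age. The missing age in the data below is hypothetical.

```stata
clear
input long dyad int census long ego byte ego_age
27266 4  617847  3
27266 4 1001460  .
end

* 1 if age is known, 0 if missing (has_age is an illustrative helper)
gen byte has_age = !missing(ego_age)

* members with missing age sort first, so ego_age[_N] is the largest
* known age in the dyad-year (or missing if no age is known)
by census dyad (has_age ego_age), sort: gen byte oldest = (ego_age == ego_age[_N])

* members whose age is unknown get oldest = . instead of a 0/1 value
replace oldest = . if !has_age
```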

      Also, if the census is taken on different dates in different years and a birthday falls between those dates, some people's ages will not be perfectly consistent with the years elapsed. If the two members were tied, or one year apart in age, in one census, their order could change, or even flip, in another census. It isn't clear to me that this problem can be fixed in any case.
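      Even if it cannot be fixed, it can at least be detected. A minimal sketch, assuming the variable names from #1: for each person within a dyad, it checks whether the oldest flag differs across censuses, then marks the whole dyad. Dyad 55555 and its ages below are hypothetical, constructed so that the order flips between 1891 and 1901.

```stata
clear
input long dyad int census long ego byte ego_age
27266 4  617847  3
27266 5  617847 13
27266 5 1001460 33
27266 4 1001460 23
55555 4  333333 20
55555 4  444444 21
55555 5  333333 31
55555 5  444444 30
end

* flag the oldest member(s) in each dyad-year
by census dyad (ego_age), sort: gen byte oldest = (ego_age == ego_age[_N])

* after sorting each person's records on oldest, the flag changes
* across censuses exactly when the first and last values differ
bysort dyad ego (oldest): gen byte flips = (oldest[1] != oldest[_N])

* 1 for every record of a dyad in which at least one member flips
bysort dyad: egen byte dyad_flips = max(flips)
list dyad census ego ego_age oldest if dyad_flips, sepby(dyad)
```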

      Added: Crossed with #2. The solutions may produce slightly different results. If two members of a dyad are the same age, my solution designates both of them as "oldest," whereas the solution in #2 will pick one of them at random as the oldest. And which one it picks is not reproducible if the code is re-run.
      Last edited by Clyde Schechter; 03 Oct 2022, 15:21.



      • #4

        Code:
        bysort census dyad (ego_age): gen oldest = (_n==_N)
        But this assigns 1 to only one member with the oldest age. If more than one member has the same age (which you should expect), you will have to decide what to do. I imagine that you want to assign 1 to all members with the oldest age, i.e., everyone gets 1 if they are 80 years old and 80 is the oldest age in the group. In that case (for example, with two members of equal age), the code could be:
        Code:
        bysort census dyad: egen max_age = max(ego_age)
        gen oldest = (max_age == ego_age)



        • #5
          Hi,
          Thanks for your answers. In order:
          Andrew, thanks for pointing out the possibility of missing cases. I have none (there were a couple of 0s, which I removed).

          @Clyde, 1) indeed, I will have to break ties, as I want only one "1" per dyad. Twins will be a problem in that regard. 2) There are no missing ages. 3) Very good point. Age is not calculated from a date of birth but taken as is from the census, and the reported values (recorded by census enumerators or declared by people themselves) are not 100% reliable. The worst case I foresee is that, within the same dyad, ID1 is considered the oldest in one census and the youngest in the next. Siblings or cousins of roughly the same age will generate that kind of behavior. As my aim is to compare characteristics of the oldest to the youngest, I'm not sure how these cases will bias the analyses. But I would prefer picking one at random in case of ties.

          @Adrien, as mentioned above, I would prefer only one "1" per dyad with random selection in case of ties.

          Thanks again, really appreciated
          JS



          • #6
            Given that you want to select one at random in the event of ties, you should code it in a way that makes the choice reproducible. So you could modify the code in #2 as follows:
            Code:
            *CONFIRM AGE IS NEVER MISSING
            assert !missing(ego_age)
            *WANTED
            set seed 1234 // OR ANY OTHER INTEGER YOU LIKE
            gen double shuffle = runiform()
            bys dyad census (ego_age shuffle): g wanted=_n==_N
            drop shuffle
            This way, when you run the code repeatedly, the random tie-breaker will select the same member each time, provided the data are in the same order when -runiform()- is called. The number in the -set seed- command can be anything you like. Just set it once and don't change it.
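            Since the goal is exactly one "1" per dyad and census year, it is cheap to verify that after running the code. A minimal sketch, assuming the variable names from #1 and the code above; n_wanted is a hypothetical helper variable, and dyad 99999 is a hypothetical tied pair added only for illustration.

```stata
clear
input long dyad int census long ego byte ego_age
27266 4  617847  3
27266 4 1001460 23
99999 4  111111 40
99999 4  222222 40
end

* random tie-breaker from #6, reproducible because the seed is fixed
set seed 1234
gen double shuffle = runiform()
bys dyad census (ego_age shuffle): g wanted = _n==_N
drop shuffle

* each dyad-year should now contain exactly one wanted == 1
bysort dyad census: egen byte n_wanted = total(wanted)
assert n_wanted == 1
```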



            • #7
              Thanks a lot, that is really helpful.
              JS

