  • Generating new variables for partner?

    Dear Stata community,

    I am fairly new to Stata and am trying to wrap my head around creating new variables from the content of the partner's observation.
    I have a wide dataset where each observation is a single individual. Let's say it looks like the following.

    ID PartnerID Var1 Var2
    1 5 “hi” “ho”
    2 3 “cat” “flower”
    3 2 “bird” “stone”
    4 . “Frog” “cycle”
    5 1 “Jupiter” “lollipop”


    Now I am attempting to add each partner's variables as new variables on the index person's observation, like this:

    ID PartnerID Var1 PartnerVar1 Var2 PartnerVar2
    1 5 “hi” “Jupiter” “ho” “lollipop”
    2 3 “cat” “bird” “flower” “stone”
    3 2 “bird” “cat” “stone” “flower”
    4 . “Frog” “” “cycle” “”
    5 1 “Jupiter” “hi” “lollipop” “ho”


    The following syntax worked fine initially:
    gen PartnerVar1 = Var1[PartnerID]

    Yet it depends on the ID variable being an unbroken sequence starting at 1, because Var1[PartnerID] picks the row whose row number equals PartnerID. If the sequence breaks (e.g. 1 2 3 5 6 7 8 10 15), there will be a mismatch.
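
    A minimal sketch of that mismatch, assuming a cut-down version of the example above in which ID 4 is absent and only Var1 is kept (made up for illustration, not taken from the real data):

    ```stata
    clear
    input ID PartnerID str8 Var1
    1 5 "hi"
    2 3 "cat"
    3 2 "bird"
    5 1 "Jupiter"
    end

    * Var1[PartnerID] reads row number PartnerID,
    * not the row whose ID equals PartnerID
    gen PartnerVar1 = Var1[PartnerID]

    list, noobs
    ```

    ID 5 now sits in row 4, so Var1[5] points past the end of the data and ID 1 gets an empty string instead of "Jupiter".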

    Do any of you have suggestions on how to match not by row number but by the content of PartnerID? Preferably without a foreach/forvalues loop, as there are approx. 70 variables and 1,000,000 observations.

    Kind regards,
    Joel

  • #2
    I would merge the dataset with a copy of itself, after suitable renaming.

    Code:
    clear 
    input ID PartnerID str8 (Var1 Var2)
    1 5 "hi" "ho"
    2 3 "cat" "flower"
    3 2 "bird" "stone"
    4 . "Frog" "cycle"
    5 1 "Jupiter" "lollipop" 
    end 
    
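    * you start here
    save thisdata

    * each row of the copy now holds the values of the person who names ID as partner
    drop ID
    rename (Var?) (P=)
    rename PartnerID ID

    * attach each person's own record from the saved file
    merge 1:1 ID using thisdata

    list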


    • #3
      Thank you, Nick, for taking the time to answer; I do appreciate it! I have also tried this solution, yet there is an issue with divorce. To explain this, I should probably also introduce ObsYear, the year of each observation. To return to the former example, it would look like this:

      ID PartnerID ObsYear Var1 Var2
      1 5 2010 "hi" "ho"
      2 3 2014 "cat" "flower"
      3 2 2012 "bird" "stone"
      4 . 2009 "frog" "cycle"
      5 1 2013 "jupiter" "lollipop"

      Now let's say that ID[1] gets married to ID[5] in 2010. The following year ID[1] gets divorced, and in 2013 remarries, this time to ID[2]. This would cause both ID[5] and ID[2] to have PartnerID == 1 at the point of observation (ObsYear), as illustrated below:

      ID PartnerID ObsYear Var1 Var2
      1 5 2010 "hi" "ho"
      2 1 2014 "cat" "flower"
      3 2 2012 "bird" "stone"
      4 . 2009 "frog" "cycle"
      5 1 2013 "jupiter" "lollipop"

      Were I to merge on PartnerID (renamed as ID), the value 1 would occur twice, which interferes with merging the wide dataset, since a 1:1 merge needs a key that uniquely identifies observations.
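
      A short sketch of how the repeated key could be made visible before attempting the merge, using the example data above. The variable name dup is an assumption chosen for illustration:

      ```stata
      clear
      input ID PartnerID ObsYear str8 (Var1 Var2)
      1 5 2010 "hi" "ho"
      2 1 2014 "cat" "flower"
      3 2 2012 "bird" "stone"
      4 . 2009 "frog" "cycle"
      5 1 2013 "jupiter" "lollipop"
      end

      * count copies of each non-missing PartnerID
      duplicates report PartnerID if !missing(PartnerID)

      * flag observations that share a PartnerID (dup is a hypothetical name)
      duplicates tag PartnerID if !missing(PartnerID), generate(dup)

      * dup is missing where PartnerID is missing, so exclude those rows explicitly
      list ID PartnerID ObsYear if dup > 0 & !missing(dup), noobs
      ```

      With this example the listing shows ID 2 and ID 5, the two people who both name ID 1 as their partner.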


      • #4
        I guess you need to try a merge using year as well as identifier.


        • #5
          This is an excellent suggestion, thank you Nick! A problem, however, is that the years do not necessarily match. The year variable doesn't show the point of partnership initiation, only of observation. For example, ID[1] and ID[5] are partners in the example dataset above. Yet for ID[1], year = 2010 and for ID[5], year = 2013, as they were registered/observed at different points in time. Thus they will not align in the merge.


          • #6
            That sounds like a limitation of your dataset for what you want to do.


            • #7
              Could I perhaps create some kind of familyID for each partnership and then merge from that?
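
              A rough sketch of what such an identifier might look like, assuming a "family" is simply the unordered pair of ID and PartnerID. The names pairLow, pairHigh and familyID are hypothetical, and whether this definition suits the real data is an open question:

              ```stata
              clear
              input ID PartnerID ObsYear
              1 5 2010
              2 1 2014
              3 2 2012
              4 . 2009
              5 1 2013
              end

              * unordered pair: the smaller and the larger of the two identifiers
              * (min() and max() ignore missing arguments, hence the if qualifier below)
              generate long pairLow  = min(ID, PartnerID)
              generate long pairHigh = max(ID, PartnerID)

              * one group number per distinct pair; left missing when there is no partner
              egen long familyID = group(pairLow pairHigh) if !missing(PartnerID)

              sort ID
              list ID PartnerID ObsYear familyID, noobs
              ```

              In this example ID 1 and ID 5 end up with the same familyID because they name each other, whereas ID 2 names ID 1 without being named back and so sits in a pair of its own. The identifier therefore only links people whose PartnerID values point at each other.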


              • #8
                I don't know. I'd suggest creating a better real or realistic data example using dataex -- as explained in https://www.statalist.org/forums/help#stata and exemplified in #2 -- as diminishing returns set in quickly when a discussion depends on reading word descriptions that get longer in the pursuit of understanding set-ups that turn out to be more complicated than first stated.

                Alternatively, if your initial statement that each individual occurs just once in the dataset is still true, then you have no information on changing properties and it's a 1:m or m:1 merge, depending which way you do it.
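
                A minimal sketch of the m:1 direction, assuming each ID really does occur only once and using the example data from #3. The tempfile name partners and the new names PartnerVar1 and PartnerVar2 are chosen here for illustration:

                ```stata
                clear
                input ID PartnerID ObsYear str8 (Var1 Var2)
                1 5 2010 "hi" "ho"
                2 1 2014 "cat" "flower"
                3 2 2012 "bird" "stone"
                4 . 2009 "frog" "cycle"
                5 1 2013 "jupiter" "lollipop"
                end

                * build a lookup file with one row per person, keyed on PartnerID
                preserve
                keep ID Var1 Var2
                rename (Var1 Var2) (PartnerVar1 PartnerVar2)
                rename ID PartnerID
                tempfile partners
                save `partners'
                restore

                * several people may share a PartnerID, but each PartnerID
                * occurs once in the lookup file, so the merge is m:1
                merge m:1 PartnerID using `partners', keep(master match) nogenerate

                sort ID
                list, noobs
                ```

                Here both ID 2 and ID 5 pick up the values of ID 1, and ID 4 keeps empty partner variables. No loop over variables is needed; with many variables the two rename lines could use a wildcard pattern as in #2.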
