Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Reframing year- match

    Hello ,

    I had posted this question and the variable description here.

    https://www.statalist.org/forums/for...-year-matching

    I thought I might explain the difficulty I face a bit more clearly, so Im posting again.


    I created unique IDs for the mother and father (ID_mil ID_fil) . The index observation in this dataset (Dataset 1) is the eligible women, but I want to have a variable next to each PERSONID that shows the years since their parents passed.


    (Dataset 1)

    [CODE]. dataex ID_PERSON STATEID DISTID PSUID HHID2005 HHSPLITID2005 PERSONID PID2005 SPRO10 SPRO9 ID_mil ID_fil

    ----------------------- copy starting from the next line -----------------------
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input double ID_PERSON int(STATEID DISTID PSUID) float(HHID2005 HHSPLITID2005 PERSONID) int(PID2005 SPRO10 SPRO9) double(ID_mil ID_fil)
    10201002009 1 2 1  2 0  9  9  2  1   102012002   102012001
    10201016004 1 2 1 16 0  4  4  2  1  1020116002  1020116001
    10201018005 1 2 1 18 0  5  5  2  1  1020118002  1020118001
    10201030102 1 2 1  3 0  2  2 75 69  1020130175  1020130169
    10201050102 1 2 1  5 0  2  3 75 75  1020150175  1020150175
    10201160104 1 2 1 16 0  4  4  2  1 10201160102 10201160101
    10201170102 1 2 1 17 0  2  1 75 75 10201170175 10201170175
    10201180105 1 2 1 18 0  5  5 11 75 10201180111 10201180175
    10201200104 1 2 1 20 0  4  4  2  1 10201200102 10201200101
    10202006004 1 2 2  6 0  4  4  2  1   102026002   102026001
    10202007004 1 2 2  7 0  4  4  2  1   102027002   102027001
    10202014002 1 2 2 14 0  2  2  6 75  1020214006  1020214075
    10202015002 1 2 2 15 0  2  2  7  6  1020215007  1020215006
    10202017002 1 2 2 17 0  2  2  7 75  1020217007  1020217075
    10202040102 1 2 2  4 0  2  2 75 75  1020240175  1020240175
    10202060102 1 2 2  6 0  2  4  9 75  1020260109  1020260175
    10202070104 1 2 2  7 0  4  4  2  1  1020270102  1020270101
    10202150102 1 2 2 15 0  2  2  7 75 10202150107 10202150175
    10202160102 1 2 2 16 0  2  2 75 75 10202160175 10202160175
    10202170102 1 2 2 17 0  2  2 75 75 10202170175 10202170175
    10202200102 1 2 2 20 0  2  2 75 75 10202200175 10202200175
    10203006009 1 2 3  6 0  9  9  2  1   102036002   102036001
    10203011002 1 2 3 11 0  2  2  6 75  1020311006  1020311075
    10203014004 1 2 3 14 0  4  4  2  1  1020314002  1020314001
    10203015003 1 2 3 15 0  3  3  6  .  1020315006    10203150
    
    
    end
    label values STATEID STATEID
    label def STATEID 1 "Jammu & Kashmir 01", modify
    label values SPRO10 SPRO10
    label def SPRO10 75 "Died 75", modify
    label values SPRO9 SPRO9
    label def SPRO9 69 "Other nonres 69", modify
    label def SPRO9 75 "Died 75", modify
    ------------------ copy up to and including the previous line ------------------



    I have Dataset 2 that has the expiry information of all the individuals interviewed in wave 1.



    Code:
    . dataex STATEID DISTID PSUID HHID2005 HHSPLITID2005 TK1 PERSONID TH1 TH3 PID2005
    
    ----------------------- copy starting from the next line -----------------------
    
    
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int(STATEID DISTID PSUID) float(HHID2005 HHSPLITID2005) int(TK1 PERSONID TH1 TH3 PID2005)
    1  2 1  5 0  1  1  1 4  1
    1  2 1  6 0 10 10 10 1 10
    1  2 1  9 0  1  1  1 4  1
    1  2 1 18 0  1  1  1 1  1
    1  2 2  5 0  1  1  1 4  1
    1  2 2  6 0  1  1  1 3  1
    1  2 2 12 0  1  1  1 2  1
    1  2 2 14 0  6  6  6 4  6
    1  2 2 15 0  6  6  6 6  6
    1  2 2 17 0  7  7  7 3  7
    1  2 3  4 0  2  2  2 4  2
    1  2 3  8 0  1  1  1 5  1
    1  2 3  8 0  2  2  2 4  2
    1  2 3  8 0  6  6  6 1  6
    1  2 3 13 0  1  1  1 6  1
    1  2 3 13 0  2  2  2 6  2
    1  2 3 19 0  1  1  1 4  1
    1  2 3 19 0  2  2  2 5  2
    1  2 3 20 0  1  1  1 6  1
    1  2 4  2 0  2  2  2 3  2
    1  2 4 11 0  1  1  1 5  1
    1  2 4 11 0  2  2  2 3  2
    1  2 4 13 0  1  1  1 5  1
    1  2 4 14 0  1  1  1 6  1
    1  2 4 17 0  2  2  2 5  2
    1  2 5  1 0  6  6  6 2  6
    1  2 5  3 0  3  3  3 4  3
    1  2 5  4 0  9  9  9 2  9
    1  2 5  5 0  9  9  9 3  9
    1  2 5  8 0  1  1  1 3  1
    1  2 5 10 0  2  2  2 2  2
    1  2 6  3 0 10 10 10 1 10
    1  2 7  5 0 28 28 28 5 28
    1  2 7 10 0  6  6  6 4  6
    1  2 7 12 0  7  7  7 7  7
    1  3 1  8 0  2  2  2 3  2
    1  3 1 12 0  1  1  1 2  1
    
    
    end
    label values STATEID STATEID
    label def STATEID 1 "Jammu & Kashmir 01", modify
    I merged using this command

    Code:
    use Dataset1
    merge m:1 STATEID DISTID PSUID HHID2005 HHSPLITID2005 PERSONID using Dataset2
    I couldn't use PID2005 to merge as I got the message STATEID DISTID PSUID HHID2005 HHSPLITID2005 PID2005 wasn't uniquely identifying observations in using data.

    But merging will show the years since death (TH3) of the index case which is the eligible women not her parents like I need.

    So how can I run the code such that a column comes up next to the PERSONID or PID2005 which shows TH3 (or any newly created variable years since the passing away of the parent (fil mil))

  • #2
    This may or may not be possible. It isn't possible with the example data, but it might be possible with a better example.

    The problem is this: there is no variable that appears in both data sets that identifies the same person. Your ID_mil and ID_fil variables are long integers, simpler to the ID_PERSON variable. But there is no variable remotely like that in Dataset 2. Instead, the key to identifying a person there seems to be PERSONID, which is just a one or two digit number. Now, perhaps that PERSONID variable should correspond to the PERSONID variable in the first data set. So one would need to use a multi-link chain here: from ID_mil in data set 1 to the corresponding ID_PERSON in data set 1, to the PERSONID of that ID_PERSON, and from there to the PERSONID variable in Dataset 2. But, in your example data none of the values of ID_mil appear as a value of ID_PERSON in Dataset 1, so the chain is broken before the first link can be built!

    If you can post back with a different example where the values of ID_mil and ID_fil also appear as values of ID_PERSON, it may be possible to proceed.

    Added: Assuming it can be done, it will be easiest to program this using frames. That requires Stata version 16 or later. It can be done without them, but with additional complications. If you are not using version 16 or later, be sure to say so when you post back so you will get a solution that you can actually run.
    Last edited by Clyde Schechter; 16 Jan 2023, 11:37.

    Comment


    • #3
      I see.

      I use Stata 13.

      I hope this example helps.

      Code:
      . dataex ID_PERSON ID_mil ID_fil STATEID DISTID PSUID HHID HHSPLITID PERSONID PID2005 SPRO10 SPRO9 ID_mil ID_fil
      
      ----------------------- copy starting from the next line -----------------------
      
      
      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input double(ID_PERSON ID_mil ID_fil) int(STATEID DISTID PSUID HHID HHSPLITID) float PERSONID int(PID2005 SPRO10 SPRO9)
      10201001001   102010010   102010010 1 2 1  1 0  1  1  . .
      10201001002   102010010   102010010 1 2 1  1 0  2  2  . .
      10201001003   102010010   102010010 1 2 1  1 0  3  3  . .
      10201001005   102010010   102010010 1 2 1  1 0  5  5  . .
      10201001006   102010010   102010010 1 2 1  1 0  6  6  . .
      10201002001   102010020   102010020 1 2 1  2 0  1  1  . .
      10201002003   102010020   102010020 1 2 1  2 0  3  3  . .
      10201002004   102010020 10201002001 1 2 1  2 0  4  4  . 1
      10201002005   102010020   102010020 1 2 1  2 0  5  5  . .
      10201002006   102010020   102010020 1 2 1  2 0  6  6  . .
      10201002007   102010020   102010020 1 2 1  2 0  7  7  . .
      10201002008   102010020   102010020 1 2 1  2 0  8  8  . .
      10201002009 10201002002 10201002001 1 2 1  2 0  9  9  2 1
      10201002010   102010020   102010020 1 2 1  2 0 10 10  . .
      10201002011   102010020   102010020 1 2 1  2 0 11 11  . .
      10201002012   102010020   102010020 1 2 1  2 0 12 12  . .
      10201002013   102010020 10201002001 1 2 1  2 0 13 13  . 1
      10201002014   102010020   102010020 1 2 1  2 0 14 14  . .
      10201002015   102010020   102010020 1 2 1  2 0 15 15  . .
      10201003001   102010030   102010030 1 2 1  3 0  1  1  . .
      10201003002   102010030   102010030 1 2 1  3 0  2  2  . .
      10201003003   102010030   102010030 1 2 1  3 0  3  3  . .
      10201003004   102010030   102010030 1 2 1  3 0  4  4  . .
      10201003005   102010030   102010030 1 2 1  3 0  5  5  . .
      10201003006   102010030   102010030 1 2 1  3 0  6  6  . .
      10201003007   102010030   102010030 1 2 1  3 0  7  7  . .
      10201004001   102010040   102010040 1 2 1  4 0  1  1  . .
      10201004002   102010040   102010040 1 2 1  4 0  2  2  . .
      10201004003   102010040   102010040 1 2 1  4 0  3  3  . .
      10201004004   102010040   102010040 1 2 1  4 0  4  4  . .
      10201004005   102010040   102010040 1 2 1  4 0  5  5  . .
      10201005002   102010050   102010050 1 2 1  5 0  2  2  . .
      10201005003 10201005075 10201005001 1 2 1  5 0  3  3 75 1
      10201005006   102010050   102010050 1 2 1  5 0  6  6  . .
      10201005007   102010050   102010050 1 2 1  5 0  7  7  . .
      10201005008   102010050   102010050 1 2 1  5 0  8  8  . .
      10201005009   102010050   102010050 1 2 1  5 0  9  9  . .
      10201005010   102010050   102010050 1 2 1  5 0 10 10  . .
      10201006001   102010060   102010060 1 2 1  6 0  1  1  . .
      10201006012   102010060   102010060 1 2 1  6 0 12 12  . .
      10201006013   102010060   102010060 1 2 1  6 0 13 13  . .
      10201006014   102010060 10201006001 1 2 1  6 0 14 14  . 1
      10201006015   102010060   102010060 1 2 1  6 0 15 15  . .
      10201007001   102010070   102010070 1 2 1  7 0  1  1  . .
      10201007002   102010070   102010070 1 2 1  7 0  2  2  . .
      10201007003   102010070   102010070 1 2 1  7 0  3  3  . .
      10201007004   102010070   102010070 1 2 1  7 0  4  4  . .
      10201007006   102010070   102010070 1 2 1  7 0  6  6  . .
      10201007007   102010070   102010070 1 2 1  7 0  7  7  . .
      10201007008   102010070   102010070 1 2 1  7 0  8  8  . .
      10201007009   102010070   102010070 1 2 1  7 0  9  9  . .
      10201008001   102010080   102010080 1 2 1  8 0  1  1  . .
      10201008002   102010080   102010080 1 2 1  8 0  2  2  . .
      10201008003   102010080   102010080 1 2 1  8 0  3  3  . .
      10201008004   102010080   102010080 1 2 1  8 0  4  4  . .
      10201008005   102010080   102010080 1 2 1  8 0  5  5  . .
      10201008006   102010080   102010080 1 2 1  8 0  6  6  . .
      10201009002   102010090   102010090 1 2 1  9 0  2  2  . .
      10201009003   102010090   102010090 1 2 1  9 0  3  3  . .
      10201009004   102010090   102010090 1 2 1  9 0  4  4  . .
      10201009005   102010090   102010090 1 2 1  9 0  5  5  . .
      10201009006   102010090   102010090 1 2 1  9 0  6  6  . .
      10201009007   102010090   102010090 1 2 1  9 0  7  7  . .
      10201009008   102010090   102010090 1 2 1  9 0  8  8  . .
      10201009009   102010090   102010090 1 2 1  9 0  9  9  . .
      10201010101   102010101   102010101 1 2 1 10 1  1  1  . .
      10201010102   102010101   102010101 1 2 1 10 1  2  2  . .
      10201010103   102010101   102010101 1 2 1 10 1  3  3  . .
      10201010104   102010101   102010101 1 2 1 10 1  4  4  . .
      10201010105   102010101   102010101 1 2 1 10 1  5  5  . .
      10201010106   102010101   102010101 1 2 1 10 1  6  6  . .
      10201010107   102010101   102010101 1 2 1 10 1  7  7  . .
      10201010108   102010101   102010101 1 2 1 10 1  8  8  . .
      10201010109   102010101   102010101 1 2 1 10 1  9  9  . .
      10201010110   102010101   102010101 1 2 1 10 1 10 10  . .
      10201010111   102010101   102010101 1 2 1 10 1 11 11  . .
      10201011001   102010110   102010110 1 2 1 11 0  1  1  . .
      10201012001   102010120   102010120 1 2 1 12 0  1  1  . .
      10201012002   102010120   102010120 1 2 1 12 0  2  2  . .
      10201012003   102010120   102010120 1 2 1 12 0  3  3  . .
      10201012004   102010120   102010120 1 2 1 12 0  4  4  . .
      10201013001   102010130   102010130 1 2 1 13 0  1  1  . .
      10201013002   102010130   102010130 1 2 1 13 0  2  2  . .
      10201013003   102010130   102010130 1 2 1 13 0  3  3  . .
      10201013004   102010130   102010130 1 2 1 13 0  4  4  . .
      10201013005   102010130   102010130 1 2 1 13 0  5  5  . .
      10201013006   102010130   102010130 1 2 1 13 0  6  6  . .
      10201013007   102010130   102010130 1 2 1 13 0  7  7  . .
      10201014001   102010140   102010140 1 2 1 14 0  1  1  . .
      10201014002   102010140   102010140 1 2 1 14 0  2  2  . .
      10201014003   102010140   102010140 1 2 1 14 0  3  3  . .
      10201014004   102010140   102010140 1 2 1 14 0  4  4  . .
      10201014005   102010140   102010140 1 2 1 14 0  5  5  . .
      10201014006   102010140   102010140 1 2 1 14 0  6  6  . .
      10201014007   102010140   102010140 1 2 1 14 0  7  7  . .
      10201016001   102010160   102010160 1 2 1 16 0  1  1  . .
      10201016002   102010160   102010160 1 2 1 16 0  2  2  . .
      10201016003   102010160   102010160 1 2 1 16 0  3  3  . .
      10201016004 10201016002 10201016001 1 2 1 16 0  4  4  2 1
      10201016005   102010160   102010160 1 2 1 16 0  5  5  . .
      end
      label values STATEID STATEID
      label def STATEID 1 "Jammu & Kashmir 01", modify
      label values SPRO10 SPRO10
      label def SPRO10 75 "Died 75", modify
      label values SPRO9 SPRO9
      ------------------ copy up to and including the previous line ------------------ Listed 100 out of 289749 observations Use the count() option to list more

      Comment


      • #4
        OK, this is better. Unfortunately, the people who do now appear both as an ID_PERSON and an ID_MIL or ID_FIL in dataset 1 do not, themselves, appear in dataset 2. Because of this, I am not sure the code shown below is correct, although I think it will work in fuller data sets that do include all the necessary instances of people.

        Note that there is an inconsistency in variable naming between the two datasets for HHID HHSPLITID in dataset 1, but HHID2005 and HHSPLITID2005 in dataset 2. This inconsistency did not exist in your original post. I have resolved this by keeping the versions ending in 2005 in both datasets, making the change explicitly in the code shown below.

        Note also that in my code I refer to these datasets as `dataset1' and `daataset2' because I am saving them as tempfiles with those names prior to the code shown. But you should just replace those references by the actual names of the real files.

        Code:
        use `dataset1', clear
        rename (HHID HHSPLITID) =2005
        //  CREATE AN ID_PERSON <-> PERSONID CROSSWALK
        preserve
        keep ID_PERSON PERSONID
        rename ID_PERSON link
        rename PERSONID parent_id
        isid link, sort
        tempfile crosswalk
        save `crosswalk'
        
        //  GET THE PERSONID FOR EACH PERSON'S MOTHER AND FATHER
        restore
        clonevar link = ID_mil
        merge m:1 link using `crosswalk', keep(master match) nogenerate
        rename parent_id linkmil
        replace link = ID_fil
        merge m:1 link using `crosswalk', keep(master match) nogenerate
        rename parent_id linkfil
        drop link
        
        //  SPLIT OBSERVATIONS INTO MOTHER & FATHER PAIRS
        reshape long link, i(ID_PERSON) j(mom_dad) string
        list STATEID DISTID PSUID HHID HHSPLITID link if !missing(link)
        
        
        //  NOW LINK ON HOUSEHOLD IDENTIFIER TO THE SECOND DATA SET
        local hhid STATEID DISTID PSUID HHID2005 HHSPLITID2005
        preserve
        use `dataset2', clear
        rename PERSONID link
        keep `hhid' link TH3
        isid `hhid' link, sort
        tempfile deaths
        save `deaths'
        restore
        merge m:1 `hhid' link using `deaths', keep(master match) nogenerate
        drop link
        reshape wide TH3, i(ID_PERSON) j(mom_dad) string

        Comment


        • #5
          I appreciate your time and help.

          Comment

          Working...
          X