  • dtalink: linking records of a married woman and records of an unmarried girl as records of one female from childhood to adulthood

    Dear Statalists,


    I have two sets of longitudinal records: one of adult married women and one of unmarried girls under 18. For a sociological study, I want to match the married women's records to the girls' records so that I have longitudinal records of each female from childhood to adulthood. Possible matches are based on surname, the year(s) in which each individual appears in the datasets, and birth year. Each possible match should meet the following conditions:
    1) the married woman has the same birth year and surname as the girl;
    2) the difference between the year the married woman first appears in the dataset and the year the girl last appears in the dataset is no larger than three;
    3) the year the married woman first appears in the dataset is later than the year the girl last appears in the dataset.


    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int(year birth_year) str8 wife_id str14 surname
    1968 1940 "22222132" "smith"  
    1972 1954 "22222141" "cooper"    
    1967 1935 "22222227" "farrah"    
    1967 1919 "22222397" "roslenski"    
    1968 1941 "22222756" "weasley"    
    end
    label values birth_year BIRTHYEAR


    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int(year birth_year) str8 girl_id str14 surname
    1969 1953 "22222097" "campbell"
    1970 1961 "22222103" "chow"
    1968 1950 "22222105" "cooper"
    1968 1953 "22222106" "farrah"
    1966 1948 "22222117" "park"
    end
    label values birth_year BIRTHYEAR
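    Because all three conditions are deterministic (exact surname and birth year, plus a gap of 1 to 3 years), one option is to form candidate pairs directly instead of scoring them. The sketch below is not a dtalink solution; it assumes that wife.dta and girl.dta contain the variables shown in the listings above, possibly with several yearly records per person, and it introduces the hypothetical variables first_year and last_year.

```stata
* Sketch: deterministic matching on conditions 1) to 3).
* Assumes girl.dta and wife.dta hold year, birth_year, surname and
* girl_id / wife_id as in the -dataex- listings (one row per person-year).

* Reduce each girl to the last year she appears
use girl.dta, clear
collapse (max) last_year = year, by(girl_id surname birth_year)
tempfile girls
save `girls'

* Reduce each wife to the first year she appears
use wife.dta, clear
collapse (min) first_year = year, by(wife_id surname birth_year)

* Condition 1): pair every wife with every girl of the same surname and birth year
joinby surname birth_year using `girls'

* Conditions 2) and 3): wife first appears 1 to 3 years after the girl last appears
keep if inrange(first_year - last_year, 1, 3)
```

    A wife or a girl can still end up in more than one pair; those cases would need a separate rule for choosing among candidates (for example, `duplicates tag wife_id, generate(n_wife)` flags them).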

    My dtalink code is below:
    Code:
    use wife.dta,clear
    
    dtalink year 20 0 1 year 20 0 2 year 20 0 3 year 0 -50 4 ///
        using girl.dta, block(surname birth_year) cutoff(5) bestmatch
        
    drop if _score==.

    The problem is that some of the resulting matches violate the third condition. In the output, the wife data are the master data (_file == 0) and the girl data are the using data (_file == 1). However, in some matched pairs the wife's year is the same as or earlier than the girl's year:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long _matchID byte _file float _id byte(_score _matchflag) int(year birth_year) str8 wife_id str14 surname str8 girl_id
      1 0  2678 60 1 1969 1953 "22222167" "brown" ""        
      1 1 24783 60 1 1969 1953 ""         "brown" "22222097"
     15 0  2005 60 1 1968 1950 "00276448" "jones" ""        
     15 1 24785 60 1 1968 1950 ""         "jones" "22222105"
     16 0    79 40 1 1970 1953 "00267653" "louis" ""        
     16 1 24786 40 1 1968 1953 ""         "louis" "22222106"
     20 0  5008 60 1 1967 1948 "00291191" "granthan" ""        
     20 1 24787 60 1 1966 1948 ""         "granthan" "22222117"
     62 0  3025 60 1 1976 1960 "00280414" "campbell" ""        
     62 1 24792 60 1 1977 1960 ""         "campbell" "22222138"
    end
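    The scores in this listing (60 for a one-year gap in either direction, 40 for a two-year gap) suggest that dtalink's calipers compare absolute year differences, so the direction required by condition 3) cannot be expressed through the calipers alone. A minimal sketch of one workaround, assuming the dtalink results are in memory in the layout shown above (one wife row and one girl row per _matchID), is to filter the pairs afterwards; year_gap is a hypothetical helper variable.

```stata
* Sketch: drop dtalink pairs that violate conditions 2) and 3).
* Assumes the layout of the listing above: per _matchID, one wife row
* (_file == 0) and one girl row (_file == 1), each with its own year.

* Sort so the wife row comes first within each pair, then take the gap
bysort _matchID (_file): gen int year_gap = year[1] - year[2] if _N == 2

* Keep pairs where the wife's year is 1 to 3 years after the girl's;
* inrange() returns 0 for a missing gap, so incomplete groups are dropped too
keep if inrange(year_gap, 1, 3)
drop year_gap
```

    One caveat: if bestmatch selects a single partner per record before this filter runs, a record whose best-scoring partner is removed here will not fall back to its next-best candidate. Running dtalink without bestmatch, applying the filter, and then choosing the highest-scoring remaining pair would avoid that.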


    How can I ensure that, in each matched pair, the year of the wife_id record is later than the year of the girl_id record?

    Thanks!