Dear Statalists,
I have two sets of longitudinal records: one of adult married women and one of unmarried girls under 18. I want to link the married women's records to the girls' records to build longitudinal records of females for a sociological study. Candidate matches are based on surname, birth year, and the year in which the individual appears in each dataset. Each match should meet the following conditions: 1) the married woman has the same birth year and surname as the girl; 2) the difference between the year when the girl last appears in her dataset and the year when the married woman first appears in hers is no larger than three; 3) the year when the married woman first appears is later than the year when the girl last appears.
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input int(year birth_year) str8 wife_id str14 surname
1968 1940 "22222132" "smith"
1972 1954 "22222141" "cooper"
1967 1935 "22222227" "farrah"
1967 1919 "22222397" "roslenski"
1968 1941 "22222756" "weasley"
end
label values birth_year BIRTHYEAR
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input int(year birth_year) str8 girl_id str14 surname
1969 1953 "22222097" "campbell"
1970 1961 "22222103" "chow"
1968 1950 "22222105" "cooper"
1968 1953 "22222106" "farrah"
1966 1948 "22222117" "park"
end
label values birth_year BIRTHYEAR
My program for dtalink is below:
Code:
use wife.dta, clear
dtalink year 20 0 1 year 20 0 2 year 20 0 3 year 0 -50 4 ///
    using girl.dta, block(surname birth_year) cutoff(5) bestmatch
drop if _score==.
The problem is that some of the resulting matches do not meet the third condition above. After matching, the wife data are the master data and the girl data are the using data. However, in some matched pairs the master year is the same as or earlier than the using year:
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input long _matchID byte _file float _id byte(_score _matchflag) int(year birth_year) str8 wife_id str14 surname str8 girl_id
 1 0  2678 60 1 1969 1953 "22222167" "brown"    ""
 1 1 24783 60 1 1969 1953 ""         "brown"    "22222097"
15 0  2005 60 1 1968 1950 "00276448" "jones"    ""
15 1 24785 60 1 1968 1950 ""         "jones"    "22222105"
16 0    79 40 1 1970 1953 "00267653" "louis"    ""
16 1 24786 40 1 1968 1953 ""         "louis"    "22222106"
20 0  5008 60 1 1967 1948 "00291191" "granthan" ""
20 1 24787 60 1 1966 1948 ""         "granthan" "22222117"
62 0  3025 60 1 1976 1960 "00280414" "campbell" ""
62 1 24792 60 1 1977 1960 ""         "campbell" "22222138"
end
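To make the violations explicit, the pairs that break the third condition can be flagged after the match. This is only a minimal sketch: it assumes the matched data shown above are in memory, that _file==0 marks the wife (master) row and _file==1 the girl (using) row, and that each _matchID holds one row from each file. The names wife_year, girl_year, and bad_order are my own and not created by dtalink.

```stata
* Spread each side's year across all rows of the matched pair.
* egen min()/max() ignore the missing values produced by cond().
bysort _matchID: egen wife_year = min(cond(_file == 0, year, .))
bysort _matchID: egen girl_year = max(cond(_file == 1, year, .))

* Condition 3 fails when the wife's year is not strictly later than the girl's.
gen byte bad_order = wife_year <= girl_year

list _matchID _file year wife_id girl_id if bad_order, sepby(_matchID)

* Uncomment to remove both rows of every violating pair.
* drop if bad_order
```

In the listing above this flags pairs 1, 15, and 62. Dropping them only removes bad pairs after the fact; it does not make dtalink pick another candidate whose year ordering is valid.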
How can I ensure the year of wife_id is later (or larger) than the year of girl_id in each matched pair?
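One deterministic alternative I have considered is to form all pairs within surname and birth year using joinby and then keep only pairs whose year gap is between 1 and 3, which covers conditions 2 and 3 together. This is a rough sketch, not a tested solution: it assumes wife.dta and girl.dta hold the variables shown in the examples above, with one row per person, and that year is already the first appearance for wives and the last appearance for girls. The names wife_year, girl_year, gap, dup_wife, and dup_girl are hypothetical.

```stata
* Prepare the girl file with a side-specific year variable.
use girl.dta, clear
rename year girl_year
keep girl_id surname birth_year girl_year
tempfile girls
save `girls'

* Form all wife-girl pairs that share surname and birth year (condition 1).
use wife.dta, clear
rename year wife_year
keep wife_id surname birth_year wife_year
joinby surname birth_year using `girls'

* Conditions 2 and 3: the wife first appears 1 to 3 years after the girl last appears.
gen int gap = wife_year - girl_year
keep if inrange(gap, 1, 3)

* Flag records that still have more than one candidate partner.
duplicates tag wife_id, gen(dup_wife)
duplicates tag girl_id, gen(dup_girl)
```

Pairs with dup_wife or dup_girl above zero would still need a tie-breaking rule, which is the part dtalink's scoring was meant to handle.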
Thanks!