Good day,
I am using Stata 17 and am having trouble matching potential twin pairs in IPUMS USA ACS microdata (data sample below). My dataset has 90 million observations. The sample below contains no obvious twin pairs, but the full dataset will. I want to create a dummy variable called "twin" that equals one if both of the following conditions are met:
1. serial (family identifier, type = double) is the same.
2. age (age of person, type = integer) is the same.
In other words, if two or more observations share the same "serial" and "age" values, I would like the "twin" dummy to equal one, and zero otherwise. I tried creating a twin variable and then using replace with an if condition (1), but this gives every observation a value of 1 for "twin". I have also created a unique identifier, "id", for each individual and was wondering whether there is a Stata command that would let me match observations using this "id" and the conditions above. I have looked into the command vmatch but cannot get it to do what I want.
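(1) gen twin = 0
* serial == serial and age == age compare each observation with itself, so they are
* always true; | also means "or", whereas both conditions are meant to hold
replace twin = 1 if serial == serial | age == age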
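A common Stata idiom for this kind of flag, offered as a minimal sketch rather than a confirmed solution, is to group the data by serial and age and check each group's size: under the by prefix, _N is the number of observations in the current group. No matching on "id" is needed. The toy data below copy a few rows of the sample; the row with id 26 is a hypothetical twin of id 3, added only so the flag has something to find.

```stata
* Minimal sketch: flag observations that share BOTH serial and age
* with at least one other observation.
* Toy data: ids 2-5 copy the sample; id 26 is a HYPOTHETICAL twin of id 3.
clear
input double serial int age long id
11  8  2
11 15  3
11 15 26
13 12  4
13 13  5
end

* In the real data, drop the twin variable left over from the first attempt.
capture drop twin

* Under the by prefix, _N is the size of each serial-age group,
* so any group with more than one member gets twin = 1.
bysort serial age: generate byte twin = (_N > 1)

list serial age id twin, sepby(serial) noobs
```

Two caveats apply. First, bysort has to sort all 90 million observations, which may take a while. Second, this flags any household members of the same age (for example, same-age spouses), not only twins, so further restrictions may be needed if the extract is not already limited to children.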
Thank you,
Michael
Data sample (the twin column shows the result of code (1), which sets every observation to 1):
year | serial | stateicp | pernum | sex | age | twin | id |
2017 | 4 | Alabama | 4 | Female | 7 | 1 | 1 |
2017 | 11 | Alabama | 5 | Female | 8 | 1 | 2 |
2017 | 11 | Alabama | 4 | Male | 15 | 1 | 3 |
2017 | 13 | Alabama | 4 | Male | 12 | 1 | 4 |
2017 | 13 | Alabama | 3 | Male | 13 | 1 | 5 |
2017 | 18 | Alabama | 3 | Female | 11 | 1 | 6 |
2017 | 21 | Alabama | 4 | Male | 13 | 1 | 7 |
2017 | 22 | Alabama | 5 | Male | 9 | 1 | 8 |
2017 | 22 | Alabama | 4 | Female | 12 | 1 | 9 |
2017 | 22 | Alabama | 3 | Female | 13 | 1 | 10 |
2017 | 23 | Alabama | 3 | Male | 7 | 1 | 11 |
2017 | 28 | Alabama | 4 | Female | 14 | 1 | 12 |
2017 | 28 | Alabama | 3 | Female | 15 | 1 | 13 |
2017 | 29 | Alabama | 4 | Female | 7 | 1 | 14 |
2017 | 29 | Alabama | 3 | Female | 10 | 1 | 15 |
2017 | 39 | Alabama | 3 | Male | 8 | 1 | 16 |
2017 | 41 | Alabama | 4 | Female | 13 | 1 | 17 |
2017 | 41 | Alabama | 3 | Male | 15 | 1 | 18 |
2017 | 46 | Alabama | 6 | Male | 8 | 1 | 19 |
2017 | 46 | Alabama | 5 | Male | 9 | 1 | 20 |
2017 | 46 | Alabama | 4 | Male | 14 | 1 | 21 |
2017 | 46 | Alabama | 3 | Male | 15 | 1 | 22 |
2017 | 52 | Alabama | 4 | Female | 11 | 1 | 23 |
2017 | 53 | Alabama | 3 | Female | 7 | 1 | 24 |
2017 | 55 | Alabama | 2 | Male | 10 | 1 | 25 |
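If the full file pools several ACS years, grouping by year as well is probably necessary. As far as I know, IPUMS documents serial as unique only within a given sample. The sketch below is an alternative that uses Stata's duplicates tag command instead of bysort. The rows with ids 19-22 copy household 46 from the sample above; id 99 is a hypothetical twin of id 22, and n_dups is a hypothetical helper variable name.

```stata
* Sketch of an alternative using duplicates tag, grouping by year too,
* on the ASSUMPTION that serial is unique only within a sample year.
* Toy data: ids 19-22 copy household 46; id 99 is a HYPOTHETICAL twin of id 22.
clear
input int year double serial int age long id
2017 46  8 19
2017 46  9 20
2017 46 14 21
2017 46 15 22
2017 46 15 99
end

* n_dups counts the OTHER observations with the same year, serial, and age
* (it is 0 for observations with no match).
duplicates tag year serial age, generate(n_dups)
generate byte twin = (n_dups > 0)
drop n_dups

list year serial age id twin, noobs
```

Both this version and the bysort version give the same 0/1 flag. duplicates tag simply avoids writing the by-group logic by hand.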