Dear Stata users,
I have a question about propensity score matching in a longitudinal data file with a time-varying treatment variable, time-constant matching variables (for instance gender and background status) and time-varying matching variables (for instance age, but also a neighbourhood deprivation score that varies by year).
I have access to a long-format data file (2005-2011) with yearly administrative data (residential, demographic and socioeconomic information) on almost 14,500 individuals. About 500 of them are relocatees, forced to move out of their dwellings in highly deprived neighbourhoods because of urban restructuring policies. The rest of the individuals in the data are main tenants of addresses in the same city (sometimes even in the same deprived neighbourhood) that were not subject to an urban renewal program. The 500 relocatees received substantial financial compensation for moving costs and priority on social housing waiting lists, and could choose from a wide range of social housing dwellings across the city.
I want to match these 500 relocatees to 500 comparable residents and investigate whether being subject to an urban renewal policy has benefited the relocatees (in terms of moving to a more affluent neighbourhood and increased socioeconomic opportunities) compared with counterparts who were not forced to relocate because their dwellings were not being demolished (they could still move voluntarily, but received no assistance or compensation for doing so).
I chose to match each treated individual on their characteristics in the year before treatment (so for person ID 1 in the table below, the year 2006) with an individual from the control group who had similar characteristics in the same year. I followed this example (http://www.stata.com/statalist/archi.../msg00073.html) to force an exact match on year:
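*** gender must be coded numerically (e.g. 0/1) for probit; vce(cluster ID) is the current form of cluster(ID)
probit treatment gender age deprivation_index couple kids, vce(cluster ID)
*** The predicted probability is calculated only for the year before treatment for the treated group (in the example above this
*** is 2006, but it could also be 2005, 2007, 2008 or 2009)
*** and for all years for the control group (so we can search for a match among all years of the control group).
*** ever_treated is a helper variable equal to 1 in every year of a person who is ever treated, so that a treated
*** person's other years (where treatment==0) are not mistaken for control observations
bysort ID: egen byte ever_treated = max(treatment)
predict double pscore if e(sample) & (yearbeforetreatment==1 | ever_treated==0), pr
*** Random sort order, because without replacement the order of the data affects which matches are made
set seed 123456
gen double u = runiform()
sort u
** To force an exact match on year, I add the year to the p-scores (double keeps the precision the caliper needs):
gen double pscore2 = year + pscore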
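The year-offset trick works because p-scores lie between 0 and 1 while different years differ by at least 1. With a caliper of 0.01, two observations can therefore only be matched if they come from the same year. The one exception is a p-score close to 1 in one year paired with a p-score close to 0 in the following year. The Python sketch below is not Stata. The function names and the boundary values 0.998/0.001 are hypothetical, while 0.013 and 0.021 are the fictional 2006 values from the example table. It checks this reasoning and shows that multiplying the year by 10 before adding the p-score removes that edge case.

```python
# Hypothetical illustration of the "year + pscore" trick for exact matching on year.
# Assumption: p-scores lie in [0, 1] and the caliper is 0.01, as in the psmatch2 call.

CALIPER = 0.01


def within_caliper(a: float, b: float, caliper: float = CALIPER) -> bool:
    """True if two combined scores are close enough to be matched."""
    return abs(a - b) <= caliper


def offset_score(year: int, pscore: float, scale: float = 1.0) -> float:
    """Combine year and p-score into one matching score: scale * year + pscore."""
    return scale * year + pscore


# Same year, similar p-scores (2006.013 vs 2006.021): matchable.
same_year = within_caliper(offset_score(2006, 0.013), offset_score(2006, 0.021))

# Different years with typical small p-scores: never matchable.
diff_year = within_caliper(offset_score(2006, 0.013), offset_score(2007, 0.025))

# Edge case: p-scores straddling the integer boundary CAN match across years with scale 1 ...
edge_scale1 = within_caliper(offset_score(2006, 0.998), offset_score(2007, 0.001))
# ... but not when the year is multiplied by 10 before adding the p-score.
edge_scale10 = within_caliper(offset_score(2006, 0.998, 10), offset_score(2007, 0.001, 10))

print(same_year, diff_year, edge_scale1, edge_scale10)
```

With p-scores as small as those in the table, the boundary case cannot arise. In Stata, a variant that rules it out entirely would be `gen double pscore2 = 10*year + pscore`, still used with `caliper(0.01)`.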
See an example of my data and the estimated p-scores in the table below (the values are fictional, as the data are highly confidential).
Year | Person ID | Address ID | Treatment | Year before treatment | Gender | Age | Deprivation score | Couple | Kids | pscore | pscore2 |
2005 | 1 | 78103 | 0 | 0 | M | 32 | 2.13 | 1 | 0 | . | . |
2006 | 1 | 78103 | 0 | 1 | M | 33 | 2.17 | 1 | 0 | 0.013 | 2006.013 |
2007 | 1 | 66405 | 1 | 0 | M | 34 | 0.45 | 1 | 0 | . | . |
2008 | 1 | 66405 | 0 | 0 | M | 35 | 0.48 | 1 | 0 | . | . |
2009 | 1 | 66405 | 0 | 0 | M | 36 | 0.42 | 1 | 1 | . | . |
2010 | 1 | 53020 | 0 | 0 | M | 37 | 1.22 | 1 | 1 | . | . |
2011 | 1 | 53020 | 0 | 0 | M | 38 | 1.18 | 1 | 1 | . | . |
2005 | 2 | 11401 | 0 | 0 | F | 44 | 1.83 | 1 | 1 | 0.022 | 2005.022 |
2006 | 2 | 11401 | 0 | 0 | F | 45 | 1.87 | 1 | 1 | 0.021 | 2006.021 |
2007 | 2 | 11401 | 0 | 0 | F | 46 | 1.88 | 1 | 1 | 0.025 | 2007.025 |
2008 | 2 | 11401 | 0 | 0 | F | 47 | 1.84 | 1 | 1 | 0.026 | 2008.026 |
2009 | 2 | 11401 | 0 | 0 | F | 48 | 1.90 | 1 | 1 | 0.027 | 2009.027 |
2010 | 2 | 90622 | 0 | 0 | F | 49 | 0.98 | 1 | 1 | 0.023 | 2010.023 |
2011 | 2 | 90622 | 0 | 0 | F | 50 | 0.96 | 1 | 1 | 0.027 | 2011.027 |
Then I used the psmatch2 command to make exact matches:
psmatch2 treatment, pscore(pscore2) noreplacement neighbor(1) common caliper(0.01)
This forces each treated individual to be matched with a control observation from the same year. However, it sometimes matches two treated individuals to the same control individual in different years. For example, treated individual A (treated in 2007, matched on the year before treatment, 2006) and treated individual B (treated in 2009, matched on 2008) are both matched to individual C from the control group: A to C's 2006 observation and B to C's 2008 observation. This is due to the nature of the data. Individual C also changes over time (e.g. a change in neighbourhood deprivation index or household status), so C's 2008 observation is a good match for B, while C's 2006 observation was a good match for A.
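A quick way to see the problem is to count matches per control person rather than per control observation. The sketch below uses hypothetical matched pairs that mirror the A/B/C example above. The D/E pair is invented for contrast. In Stata, the pairs could be recovered from the variables psmatch2 creates; my understanding is that `_n1` holds the `_id` of each treated observation's first nearest neighbour, but check this against `help psmatch2`.

```python
from collections import Counter

# Hypothetical matched output at the person-year level: (treated ID, control ID, control year).
# A and B are both matched to person C, in different years, as in the example above.
matches = [
    ("A", "C", 2006),
    ("B", "C", 2008),
    ("D", "E", 2007),
]

# Without replacement at the observation level, each (control ID, year) row is used once ...
rows_used = Counter((ctrl, year) for _, ctrl, year in matches)
# ... but counting by control person shows that C is used twice.
persons_used = Counter(ctrl for _, ctrl, _ in matches)
reused = sorted(p for p, n in persons_used.items() if n > 1)

print("max uses per control row:", max(rows_used.values()))
print("control persons used more than once:", reused)
```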
We have enough control individuals (about 14,000), so we do not want any individual in the control group to be used twice. The ‘noreplacement’ option does not help here. It ensures that a comparison observation is not used as a match more than once, but because I use person-year data, the same comparison individual can still be used more than once through different years. Does anybody know how to restrict Stata so that each comparison individual is used only once in longitudinal data?
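One workaround is to do the nearest-neighbour matching yourself. As soon as one of a control person's person-year observations is matched, that person is marked as used in all years. The Python sketch below is not a psmatch2 feature. The `Row` type, the `match_one_to_one` function and the example values are hypothetical. The sketch assumes one matching row per treated individual (the year before treatment), exact matching on year and a caliper on the raw p-score, so the year offset is no longer needed. It also assumes greedy matching in random order, mirroring `sort u` followed by psmatch2 with `noreplacement neighbor(1) caliper(0.01)`.

```python
import random
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    person: str    # person ID
    year: int      # calendar year of this person-year observation
    pscore: float  # estimated propensity score


def match_one_to_one(treated, controls, caliper=0.01, seed=123456):
    """Greedy 1:1 nearest-neighbour matching, exact on year and within a caliper,
    where each control PERSON (not each person-year row) is used at most once."""
    by_year = defaultdict(list)
    for c in controls:
        by_year[c.year].append(c)

    order = list(treated)
    random.Random(seed).shuffle(order)  # random processing order, like `sort u`

    used_persons = set()
    pairs = {}
    for t in order:
        candidates = [
            c for c in by_year[t.year]
            if c.person not in used_persons and abs(c.pscore - t.pscore) <= caliper
        ]
        if not candidates:
            continue  # no eligible control left: t stays unmatched
        best = min(candidates, key=lambda c: abs(c.pscore - t.pscore))
        used_persons.add(best.person)  # blocks this person in ALL years
        pairs[t] = best
    return pairs


# Hypothetical example: C is the closest control for both A (2006) and B (2008).
treated = [Row("A", 2006, 0.020), Row("B", 2008, 0.030)]
controls = [
    Row("C", 2006, 0.021), Row("C", 2008, 0.031),
    Row("D", 2006, 0.024), Row("D", 2008, 0.035),
]

pairs = match_one_to_one(treated, controls)
for t, c in sorted(pairs.items(), key=lambda kv: kv[0].person):
    print(f"treated {t.person} ({t.year}) -> control {c.person} ({c.year})")
```

Because the matching is greedy, the result depends on the processing order. In the example, whichever of A and B is processed first gets person C, and the other falls back to person D. The sketch does not implement psmatch2's `common` (common support) option; the same logic could be ported to a Stata loop or Mata.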
Furthermore, any other comments and suggestions regarding my matching procedure are very much welcome!
Thanks in advance,
Emily