Dear Statalist,
I am doing propensity score matching for treatment evaluation, using kernel matching on pre-treatment information of the respondents. I am trying to implement what is called the "random program start procedure", which means the following:
Every non-participant is assigned a random treatment start drawn from the distribution of actual program starts of the participants. Non-participants who drop out before their hypothetical program would start are then discarded, which is meant to avoid bias.
I have the starting dates as well as the number of months prior to treatment start for all participants (that is, the time from unemployment entry until treatment). Now I am lost as to how to draw hypothetical program starts for the non-participants. As far as I understand, I need to predict them for all non-participants. But how can this be done? To me it sounds like I first need to see what predicts starting dates for participants and then apply this information to the non-participants, which makes it sound like a machine learning problem.
I thought it would help to include a few original quotes from papers that did this. Unfortunately, I cannot find more information on how to do it (or I just haven't found it yet):
"The first approach assigns each control unit a starting date by drawing in the discrete distribution of start dates as estimated from the (..) participants."*
"A problem concerns the group of nonparticipants. For this group important time varying variables like 'unemployment duration prior to the programme' are not defined. To make meaningful comparisons to those unemployed entering a programme, in the baseline estimate an approach suggested in Lechner (1999) is used: for each nonparticipant a hypothetical programme starting date from the sample distribution of starting dates is drawn. Persons with a simulated starting date later than their actual exit date from unemployment are excluded from the data set."**
"The latter assigns to each nontreated individual who does not receive the treatment of interest within a fixed time window of e.g. twelve months a hypothetical program start date that is simulated based on covariates observed at the start of the unemployment spell. A nontreated individual is then only used as comparison if he/she is still unemployed before the hypothetical starting date."***
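To make the question concrete, here is how I read the simplest variant (the second quote): draw one observed participant start per non-participant and drop anyone whose simulated start falls after their exit from unemployment. This is only a minimal sketch of my understanding, not the authors' implementation. It is written in Python purely for illustration, the function and variable names are hypothetical, time is assumed to be measured in months since unemployment entry, and coding a missing exit as None (still unemployed) is my own assumption. It does not cover the covariate-based simulation described in the third quote.

```python
import random


def assign_hypothetical_starts(participant_starts, nonparticipant_exits, seed=12345):
    """Assign each non-participant a hypothetical program start.

    participant_starts: months from unemployment entry to program start,
        one value per participant (assumed measure).
    nonparticipant_exits: months from unemployment entry to exit, one value
        per non-participant; None means no exit is observed (assumed coding).
    """
    if not participant_starts:
        raise ValueError("need at least one observed participant start")
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    result = []
    for person_id, exit_month in enumerate(nonparticipant_exits):
        # Picking one observed value uniformly at random (with replacement)
        # is a draw from the empirical distribution of participant starts.
        hyp_start = rng.choice(participant_starts)
        # Exclude if the simulated start is later than the actual exit.
        keep = exit_month is None or hyp_start <= exit_month
        result.append({"id": person_id, "exit": exit_month,
                       "hyp_start": hyp_start, "keep": keep})
    return result


# Made-up numbers, purely for illustration.
starts = [2, 5, 5, 9]
exits = [1, 6, None, 12]
for row in assign_hypothetical_starts(starts, exits):
    print(row)
```

Whether a simulated start that falls exactly in the exit month should be kept is a boundary choice that the quotes do not settle; the sketch keeps it.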
In advance, thank you very much for your suggestions!
* https://www.jstor.org/stable/pdf/139...43676d3049f6ad
** https://www.alexandria.unisg.ch/1585...cal_Issues.pdf
*** https://www.wiwi.hu-berlin.de/de/pro...dp20120829.pdf