Propensity Matching: Random program start procedure

Guest
#1

14 Jul 2017, 02:01

Dear Statalist,

I am doing propensity score matching for treatment evaluation, using kernel matching on pre-treatment information about the respondents. I am attempting something called the "random program start procedure", which means the following:
Every non-participant is assigned a random treatment start drawn from the distribution of actual program starts of the participants. Non-participants who drop out before their hypothetical program start are then discarded, in order to avoid bias.
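
A minimal sketch of the unconditional version of this procedure in Stata, on made-up data. The variable names (treat, treat_d, leave_d) and the seed are assumptions for illustration only. It draws, with replacement, from the observed start durations of the participants and then drops non-participants whose exit comes before the simulated start:

```stata
* Toy data: treat_d = months until program start (participants only),
* leave_d = months until exit (non-participants only). All values made up.
clear
set seed 12345
input byte(id treat treat_d leave_d)
1 1 3 .
2 1 8 .
3 1 5 .
4 0 . 2
5 0 . 9
6 0 . 12
7 0 . 4
end

* Sort participants to the top so they occupy observations 1..n_t
generate byte not_treated = (treat != 1)
sort not_treated id
count if treat == 1
local n_t = r(N)

* For each non-participant, copy treat_d from a randomly chosen participant
* (explicit subscripting; sampling is with replacement)
generate sim_start = treat_d[runiformint(1, `n_t')] if treat == 0

* Exclusion rule: drop non-participants who left before the simulated start
drop if treat == 0 & sim_start > leave_d
```

Because a missing leave_d is treated by Stata as larger than any number, non-participants with no recorded exit are kept by this rule; whether that is appropriate depends on how censoring is coded in the real data.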

I have the starting dates as well as the number of months prior to treatment start for all participants (the time from unemployment entry until treatment). Now I am lost as to how to draw hypothetical program starts for the non-participants. As far as I understand, I need to predict them for all non-participants. But how can this be done? To me it sounds like I first need to see what predicts starting dates for participants, and then use this information on the non-participants, which makes it sound like a machine learning problem.

I thought it would help to include a few original quotes from papers that did this. Unfortunately, I cannot get more information on how to do it (or I just haven't found it yet):

"The first approach assigns each control unit a starting date by drawing in the discrete distribution of start dates as estimated from the (..) participants."*

"A problem concerns the group of nonparticipants. For this group important time varying variables like 'unemployment duration prior to the programme' are not defined. To make meaningful comparisons to those unemployed entering a programme, in the baseline estimate an approach suggested in Lechner (1999) is used: for each nonparticipant a hypothetical programme starting date from the sample distribution of starting dates is drawn. Persons with a simulated starting date later than their actual exit date from unemployment are excluded from the data set."**

"The latter assigns to each nontreated individual who does not receive the treatment of interest within a fixed time window of e.g. twelve months a hypothetical program start date that is simulated based on covariates observed at the start of the unemployment spell. A nontreated individual is then only used as comparison if he/she is still unemployed before the hypothetical starting date."***

In advance, thank you very much for your suggestions!

* https://www.jstor.org/stable/pdf/139...43676d3049f6ad
** https://www.alexandria.unisg.ch/1585...cal_Issues.pdf
*** https://www.wiwi.hu-berlin.de/de/pro...dp20120829.pdf

Last edited by sladmin; 06 Feb 2018, 09:36. Reason: anonymize user
Clyde Schechter

Join Date: Apr 2014

Posts: 30119
#2

14 Jul 2017, 08:55

So, before you even begin the propensity score matching, you need to impute start dates.

The first step is to identify variables that are predictive of program start dates among the treated. While this could be a machine learning problem, you can probably do this adequately with a simpler approach, such as a regression model of some kind fitted to the data from the treated. You could then apply the coefficients from that regression to the non-participants' covariates and add an appropriate error term sampled from the distribution family of the regression model (e.g. a normal distribution with s.d. = e(rmse) if the regression was OLS linear) to get a predicted start date for each of them. Finally, to ensure that the imputed dates match those in the distribution of the treated, you could -join- (or -rangejoin-) these results with a data set that consists solely of the observed program start dates among the treated, and select the observed date that is the nearest match to each predicted date.
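
The nearest-match step can be sketched without -join- or -rangejoin-, using only built-in commands. This is a swapped-in alternative, not the approach named above: it loops over the distinct observed start values and keeps the closest one. The data and the variable names (draw, imputed_start, gap) are made up for illustration, and draw stands for a predicted start that has already been generated somehow:

```stata
* Toy data: draw = predicted start (with noise) for the non-treated
clear
input byte(id treat treat_d) float draw
1 1 3 .
2 1 8 .
3 1 5 .
4 0 . 4.2
5 0 . 9.7
6 0 . -1.3
end

* Distinct start values observed among the treated, in ascending order
levelsof treat_d if treat == 1, local(starts)

generate imputed_start = .
generate double gap = .     // distance to the best candidate so far

foreach s of local starts {
    * A missing gap counts as larger than any number, so the first
    * candidate is always accepted; ties go to the smaller value
    generate byte closer = treat == 0 & abs(draw - `s') < gap
    replace imputed_start = `s' if closer
    replace gap = abs(draw - `s') if closer
    drop closer
}

list id draw imputed_start if treat == 0
```

A loop like this is fine when there are few distinct start values (months, say); with many distinct dates a join-based approach would likely scale better.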

You need to discuss the appropriate covariates and appropriate regression model to use for this with an expert in your own area: it's not a statistical question, it's a scientific one. Once you have resolved that, if you want advice about how to code this, post back--be sure to include example data using -dataex-.

Guest

31 Jul 2017, 05:11

Dear Professor Schechter,

thank you very much for your response! I have been working on the covariates and have decided which ones to include and which model to use. The appropriate model appears to be a linear model.

Now I think the next step is to regress the starting dates of the treated on the set of covariates, then probably save those coefficients.

I would like to understand better how -join- would work in this context. I imagine it is somewhat similar to what is described in the literature: as far as I understand, the coefficients and the error term are then used to predict the new starting dates for the non-treated.

I have started and will now include an example of my data (the original has very restricted access, therefore this one has the same structure, but the numbers are made up) as well as my first code.

treat indicates whether someone is in the treatment (=1) or control (=0) group.
treat_d is the duration from entry into the study until the start of treatment, for the treated (in months).
leave_d is the duration from entry into the study until leaving the study without being treated (in months).
cov1, cov2 and cov3 are three covariates, measured before entry into treatment. (The actual study also includes metric variables.)

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte(id treat treat_d leave_d cov1 cov2 cov3)
 1 0  .  2 0 0 1
 2 0  .  5 0 1 0
 3 0  .  9 0 1 0
 4 0  .  8 1 1 0
 5 0  .  9 0 0 0
 6 0  .  1 0 0 1
 7 0  .  9 0 0 1
 8 0  . 10 0 0 1
 9 1  8  . 0 0 0
10 1  8  . 0 0 0
11 0  .  3 1 1 0
12 0  .  2 1 0 1
13 0  . 12 1 0 1
14 0  . 12 0 0 1
15 1  3  . 0 0 1
16 0  .  3 0 0 0
17 1  5  . 0 1 0
18 1  2  . 0 1 1
19 1  1  . 0 1 1
20 1  8  . 1 0 1
21 0  .  7 0 0 1
22 0  .  2 0 0 1
23 0  . 12 0 0 1
24 0  . 13 0 0 1
25 0  . 15 0 0 1
26 0  . 16 0 1 1
27 0  . 18 1 0 1
28 0  . 13 1 0 1
29 0  .  2 1 0 1
30 0  .  7 0 0 1
31 0  .  4 0 0 1
32 0  .  3 1 1 1
33 0  .  8 0 1 1
34 0  . 11 0 1 0
35 1 11  . 0 0 1
36 1  3  . 1 0 1
37 1  5  . 0 0 1
38 0  .  4 0 0 1
39 0  .  1 0 0 1
40 0  . 19 0 0 1
41 0  .  7 1 1 0
42 0  .  3 0 0 0
43 0  . 10 0 0 0
44 0  . 19 0 0 1
45 0  . 16 0 0 0
46 1 12  . 0 1 0
47 0  . 17 0 0 1
48 0  .  2 1 1 1
49 0  . 11 0 1 1
50 0  . 38 0 1 1
51 0  . 17 0 0 1
52 0  . 16 1 0 1
53 1  1 24 0 0 1
end

Then I started with:

Code:

. keep if treat == 1
(41 observations deleted)

. reg treat_d cov1 cov2 cov3

      Source |       SS           df       MS      Number of obs   =        12
-------------+----------------------------------   F(3, 8)         =      1.41
       Model |  54.2738095         3  18.0912698   Prob > F        =    0.3093
    Residual |  102.642857         8  12.8303571   R-squared       =    0.3459
-------------+----------------------------------   Adj R-squared   =    0.1006
       Total |  156.916667        11  14.2651515   Root MSE        =    3.5819

------------------------------------------------------------------------------
     treat_d |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        cov1 |   1.071429   3.027299     0.35   0.733    -5.909536    8.052393
        cov2 |  -1.785714   2.344936    -0.76   0.468    -7.193146    3.621718
        cov3 |  -4.714286   2.344936    -2.01   0.079    -10.12172    .6931462
       _cons |   9.142857   2.140624     4.27   0.003      4.20657    14.07914
------------------------------------------------------------------------------
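
One possible next step, sketched on simulated data rather than the data above: restricting the regression with an -if- qualifier instead of -keep- leaves the non-treated in memory, so -predict- can be applied to them directly. The simulated values, the seed, the single covariate and the new variable names (xb_start, draw_start) are assumptions for illustration:

```stata
* Simulated data with the same layout as the posted example
clear
set seed 2017
set obs 200
generate int id = _n
generate byte treat = runiform() < 0.3
generate byte cov1 = runiform() < 0.5
generate treat_d = ceil(12 * runiform()) if treat == 1
generate leave_d = ceil(20 * runiform()) if treat == 0

* Fit on the treated only, but keep everyone in the data set
regress treat_d cov1 if treat == 1

* Linear prediction for all observations, including the non-treated
predict double xb_start, xb

* Add a normal error with s.d. equal to the regression's root MSE
generate double draw_start = xb_start + rnormal(0, e(rmse)) if treat == 0

* Drop non-treated who left before their hypothetical start
drop if treat == 0 & draw_start > leave_d
```

The draws here are continuous and can fall outside the observed range (even below zero), so they would still need to be matched to observed start values before use.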
