Hello,
I have a three-level panel dataset of individuals from 16 geographic clusters, with each individual having 190 records. The data is from an impact evaluation project that I did not participate in. The project wanted to study the effectiveness of six different outreach programs in helping people with open warrants resolve their warrants using a web-based service. After getting a list of all open warrants from the district court where the project took place, the 16 zip code areas with the largest number of open warrants were picked to receive treatment. 15 intervention packages consisting of one to four of the six programs were assigned to the geographic clusters, and 14 of the 16 clusters received multiple programs. To the best of my knowledge, the assignment was not totally random.
To make things more complicated, programs were not carried out all at once in the clusters, and start dates for programs differed by cluster. (Example: cluster 1 was exposed to program 1, program 2, and program 3, but with different start/finish dates. The start/finish dates for program 1 are different for cluster 1 and cluster 2.)
Based on my limited knowledge, I thought that a difference-in-differences design was appropriate, but I have some questions that I want to ask.
1. I am not sure how to specify the level 2 (individual) and level 3 (cluster) effects. I know that for two-level models I can compare the two methods by running the Hausman specification test. But since there are four possible specifications, with two options (fe and re) for each of two levels (cluster level and individual level), it seems like there must be another way to determine which specification is best. Also, how would I go about specifying level 2 as a fixed effect? I know that for level 3 I am going to add dummy variables using i.cluster.
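For reference, a minimal sketch of how the individual level might be treated as fixed versus random in a linear probability model, so that the two can be compared with a Hausman test. It uses only the variable names from this post (y, p1-p6, t1-t6, DiD1-DiD6, pid, date); the local macro rhs and the stored-estimate names ind_fe and ind_re are made up for illustration. It also assumes that individuals never move between clusters, in which case individual fixed effects already absorb any time-invariant cluster effect and i.cluster would be dropped as collinear.

```stata
* Sketch only, not a recommended final specification.
* rhs, ind_fe and ind_re are hypothetical names chosen for this example.
xtset pid date

local rhs p1 p2 p3 p4 p5 p6 t1 t2 t3 t4 t5 t6 DiD1 DiD2 DiD3 DiD4 DiD5 DiD6

* Individual-level fixed effects (within estimator).
* Regressors that never vary within an individual are omitted automatically.
xtreg y `rhs', fe
estimates store ind_fe

* Individual-level random effects with the same regressors.
xtreg y `rhs', re
estimates store ind_re

* Hausman test of fixed versus random effects at the individual level.
* Both models are fit with default (non-robust) standard errors,
* which the classical Hausman test requires.
hausman ind_fe ind_re
```

For the model that is finally reported, standard errors clustered on the geographic cluster, for example vce(cluster cluster), could be added, although 16 clusters is a small number for cluster-robust inference.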
I tried running the model with random effects:
Code:
xtset pid date
xtmelogit y p1 p2 p3 p4 p5 p6 t1 t2 t3 t4 t5 t6 DiD1 DiD2 DiD3 DiD4 DiD5 DiD6 || cluster: || pid:
where y is the dichotomous variable that records whether the individual has accessed the web-based service (once the access date is less than or equal to the date of the observation, it takes on the value 1). p1-p6 are the dummy variables for each program, t1-t6 are the time dummy variables for each program, DiD1-DiD6 are the difference-in-differences estimators, cluster is a factor variable for the geographic cluster, and pid is the id for each individual.
When I do this, the iterations go on and on with the messages "numerical derivatives are approximate" and "flat or discontinuous region encountered".
After encountering this error and reading that nonlinear link functions violate the common trend assumption, I tried running it as a linear probability model.
Code:
xtmixed y p1 p2 p3 p4 p5 p6 t1 t2 t3 t4 t5 t6 DiD1 DiD2 DiD3 DiD4 DiD5 DiD6 || cluster: || pid:, mle
estimates store threelevel_re
This gives me results that seem identical to running it as a two-level mixed linear probability model:
Code:
xtmixed y p1 p2 p3 p4 p5 p6 t1 t2 t3 t4 t5 t6 DiD1 DiD2 DiD3 DiD4 DiD5 DiD6 || pid:, mle
estimates store twolevel_re
estout threelevel_re twolevel_re

--------------------------------------
             threelevel~e  twolevel_re
                        b            b
--------------------------------------
y
p1               .0070742     .0070742
p2              -.0082254    -.0082254
p3               .0033524     .0033524
p4               .0003054     .0003054
p5              -.0261967    -.0261967
p6              -.0207891    -.0207891
t1               .0001089     .0001089
t2              -.0008988    -.0008988
t3               .0053176     .0053176
t4               .0139366     .0139366
t5               .0069738     .0069738
t6               .0016222     .0016222
DiD1             .0296873     .0296873
DiD2             .0101513     .0101513
DiD3            -.0060855    -.0060855
DiD4            -.0130529    -.0130529
DiD5            -.0040293    -.0040293
DiD6            -.0063049    -.0063049
_cons            .0129411     .0129411
--------------------------------------
lns1_1_1
_cons           -17.65177    -1.925844
--------------------------------------
lns2_1_1
_cons           -1.925839
--------------------------------------
lnsig_e
_cons           -3.085954    -3.085954
--------------------------------------
2. This suggests that a geographic cluster has a similar impact on all individuals that belong to it. Does this mean that the specification method for level 3 is not that important?
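One possible way to check whether the cluster level adds anything is sketched below, using the two stored estimates threelevel_re and twolevel_re from the commands above. It assumes both models were fit by maximum likelihood on the same sample, as they are here, and it reads lns1_1_1 in the three-level model as the log of the cluster-level random-intercept standard deviation.

```stata
* Sketch only: relies on the stored estimates created by the xtmixed runs above.

* Likelihood-ratio test of the three-level model against the nested two-level model.
* The null hypothesis (zero cluster variance) lies on the boundary of the
* parameter space, so the reported p-value is conservative.
lrtest threelevel_re twolevel_re

* Back-transform the estimated log standard deviation of the cluster intercept.
display exp(-17.65177)
```

A cluster-level standard deviation that is numerically zero would explain why the two sets of coefficients coincide, since the three-level model then collapses to the two-level one.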
3. This is more about how to express the model in written terms. Ignoring the multi-level design, this would be the expression for the linear probability model:
$$MA_{it} = \beta_0 + \sum_{p=1}^{6} \left( \beta_{1p}\, Date_{ipt} + \beta_{2p}\, Treat_{ipt} + \beta_{3p}\, Treat_{ipt} \times Date_{ipt} \right) + e_{it}$$
where MA_it is the binary dependent variable, Date_ipt is the date dummy, Treat_ipt is the treatment dummy, Treat_ipt*Date_ipt is the DiD estimator, and e_it is the error term. I am not quite sure how to express this with the multi-level design. Sorry in advance if some of this sounds obvious. I am new to a lot of this, and I did try to read up on it as much as I could.
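As a sketch of what the multi-level version might look like, the single-level expression can be extended with a random intercept for each level, matching the structure of the xtmixed call with || cluster: || pid:. The cluster index j, the random intercepts u_j and v_ij, and the variance symbols are notation introduced here for illustration and do not come from the project documentation.

```latex
% Three-level random-intercept linear probability model (illustrative notation):
% i = individual, j = geographic cluster, p = program, t = date.
\begin{align*}
MA_{ijt} &= \beta_0
  + \sum_{p=1}^{6} \left( \beta_{1p}\, Date_{ijpt}
  + \beta_{2p}\, Treat_{ijpt}
  + \beta_{3p}\, Treat_{ijpt} \times Date_{ijpt} \right)
  + u_j + v_{ij} + e_{ijt}, \\
u_j &\sim N(0, \sigma_u^2), \qquad
v_{ij} \sim N(0, \sigma_v^2), \qquad
e_{ijt} \sim N(0, \sigma_e^2).
\end{align*}
```

Here u_j is the cluster-level intercept (level 3), v_ij is the individual-level intercept (level 2), and e_ijt is the observation-level error (level 1). Treating a level as fixed instead would replace the corresponding random intercept with a set of dummy variables.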