  • fe vs re in three level nested data (xtreg)

    Hello,

    I have a three-level panel data set of individuals from 16 geographic clusters, with 190 records per individual. The data come from an impact evaluation project that I did not participate in. The project studied the effectiveness of six different outreach programs in helping people with open warrants resolve them through a web based service. After getting a list of all open warrants from the district court where the project took place, the project picked the 16 zip code areas with the largest number of open warrants to receive treatment. 15 intervention packages, each consisting of one to four of the six programs, were assigned to the geographic clusters, and 14 of the 16 clusters received multiple programs. To the best of my knowledge, the assignment was not totally random.

    To make things more complicated, the programs were not carried out all at once within a cluster, and the start dates for a given program differed across clusters. (Example: cluster 1 was exposed to program 1, program 2, and program 3, each with different start/finish dates. The start/finish dates for program 1 also differ between cluster 1 and cluster 2.)

    Based on my limited knowledge, I thought that a difference-in-differences design was appropriate, but I have some questions.

    1. I am not sure how to specify the level 2 (individual) and level 3 (cluster) effects. I know that for two-level models I can compare the two methods by running the Hausman specification test. But with two options (fe and re) at each of two levels (cluster and individual), there are four possible specifications, so it seems there must be another way to determine which one is best. Also, how would I go about specifying level 2 as a fixed effect? I know that for level 3 I am going to add dummy variables using i.cluster.

    I tried running the model with random effects:
    Code:
    xtset pid date
    xtmelogit y p1 p2 p3 p4 p5 p6 t1 t2 t3 t4 t5 t6 DiD1 DiD2 DiD3 DiD4 DiD5 DiD6 ||cluster: ||pid:
    where y is the dichotomous variable that records if the individual has accessed the web based service (once the access date is less than or equal to the date of the observation, it takes on the value ). p1-p6 are the dummy variables for each program, t1-t6 are the time dummy variables for each program, DiD1-DiD6 are the difference-in-differences estimators, cluster is the factor variable for geographic cluster, and pid is the ID for each individual.

    When I do this, the iterations go on and on with the messages "numerical derivatives are approximate" and "flat or discontinuous region encountered".

    After encountering this error and reading that nonlinear link functions violate common trend assumptions, I tried running it as a linear probability model.
    Code:
    xtmixed y p1 p2 p3 p4 p5 p6 t1 t2 t3 t4 t5 t6 DiD1 DiD2 DiD3 DiD4 DiD5 DiD6 ||cluster:||pid:, mle
    estimates store threelevel_re
    This gives me results that seem identical to running it as a two-level mixed linear probability model:
    Code:
    xtmixed y p1 p2 p3 p4 p5 p6 t1 t2 t3 t4 t5 t6 DiD1 DiD2 DiD3 DiD4 DiD5 DiD6 ||pid:, mle
    estimates store twolevel_re
    estout threelevel_re twolevel_re
    
    --------------------------------------
                 threelevel~e  twolevel_re
                            b            b
    --------------------------------------
    y                                    
    p1               .0070742     .0070742
    p2              -.0082254    -.0082254
    p3               .0033524     .0033524
    p4               .0003054     .0003054
    p5              -.0261967    -.0261967
    p6              -.0207891    -.0207891
    t1               .0001089     .0001089
    t2              -.0008988    -.0008988
    t3               .0053176     .0053176
    t4               .0139366     .0139366
    t5               .0069738     .0069738
    t6               .0016222     .0016222
    DiD1             .0296873     .0296873
    DiD2             .0101513     .0101513
    DiD3            -.0060855    -.0060855
    DiD4            -.0130529    -.0130529
    DiD5            -.0040293    -.0040293
    DiD6            -.0063049    -.0063049
    _cons            .0129411     .0129411
    --------------------------------------
    lns1_1_1                              
    _cons           -17.65177    -1.925844
    --------------------------------------
    lns2_1_1                              
    _cons           -1.925839            
    --------------------------------------
    lnsig_e                              
    _cons           -3.085954    -3.085954
    --------------------------------------
    2. This suggests that a geographic cluster has a similar impact on all individuals that belong to it. Does this mean that the specification method for level 3 is not that important?
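
    One way to read the table above, offered as a sketch rather than a definitive diagnosis: xtmixed reports the random-effect standard deviations on the log scale (lns1_1_1, lns2_1_1, lnsig_e), so exponentiating them recovers the standard deviations. The cluster-level value of -17.65177 back-transforms to roughly 2e-8, which is essentially zero, while the individual-level value is roughly 0.146 in both models. Assuming the stored estimates threelevel_re and twolevel_re from above are still in memory, a likelihood-ratio test compares the two fits directly:
    Code:
    * Back-transform the logged standard deviations shown by estout
    * cluster-level SD in the three-level model
    display exp(-17.65177)
    * individual-level SD in the three-level model
    display exp(-1.925839)
    * residual SD (same in both models)
    display exp(-3.085954)
    * Likelihood-ratio test of the three-level against the two-level model
    lrtest threelevel_re twolevel_re
    When the cluster-level standard deviation is estimated at essentially zero, the three-level model collapses to the two-level one, which is consistent with the identical coefficient columns in the table.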

    3. This is more about how to express the model in written terms. Ignoring the multi-level design, this would be the expression for the linear probability model:

    [Image: Equation.PNG]

    Where MA_it is the binary dependent variable, Date_ipt is the date dummy, Treat_ipt is the treatment dummy, Treat_ipt*Date_ipt is the DiD estimator, and e_it is the error term. I am not quite sure how to express this with the multi-level design. Sorry in advance if some of this sounds obvious. I am new to a lot of this, and I did try to read up on it as much as I could.
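
    For reference, one possible way to write the three-level random-intercept version that the xtmixed call with ||cluster: ||pid: fits is sketched below. It keeps the symbols defined above; the coefficient names (\beta, \gamma, \delta), the random-intercept symbols v and u, and the notation c(i) for the cluster of individual i are assumptions introduced here for illustration, since the posted equation is only available as an image.

    \[
    MA_{it} = \beta_0 + \sum_{p=1}^{6} \left( \beta_p \, Treat_{ipt} + \gamma_p \, Date_{ipt} + \delta_p \, Treat_{ipt} \times Date_{ipt} \right) + v_{c(i)} + u_i + e_{it}
    \]
    \[
    v_{c(i)} \sim N(0, \sigma_v^2), \qquad u_i \sim N(0, \sigma_u^2), \qquad e_{it} \sim N(0, \sigma_e^2)
    \]

    Here v_{c(i)} is the cluster-level (level 3) random intercept, u_i is the individual-level (level 2) random intercept, and \delta_p is the DiD coefficient for program p. Dropping v_{c(i)} gives the two-level model.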
    Last edited by Simmon Kim; 28 Apr 2017, 15:35.

  • #2
    You didn't get a quick answer. You'll increase your chances of a quick answer if you follow the FAQ on asking questions: include Stata code in code delimiters, Stata output, and sample data using dataex. Also try to simplify what you present. You have a long, complex discussion, much of which is not related to the statistical problems.

    I have not tried to understand your entire post. Let me note that when you have nested effects, the lowest level often makes the higher level irrelevant in fixed-effects and similar estimators. That is, if each person is in only one region, then the person fixed effects also absorb any regional effect, giving you the same results with and without regions.
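
    A minimal sketch of this point, assuming the variable names from the original post (y, p1-p6, t1-t6, DiD1-DiD6, pid, cluster, date) and that each pid belongs to exactly one cluster. It has not been run on the poster's data. Any regressor that does not vary within an individual is omitted by the fixed-effects estimator, and the same happens to the cluster dummies in the second model:
    Code:
    xtset pid date
    * Individual fixed effects only
    xtreg y p1 p2 p3 p4 p5 p6 t1 t2 t3 t4 t5 t6 DiD1 DiD2 DiD3 DiD4 DiD5 DiD6, fe
    estimates store fe_pid
    * Individual fixed effects plus cluster dummies
    * (the dummies are constant within pid, so Stata omits them as collinear)
    xtreg y p1 p2 p3 p4 p5 p6 t1 t2 t3 t4 t5 t6 DiD1 DiD2 DiD3 DiD4 DiD5 DiD6 i.cluster, fe
    estimates store fe_pid_cluster
    * Compare coefficients and standard errors side by side
    estimates table fe_pid fe_pid_cluster, b se
    If the two columns match, the cluster level adds nothing once individual fixed effects are in the model.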
