  • #16
    I've been working on the data over the past few months but I am back to this same set of issues with the regressions.

    To clarify, I am attempting to complete a fractional regression that contains fixed-effects on an unbalanced panel of districts (N: 5480) within states (N: 48) over years (N: 126).

    I tried looking at glm for this case, but it appears to fit a logit for a binary outcome variable, not a fractional one as proportions; is that right? I did find another command, fracglm, but I can't see that it does anything different from fracreg in my case.

    I've tried subsampling to see whether I can get equivalent results various ways, and they don't seem to line up. I'm not quite sure what I'm missing. Here are results from a 3-year subsample, so I can run the effects as dummies and compare:

    1. Using cre to absorb district and state-by-year effects, which I understood from previous notes should produce iteratively demeaned results equivalent to fixed effects (reghdfe does work for the linear case, and my understanding was that cre just runs reghdfe in the background?):

    Code:
    egen state_year = group(state yr)
    cre, abs(dist state_year) prefix(fixed) keep keepsingletons: regress y x, cluster(dist)
    fracreg logit y x fixed*, vce(cl dist)
    margins, dydx(*) post
    Which gets me:
    Code:
    Average marginal effects                                 Number of obs = 1,213
    Model VCE: Robust
    
    Expression: Conditional mean of y, predict()
    dy/dx wrt:  x fixed1_x fixed2_x
    
    ------------------------------------------------------------------------------
                 |            Delta-method
                 |      dy/dx   std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
               x |  -.0092139   .0296722    -0.31   0.756    -.0673703    .0489426
        fixed1_x |   .0021522    .029974     0.07   0.943    -.0565957    .0609001
        fixed2_x |  -.0069721    .030177    -0.23   0.817     -.066118    .0521738
    ------------------------------------------------------------------------------
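    As a sanity check on the setup (my suggestion, not part of the original runs): in the linear model the CRE and explicit-dummy estimators should agree exactly, so the same subsample can be verified with reghdfe against the dummy regression before trusting the fractional versions:
    Code:
    egen state_year = group(state yr)
    * Within/CRE estimate for the linear case
    reghdfe y x, absorb(dist state_year) cluster(dist)
    * Explicit-dummy estimate; the coefficient on x should match
    regress y x i.dist i.state_year, vce(cluster dist)
    Any disagreement here (beyond dropped singletons) would point to a coding problem rather than to the nonlinear model.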
    2. Just adding these as categorical dummies:
    Code:
    fracreg logit y x i.dist i.state#i.yr, vce(cl dist)
    margins, dydx(*) post
    gets me:
    Code:
    Average marginal effects                                 Number of obs = 1,213
    Model VCE: Robust
    
    Expression: Conditional mean of y, predict()
    dy/dx wrt:   (I have omitted this for brevity)
    ---------------------------------------------------------------------------------
                    |            Delta-method
                    |      dy/dx   std. err.      z    P>|z|     [95% conf. interval]
    ----------------+----------------------------------------------------------------
                  x |  -.0037571   .0295284    -0.13   0.899    -.0616317    .0541175
    What am I missing?



    • #17
      Two things:
      1) You are assuming you should get the same result from CRE fraclogit as from fraclogit with dummies.
      They will only be the same when the model is linear. In other cases, they will merely be similar.
      For example, your marginal effect in both cases is almost zero with a very large standard error, so the two sets of results are essentially identical.
      2) You seem to be combining state (52?) with time (10?). How many units will you have within each state-time group? Enough to identify "dist"?
      CRE creates the group means itself; it works with HDFE, not reghdfe (though by the same author).
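      To make point 1) concrete, here is a hand-rolled sketch of the Mundlak/CRE device for a single fixed-effect dimension (illustrative only; cre automates this, and with several fixed effects it iterates the demeaning rather than using the simple means shown here):
      Code:
      * Mundlak/CRE by hand: add the group mean of x as a control
      bysort dist: egen xbar_dist = mean(x)
      fracreg logit y x xbar_dist, vce(cluster dist)
      margins, dydx(x)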

      HTH



      • #18
        Hi Fernando,
        Thanks for your reply.

        1) I thought that was the point Jeff has made: that the Mundlak approach of controlling for the group means can replicate/replace the dummy approach to fixed effects, and that it works in nonlinear cases too. In my example above, the difference in coefficient/significance between the CRE-supported demeaning and the dummy approach is somewhat irrelevant, as you say. The problem is that this is a small subsample, so I can't be sure the relative outcome is the same for my full dataset, since the dummy approach seems impossible to complete there (I tried running it over a number of days).

        2) The number of units per state-year group varies as there are inconsistent numbers of districts per state - it ranges from 1 to ~30.
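        For reference, the cell sizes can be tabulated directly (a quick diagnostic, not part of my original runs; with one observation per district-year, the count within a state-year cell equals the number of districts):
        Code:
        bysort state yr: gen n_dist = _N
        summarize n_dist
        tab n_dist
        * n_dist == 1 flags state-year cells that are singletons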



        • #19
          1) You are correct. Dummy inclusion may be infeasible in large nonlinear models, and with many fixed effects it may be inconsistent (the incidental parameters problem).
          Other than arguing the estimates will be similar and citing relevant work, there isn't much you can do. Simulations?
          2) Different numbers of districts per state works. But using a dummy as you are is wrong: you are saying the fixed effect for district 1 in state 1 is the same as for district 1 in state 34.
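          Picking up the simulation idea, a minimal sketch (the DGP and all names are assumed here purely for illustration) comparing the dummy and CRE average marginal effects on fractional data with a single district effect:
          Code:
          clear
          set seed 12345
          set obs 200                                // districts
          gen dist = _n
          gen a_d  = rnormal()                       // district effect
          expand 10                                  // 10 years per district
          gen x = 0.5*a_d + rnormal()                // regressor correlated with the effect
          gen y = invlogit(0.3*x + a_d + rnormal())  // fractional outcome in (0,1)
          * Dummy approach
          fracreg logit y x i.dist, vce(cluster dist)
          margins, dydx(x)
          * Mundlak/CRE approach
          bysort dist: egen xbar = mean(x)
          fracreg logit y x xbar, vce(cluster dist)
          margins, dydx(x)
          The two average marginal effects should be close but not identical, which is the pattern in the posted subsample results.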



          • #20
            Thanks Fernando, I've re-read what you said re 1) and that makes sense now - thanks!

            For 2), using a state-year fixed effect was a way to capture state-level shocks in particular years that district effects and year effects would not otherwise control for, since the districts form a group within a wider country sample. This seems to be a common method in the literature I'm referencing. Is the issue you perceive specifically due to the presence of singletons, or to unbalanced panels in general? In a preliminary linear assessment with reghdfe I tried separately keeping and dropping the singletons, and it had a negligible effect in all but 2 of my models (and even those two showed only small differences in standard errors with the same coefficients). I hadn't attempted to drop singletons from the fractional approach because, after the linear findings, I didn't believe it was necessary.
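            Carrying the same singleton check over to the fractional specification is straightforward (a sketch; it assumes the fixed* prefix variables created by cre in post #16 are still in memory):
            Code:
            bysort state yr: gen n_in_cell = _N
            * Rerun the CRE fractional model excluding state-year singletons
            fracreg logit y x fixed* if n_in_cell > 1, vce(cluster dist)
            margins, dydx(x) post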
