Panel data with binary DV and categorical IV: Which model to use?

Dominik Schmitz

Join Date: Jun 2017

Posts: 7
#1

02 Jun 2017, 01:22

Hi there,

I have a problem with the estimation of two panel data models:

The aim of my analysis is to estimate the effect of the sector in which a company operates on participation in a specific government programme. I have an unbalanced panel dataset consisting of 500 companies over 10 years.

The dependent variable is binary and indicates whether a company took part in a government programme. In the first model the DV is sometimes constant at firm level (meaning that the company participated in the programme either in every year or never); in the second model the DV is always constant at company level (a company either always or never participated).

The main independent variable is the sector in which the company operates (and is thus time-invariant); further control variables such as growth, ln(sales) and profitability are added to the model.

When it comes to estimation I am not sure which model to use. At first I thought about xtprobit (random effects) or xtlogit (fixed effects) and ran a Hausman test:

xtlogit programme i.sector growth ln_sales profitability past_performance , fe
estimates store fe
xtprobit programme i.sector growth ln_sales profitability past_performance , re
estimates store re
hausman fe re

Prob>chi2 = 0.0001, so I cannot use RE and have to use fixed effects.
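
One caveat on this test: -hausman- compares coefficient vectors, and logit and probit coefficients are on different scales, so contrasting -xtlogit, fe- with -xtprobit, re- can reject for reasons unrelated to the random-effects assumption. A minimal sketch of a like-for-like comparison, assuming the panel identifiers are called firm_id and year (hypothetical names, not given in the post):

```stata
* Sketch only: firm_id and year are assumed names for the panel identifiers.
xtset firm_id year

* Conditional fixed-effects logit (time-invariant sector levels will be omitted)
xtlogit programme i.sector growth ln_sales profitability past_performance, fe
estimates store fe

* Random-effects logit, i.e. the same link function as the FE model
xtlogit programme i.sector growth ln_sales profitability past_performance, re
estimates store re

* Compare the coefficients that both models estimate
hausman fe re
```

The test can only use coefficients that both models estimate, so sector levels omitted under FE should not enter the comparison.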

When using FE, in the first model some of the business sectors are dropped as there is no within-group variance, the second model does not work at all with fixed effects as the outcome does not vary for any company.

To my knowledge, the ordinary random effects model cannot be used, as the Hausman test suggests that the estimator is not consistent.
In another post on Statalist I read about using correlated random effects (http://conference.iza.org/conference...nonlin_iza.pdf, Stata commands 'mundlak' or 'xthybrid'), but this does not work using factor variables.

I really want to outline the effect of the sector on participation, so dropping the variable is not an option.

Does anyone know which model to use? Thanks a lot in advance!

Regards

Dominik
Dominik Schmitz

Join Date: Jun 2017

Posts: 7
#2

06 Jun 2017, 00:29

Does anybody have an idea? Any help would be greatly appreciated!

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17714

06 Jun 2017, 00:41

Dominik:
have you tried prefixing the user-written programme -mundlak- with -xi-?

Code:

use "http://www.stata-press.com/data/r14/nlswork.dta", clear

. xi: mundlak ln_wage i.c_city
i.c_city          _Ic_city_0-1        (naturally coded; _Ic_city_0 omitted)

+------------------------------------------------+
|             Variable |     RE     |  Mundlak   |
|----------------------+------------+------------|
|           _Ic_city_1 |     -0.011 |     -0.033 |
|     mean___Ic_city_1 |            |      0.099 |
|                _cons |      1.660 |      1.632 |
|----------------------+------------+------------|
|                    N |      28526 |      28526 |
|                  N_g |   4711.000 |   4711.000 |
|                g_min |      1.000 |      1.000 |
|                g_avg |      6.055 |      6.055 |
|                g_max |     15.000 |     15.000 |
|                  rho |      0.592 |      0.592 |
|                 rmse |      0.320 |      0.319 |
|                 chi2 |      2.724 |     39.342 |
|                    p |      0.099 |      0.000 |
|                 df_m |      1.000 |      2.000 |
|                sigma |      0.501 |      0.501 |
|              sigma_u |      0.386 |      0.386 |
|              sigma_e |      0.320 |      0.320 |
|                 r2_w |      0.001 |      0.001 |
|                 r2_o |      0.002 |      0.004 |
|                 r2_b |      0.004 |      0.004 |
+------------------------------------------------+

.
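
For readers without the user-written command, the same idea can be sketched by hand: add the group mean of each time-varying regressor to a random-effects model. This is only an illustration on the same nlswork data; mean_c_city is a name chosen here, and the output is not expected to match the -mundlak- table digit for digit.

```stata
webuse nlswork, clear
xtset idcode year

* Person-level mean of the regressor, over observations usable in the regression
bysort idcode: egen mean_c_city = mean(c_city) if !missing(ln_wage, c_city)

* RE regression augmented with the group mean (the Mundlak / CRE device)
xtreg ln_wage c_city mean_c_city, re

* A coefficient on the mean that differs from zero points to correlation
* between the regressor and the unit-level effect
test mean_c_city
```

If a regressor never changes within a unit, its group mean is identical to the regressor itself, so no separate mean term can be added for it.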

Kind regards,
Carlo
(Stata 19.0)


Dominik Schmitz

Join Date: Jun 2017

Posts: 7
#4

06 Jun 2017, 01:34

Carlo,

thank you very much for your input!

Using xi: mundlak helps and there is no error message regarding the factor variables any more.
Nevertheless, when running - xi: mundlak - the following message pops up for most sectors:

"The variable _Isector_2 does not vary sufficiently within groups and will not be used to create additional regressors.
0% of the total variance in _Isic_5 is within groups."

Afterwards an output table similar to the one from your test dataset is given (but without additional regressors for the sectors that are affected by the message).
After that, the command - estimates replay Mundlak - gives coefficients both for each sector and for the additional regressors that have been created by mundlak.
Can I interpret the output, or is that not possible? If it is possible, which sector coefficients do I have to interpret?

Furthermore, running the regression for the second case (constant binary dependent variable at company level) does not work, - xi: mundlak - gives an error message ("The dependent variable does not vary within groups."). Any ideas?
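
Both messages come down to missing within-firm variation, which can be checked directly before fitting anything. A short diagnostic sketch, assuming the panel identifiers are named firm_id and year (hypothetical names) and that sector has no missing values:

```stata
* firm_id and year are assumed names; programme and sector are from the post.
xtset firm_id year

* Overall, between and within standard deviations; a within SD of zero
* means the variable never changes inside a firm
xtsum programme sector

* Flag firms whose sector code changes at least once
bysort firm_id (sector): gen byte sector_changes = sector[1] != sector[_N]
egen byte one_per_firm = tag(firm_id)
tabulate sector_changes if one_per_firm
```

The last table counts firms rather than firm-years, because only one tagged observation per firm is tabulated.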

Thanks a lot in advance!

Kind regards

Dominik
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17714
#5

06 Jun 2017, 02:11

Dominik:
in both your cases, -mundlak- cannot create additional regressors.
Then, the whole matter boils down to -xtlogit, re- (which is the specification that -hausman- rejected).
That said, I think that you have to go back to -xtlogit, fe-, considering that this model estimates conditional fixed effects (something different from the fixed effects we are familiar with under -xtreg-).
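
To illustrate what "conditional fixed effects" means here: -xtlogit, fe- fits the same conditional likelihood as -clogit- with the panel identifier as the group, so units whose outcome never changes contribute nothing to estimation. A sketch on Stata's union example dataset (not Dominik's data):

```stata
webuse union, clear
xtset idcode year

* Conditional fixed-effects logit via the panel command
xtlogit union age grade not_smsa south, fe

* The same model written as a conditional logit grouped by person
clogit union age grade not_smsa south, group(idcode)
```

The two commands should return the same coefficients, and both drop groups in which the outcome is all 0 or all 1.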

Kind regards,
Carlo
(Stata 19.0)
Dominik Schmitz

Join Date: Jun 2017

Posts: 7
#6

06 Jun 2017, 02:44

Carlo,

thanks again for your reply.

Going back to - xtlogit, fe - leads to the problem that most sectors are dropped because they are absorbed by the fixed effects ("note: 2.sector omitted because of no within-group variance."). Furthermore, the second model with constant DV at company level cannot be estimated at all ("outcome does not vary in any group").

Is there no way to keep i.sector as independent variable (and estimate coefficients that describe the impact on the DV) ?

Kind regards

Dominik
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17714
#7

06 Jun 2017, 03:13

Dominik:
if firms are nested within the same sector for the entire time horizon, there's nothing you can do but change your model specification (if feasible). But are all the sectors included in your panel data regression time-invariant?
Instead, I was wondering how a panel data regression can be informative if the DV does not change across firms and/or years.

Kind regards,
Carlo
(Stata 19.0)
Dominik Schmitz

Join Date: Jun 2017

Posts: 7
#8

06 Jun 2017, 03:43

Carlo,

okay, thanks a lot! Most sectors in my regression are time-invariant, as most firms did not change the sector in which they operate. Out of ten different sectors, there is within-company variation in only three of them. Following your suggestion, I have to rethink whether I can change the specification in a useful way such that I can estimate the coefficients.

After thinking about this issue again I agree with your second point. I just did not want to drop the panel structure, but you are right: there is no benefit in telling Stata how the data are structured over time if my DV is not time-varying.
To circumvent the time-invariant DV I could run year-by-year probit regressions or run one regression on firm-level averages of growth, log(sales), profitability and past performance.
Do you think this is an appropriate solution?
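
The two options could be sketched as follows. This is an illustration rather than a recommendation; firm_id and year are assumed names for the panel identifiers, and option (b) assumes each firm is assigned a single sector (its first recorded one).

```stata
* (a) Separate cross-sectional probit for each year
bysort year: probit programme i.sector growth ln_sales profitability past_performance

* (b) One cross-section per firm, using firm-level averages of the controls
preserve
collapse (max) programme (first) sector ///
    (mean) growth ln_sales profitability past_performance, by(firm_id)
probit programme i.sector growth ln_sales profitability past_performance
restore
```

In (b), taking the maximum of programme is harmless when the DV is constant within firm, as in the second model; with a time-varying DV it would code a firm as a participant if it ever took part.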

Kind regards

Dominik
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17714
#9

06 Jun 2017, 04:39

Dominik:
- I'm fine with your first point;
- as for your second point, the best advice is to skim through the literature in your research field to see how others have tackled that issue.

Kind regards,
Carlo
(Stata 19.0)
Dominik Schmitz

Join Date: Jun 2017

Posts: 7
#10

06 Jun 2017, 04:47

Carlo,

I will do so. Thank you very much for your quick and useful remarks; they really helped me a lot!

Kind regards

Dominik