Multilevel logistic regression with endogenous regressor - IV approach

Dante Lango

Join Date: Nov 2018

Posts: 5
#1

Multilevel logistic regression with endogenous regressor - IV approach

21 Nov 2018, 05:42

Hello Statalisters,

I am puzzled by the use of the multilevel command melogit, in particular when the regression may have endogenous regressors, with the endogeneity caused by reverse causality, self-selection, or scale-reference bias arising from survey data (and self-assessed measures).

Using Stata 13.0, I am analyzing the following survey: https://dbk.gesis.org/DBKsearch/SDES...770&tab=3&db=E. Individuals are nested within countries (I am analyzing 32 countries), and there is a series of regressors based on employment and demographic characteristics, together with job demands and work-life balance factors. The research question is to estimate the effect of perceived security on the probability of reporting high stress.

I am estimating the following random intercept logit model for (self-assessed) job-related stress:

random intercept model:

Code:

melogit HighStress $demo $employmentcharacteristics $workdemandfactors $worklifefactors security || country: ,or

The model converges, the random intercept is significant, and the LR test also favors the multilevel model over a single-level one.

However, the variable security, defined as the worker's perceived job security, may be endogenous, given that individuals with poorer mental health may self-select into less secure jobs (as explained in https://doi.org/10.1002/hec.3122). Most of the literature uses a single-level model and an IV approach to instrument the variable. My dataset is particularly rich in variables that describe the employability of workers; once these are combined with EPL*dismissal rate (risk of dismissal), they may provide a good structural equation to break the loop of reverse causality.

However, since the between-country variability is quite high in the model for security, I should probably account for it by adding a second level with a random coefficient (on either EPL*dismissal rate of the industry or just EPL). I could not find any strategy for doing this on the web; any ideas or sources that could help me?

Thanks in advance for any help you could provide.
NB: EPL is the acronym of Employment Protection Legislation

Last edited by Dante Lango; 21 Nov 2018, 05:47.
Tags: endogenous regressors, instrumental variables, Multilevel Analysis
Dante Lango

Join Date: Nov 2018

Posts: 5
#2

22 Nov 2018, 07:45

I made it longer than it needed to be. The question is:

Can I do a two-stage regression without using the Stata command for IV regression, by using two multilevel logit regressions instead? Such as:

Code:

melogit HighStress $demo $employmentcharacteristics $workdemandfactors $worklifefactors psecure || country: ,or

where psecure is given by the predict command after the following regression on the binary variable security:

Code:

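generate EPL_dismissal = EPL*dismissal  // build the interaction explicitly; variable names EPL and dismissal are assumed
melogit security $demo $employmentcharacteristics $some_valid_instruments_individual || country: EPL_dismissal, or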
Phil Bromiley

Join Date: Apr 2014

Posts: 4348
#3

26 Nov 2018, 12:57

You'll increase your chances of a helpful answer by following the FAQ on asking questions - provide Stata code in code delimiters, readable Stata output, and sample data using dataex.

You might find it easier to use xtlogit. However, I suspect that doing the predicted value manually will give you incorrect standard errors (as it does in standard instrumental variables). I'm not sure of a routine that handles this - you can probably do it in GSEM. User written cmp might be an option.
Dante Lango

Join Date: Nov 2018

Posts: 5
#4

17 Dec 2018, 07:56

Originally posted by Phil Bromiley

You'll increase your chances of a helpful answer by following the FAQ on asking questions - provide Stata code in code delimiters, readable Stata output, and sample data using dataex.

You might find it easier to use xtlogit. However, I suspect that doing the predicted value manually will give you incorrect standard errors (as it does in standard instrumental variables). I'm not sure of a routine that handles this - you can probably do it in GSEM. User written cmp might be an option.

Thank you for your answer,

I have been advised to use GSEM so that the endogeneity of the variables is taken into account. I will proceed that way in future research.
Joro Kolev

Join Date: Aug 2018

Posts: 3050
#5

17 Dec 2018, 10:00

A couple of clarifying remarks:

1. The naming of Stata's commands for endogeneity in nonlinear models is a bit unfortunate. -ivprobit- and -ivtobit- are misnomers, because what these routines implement is not an IV approach. What they implement is known in econometrics/statistics as a control function approach.

2. I am not familiar with exactly what -melogit- does, but if you are looking for a solution to endogeneity (whether you implement it through -gsem-, another routine, or manually), you should be looking for a solution based on probit, not logit. There is a reason why there is no ivlogit in Stata: these control function approaches (misnamed IV-something in Stata, as in -ivprobit- and -ivtobit-) build on the joint distribution of the errors across the equations. The normal distribution has a lot of special features (e.g., joint normality of the errors implies normality of their marginal distributions), these features are heavily used in the derivations, and the logistic distribution does not share them.

To cite Brian Poi:

The maximum likelihood estimators used by -ivprobit- and -ivtobit- are derived by assuming the error term in the structural equation and the error term in the reduced-form equation for the endogenous regressor are jointly normally distributed. The derivation of the two-step estimators also assumes bivariate normal errors.

If you specify the structural equation as logit instead of probit, then you would need to figure out the appropriate bivariate distribution to use for the error terms, and my guess is that it would be much more complicated to work with than the bivariate normal distribution.

-- Brian Poi

https://www.stata.com/statalist/arch.../msg00768.html
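
For readers following the thread, the probit-based route suggested above could be sketched in -gsem- roughly as follows. This is an untested sketch under stated assumptions, not a recommended specification: it replaces the logit links with probit, treats security as binary (as in post #2), reuses the global macros from posts #1 and #2, and introduces hypothetical latent-variable names. M1[country] and M2[country] are country-level random intercepts, and L is an individual-level latent variable that enters both equations so that their errors can be correlated.

```stata
* Sketch only: two probit equations linked by a shared latent variable L,
* each with its own country-level random intercept.
* M1, M2 and L are arbitrary names chosen for this illustration.
* Identification assumption: L's variance and its loading in the security
* equation are fixed at 1; its loading in the HighStress equation is free.
gsem (HighStress <- security $demo $employmentcharacteristics     ///
                    $workdemandfactors $worklifefactors           ///
                    M1[country] L, probit)                        ///
     (security   <- $demo $employmentcharacteristics              ///
                    $some_valid_instruments_individual            ///
                    M2[country] L@1, probit),                     ///
     var(L@1)
```

Because L adds variance to each equation's error, the reported coefficients are on an inflated scale and are not directly comparable to those from a plain -meprobit- fit. With only 32 countries and two sets of random intercepts, estimation may also be slow or fail to converge.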