Multilevel logit with truncated dependent variable

Simone Aresu

Join Date: Jan 2022

Posts: 4
#1


05 Jan 2022, 08:07

Dear Statalisters, my sample is an unbalanced worldwide panel of around 2,000 firms over the period 2001-2015. The dependent variable is the first-time adoption of a firm-level corporate governance policy. More specifically, it equals 1 for the firm-year in which a firm adopts the policy for the first time and 0 for the firm-years before that first adoption. If a firm never adopts the policy during the period of analysis, the variable is coded 0 for all available years. I also remove a firm from the sample after it adopts the policy. Thus, the dependent variable is truncated.
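A minimal Stata sketch of how a dependent variable like the one described above can be built. It assumes a hypothetical firm-year panel with variables firm_id, year and policy (1 if the policy is in place that year, 0 otherwise) and no gaps in years; these names are illustrative, not from the original post. Firms that already have the policy in their first observed year would be coded as adopting in that year and may need separate treatment.

```stata
* Hypothetical firm-year panel: firm_id, year, policy (0/1, policy in place)
sort firm_id year

* ever = 1 from the first year the policy is observed onward (running sum by firm)
by firm_id: generate byte ever = sum(policy) > 0

* adopt = 1 only in the year of first adoption, 0 in the years before it
by firm_id: generate byte adopt = ever & (_n == 1 | ever[_n-1] == 0)

* Adoption is treated as absorbing: drop the firm-years after first adoption
drop if ever & !adopt

* Years at risk within each firm's spell (used as a duration variable)
bysort firm_id (year): generate int t = _n
```

Never-adopting firms keep all their years with adopt = 0.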
As the main independent variable is at the country-level (i.e., a proxy of the country’s regulatory pressure), I was thinking of using a multilevel logit, with firm (level 1) and country (level 2) as clusters.
First, I ran the unconditional means (null) model to examine how much of the variance in the dependent variable each level of analysis explains (between-level analysis) (Raudenbush & Bryk, 2002; Aguinis et al., 2013). However, because of the specific features of the dependent variable (a truncated variable that takes the value 0 until the event occurs), the variance partition coefficient of the ‘firm’ cluster is close to 0. In other words, the unconditional means model suggests not including the ‘firm’ cluster. My idea was therefore not to use a multilevel model but a plain logit model, adjusting the standard errors for clustering at both the ‘country’ and the ‘firm’ level (using the Stata command vcemway; see Gu and Yoo, 2019). Would you agree with this approach?
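Because every firm belongs to a single country, the firm and country clusters are nested. In the multiway formula the cluster-robust variance is V_firm + V_country - V_(firm∩country), and with nesting the firm∩country clusters are just the firms, so two-way clustering collapses to one-way clustering on country (apart from finite-sample adjustments). A minimal sketch with Stata's built-in -logit-, assuming hypothetical names adopt, reg_pressure (country-level regressor), x1 and x2 (firm controls), t (years at risk) and country_id:

```stata
* Logit on the firm-year data; the i.t dummies let the baseline
* hazard vary with years at risk (a discrete-time logistic hazard)
* Standard errors clustered on country, the higher of the nested levels
logit adopt reg_pressure x1 x2 i.t, vce(cluster country_id)
```

With a small number of countries, country-clustered standard errors can be unreliable, so this is a sketch rather than a recommendation.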
Thank you in advance for your help

Code:

melogit DEPVAR || country_cluster: || firm_cluster:
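To inspect the variance partition discussed above, a minimal sketch of the null model followed by Stata's -estat icc-, assuming hypothetical names adopt, country_id and firm_id in place of DEPVAR, country_cluster and firm_cluster. -estat icc- reports intraclass correlations on the latent-response scale for the country level and for firms within countries.

```stata
* Unconditional (null) three-level logit: firm-years within firms within countries
melogit adopt || country_id: || firm_id:

* Latent-scale intraclass correlations for the country and firm-within-country levels
estat icc
```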
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2148
#2

05 Jan 2022, 10:51

This seems more like grouped duration data. It is censored (not truncated) in that, for some firms, they never adopt and so you don't know whether they eventually will. Dropping firms once they adopt is not truncation because the assumption is that adoption is an absorbing state. Stephen Jenkins has written about this and there's a user-written Stata command, pgmhaz, that performs the estimation.
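A minimal sketch of a grouped-duration (discrete-time hazard) model of the kind described above, using Stata's built-in -cloglog- rather than -pgmhaz- and assuming hypothetical names adopt, reg_pressure, x1, x2, t (years at risk) and firm_id. Never-adopting firms enter with adopt = 0 in every year and are thereby treated as right-censored; no frailty is included here.

```stata
* One row per firm-year at risk; adopt = 1 only in the year of first adoption.
* The complementary log-log link gives a discrete-time proportional hazards
* model; i.t estimates a separate baseline hazard for each year at risk.
cloglog adopt reg_pressure x1 x2 i.t, vce(cluster firm_id)
```

Late years at risk with few events may need to be grouped into wider intervals.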
Mike Lacy

Join Date: Apr 2014

Posts: 2411
#3

05 Jan 2022, 11:41

I'd add to the useful advice from Jeff Wooldridge that Stephen Jenkins has a helpful website on this and related topics:

https://www.iser.essex.ac.uk/resourc...sis-with-stata,

within which the document at

https://www.iser.essex.ac.uk/files/t...s/ec968st6.pdf seems most relevant to the current question.

Within Stata, -search discrete time hazard- will also reveal other good stuff.
Simone Aresu

Join Date: Jan 2022

Posts: 4
#4

10 Jan 2022, 02:25

Thank you very much, Jeff Wooldridge and Mike Lacy, for the suggested command and material. I have tried to incorporate these suggestions into the analysis.

To account for the different clustering levels (firm-year observations nested within firms, firms nested within countries), I could also use a multilevel model (e.g., Stata's melogit or mecloglog commands).
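A minimal sketch of the mecloglog variant, which combines nested random intercepts with the same complementary log-log discrete-time hazard that -cloglog- fits; the variable names (adopt, reg_pressure, x1, x2, t, country_id, firm_id) are hypothetical.

```stata
* Random intercepts for countries and for firms within countries;
* i.t gives each year at risk its own baseline hazard
mecloglog adopt reg_pressure x1 x2 i.t || country_id: || firm_id:
```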

However, because of the specific features of the dependent variable (a censored variable that takes the value 0 until the event occurs), the variance partition coefficient of the ‘firm’ cluster is close to 0 in the unconditional means model. Thus, the unconditional means model suggests not including the ‘firm’ cluster.
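For reference, the latent-variable variance partition coefficients for a three-level random-intercept model, written with symbols introduced here for illustration: σ²_c is the country variance, σ²_f the firm-within-country variance, and σ²_e the level-1 variance fixed by the link function.

```latex
\rho_{\text{country}} = \frac{\sigma^2_c}{\sigma^2_c + \sigma^2_f + \sigma^2_e},
\qquad
\rho_{\text{firm}\mid\text{country}} = \frac{\sigma^2_c + \sigma^2_f}{\sigma^2_c + \sigma^2_f + \sigma^2_e},
\qquad
\sigma^2_e =
\begin{cases}
\pi^2/3 & \text{logit} \\
\pi^2/6 & \text{complementary log-log}
\end{cases}
```

A firm-level variance near zero (σ²_f ≈ 0) means the two coefficients nearly coincide.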

My question is: could I justify the choice not to use a multilevel model because of this result from the unconditional means model?

The alternative would be to use other models, like the ones you suggest. Thank you again for your help.
Stephen Jenkins

Join Date: Apr 2014

Posts: 1432
#5

10 Jan 2022, 08:37

#4: Jeff Wooldridge and Mike Lacy give great advice. Note also that you do not need a "multilevel" (a.k.a. mixed or hierarchical) model. Be aware that putting these sorts of data into firm-year ("long") format is primarily a convenient way of fitting the models using existing software tools. Put differently, once the data are organised thus, the 'correct' log-likelihood function is maximized. My -pgmhaz8- (on SSC) fits models for interval-censored (grouped) data (i) without unobserved heterogeneity (frailty) and (ii) with Gamma-distributed unobserved heterogeneity. Use -xtcloglog- (built-in) if you want normally distributed frailty instead, or my -hshaz- (SSC) for Heckman-Singer discrete mass-point frailty (latent classes). My resources, cited by Mike, provide worked examples.
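A minimal sketch of the built-in route mentioned above, assuming the hypothetical names firm_id, year, adopt, reg_pressure, x1, x2 and t. The option syntax of -pgmhaz8- and -hshaz- is documented in their help files after installation from SSC and is not reproduced here.

```stata
* Declare the firm-year panel (hypothetical identifiers)
xtset firm_id year

* Complementary log-log discrete-time hazard with normally distributed
* firm-level frailty (random effects); the output includes a
* likelihood-ratio test of rho = 0, i.e. of no frailty
xtcloglog adopt reg_pressure x1 x2 i.t, re

* User-written alternatives (Gamma and mass-point frailty), from SSC:
*   ssc install pgmhaz8
*   ssc install hshaz
```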
Simone Aresu

Join Date: Jan 2022

Posts: 4
#6

20 Jan 2022, 08:59

Thank you for the useful suggestion, Stephen Jenkins. I have read your Survival Analysis with Stata materials. One last question: for discrete-time data, would you recommend a model that controls for Gamma-distributed unobserved heterogeneity (command: pgmhaz8) or for normally distributed frailty (command: xtcloglog)? In my panel data set, the frailty is statistically significant under both specifications (Gamma and normal). Thank you.
Stephen Jenkins

Join Date: Apr 2014

Posts: 1432
#7

20 Jan 2022, 09:46

I don't have a "right" answer for you. If I were you, I'd also look at the results from using -hshaz-. And by "results" I mean look at the substantive implications of the parameter estimates for individuals with different sets of characteristics. Does the choice of heterogeneity distribution make a substantive difference to, e.g., predicted median duration?
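To make that comparison concrete, the equations below show how the predicted survivor function, and hence the median duration, depend on the frailty assumption in a discrete-time complementary log-log model. The symbols are introduced here for illustration: x is a covariate profile, β the slope coefficients, γ_k the baseline effect for interval k, and, in the Gamma case, a frailty with mean 1 and variance σ².

```latex
% Interval hazard without frailty
h_k(x) = 1 - \exp\!\left[-\exp(x'\beta + \gamma_k)\right]

% Survivor function: probability of no adoption by the end of interval j
S_j(x) = \prod_{k=1}^{j}\left[1 - h_k(x)\right]
       = \exp\!\left[-\sum_{k=1}^{j}\exp(x'\beta + \gamma_k)\right]

% Gamma frailty (mean 1, variance \sigma^2) integrated out
S_j(x) = \left[1 + \sigma^2 \sum_{k=1}^{j}\exp(x'\beta + \gamma_k)\right]^{-1/\sigma^2}

% Predicted median duration: first interval at which S_j(x) falls to 0.5 or below
j^{\ast}(x) = \min\{\, j : S_j(x) \le 0.5 \,\}
```

As σ² approaches 0 the Gamma expression tends to the no-frailty survivor function. Each specification has its own estimates of β and γ_k, so the comparison plugs each model's estimates into its own formula.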

You might also have a look at this article: "The unobserved heterogeneity distribution in duration analysis", by Jaap H. Abbring, Gerard J. Van Den Berg, Biometrika, Volume 94, Issue 1, March 2007, Pages 87–99, https://doi.org/10.1093/biomet/asm013

Abstract:

In a large class of hazard models with proportional unobserved heterogeneity, the distribution of the heterogeneity among survivors converges to a gamma distribution. This convergence is often rapid. We derive this result as a general result for exponential mixtures and explore its implications for the specification and empirical analysis of univariate and multivariate duration models.
Simone Aresu

Join Date: Jan 2022

Posts: 4
#8

11 Feb 2022, 03:12

Thank you very much Prof. Stephen Jenkins