Heckman selection model in three stages

Rabab Al hasni

Join Date: May 2019
Posts: 70

27 Oct 2019, 11:18

Dear Statalist,

I hope you are well. I would like to ask how to apply the Heckman selection model to my categorical data. I have one column with four categories, which I treat as the dependent variable. I first estimated it using multinomial logit regression. However, in order to test for selection bias, I have been advised to apply the Heckman probit selection model in Stata.

My question is how to apply the Heckman model across the four dependent variables. Do I first need to create a separate dataset for each dependent variable? Or can I keep them in one column and let the software select the required dependent variable at each stage? As for the Stata command, should I run the same syntax at each stage, changing only the dependent variable?

The command: heckman depvar [indepvars], select(depvar_s = varlist_s) [twostep]
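For concreteness, here is a minimal sketch of that syntax on Stata's shipped example data rather than on the firm survey. The dataset (womenwk) and its variables are stand-ins, not anything from my own data. Note that heckman expects a continuous outcome, so it illustrates the syntax only; a binary outcome would call for heckprobit instead.

```stata
* Illustration only: womenwk is a Stata example dataset, not the firm survey
webuse womenwk, clear

* Outcome equation: wage on educ and age
* Selection equation: wage is observed or not, explained by
* married, children, educ and age
heckman wage educ age, select(married children educ age) twostep
```

When select() is given without a dependent variable, as here, Stata treats a missing outcome (wage) as "not selected". With an explicit 0/1 selection variable the option takes the form select(depvar_s = varlist_s).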

My sample is 300 firms, with 32 independent variables. Each firm chooses one of the four categories (i.e. dependent variables). The table below shows how I plan to apply the Heckman analysis to the dependent variables in three stages.

Could you please advise me on how to apply the Heckman model for the 3 stages?

Stage 1: all firms (n = 300)	Stage 2: firms that chose B (n = 213)	Stage 3: firms that chose C (n = 103)
Dependent variable: firm decides A (n = 87)	Dependent variable: firm decides C (n = 103)	Dependent variable: firm decides E (n = 83)
Dependent variable: firm decides B (n = 213)	Dependent variable: firm decides D (n = 110)	Dependent variable: firm decides F (n = 20)

Note: the letters A to F denote the types of firm decision, each treated as a dependent variable; (n) is the number of firms in each category of the dependent variable.
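To make the layout concrete, the three stages could be coded as binary indicators built from a single outcome column, so no separate datasets are needed. This is only a sketch: the variable name decision and its coding (1 = A, 2 = D, 3 = E, 4 = F) are hypothetical, and the toy data below merely reproduce the cell counts from the table.

```stata
* Toy data reproducing the counts in the table (hypothetical coding)
clear
set obs 300
gen byte decision = 1 in 1/87          // A: 87 firms
replace decision = 2 in 88/197         // D: 110 firms
replace decision = 3 in 198/280        // E: 83 firms
replace decision = 4 in 281/300        // F: 20 firms

* Stage 1: all 300 firms, B (= anything other than A) versus A
gen byte stage1_B = decision != 1

* Stage 2: defined only for firms that chose B; C (= E or F) versus D
gen byte stage2_C = inlist(decision, 3, 4) if stage1_B == 1

* Stage 3: defined only for firms that chose C; F versus E
gen byte stage3_F = decision == 4 if stage2_C == 1

tab1 stage1_B stage2_C stage3_F
```

Each later-stage indicator is missing for firms that never reached that stage, which is what makes every stage a potential selection step for the next one.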

Your kind help and support are greatly appreciated.

Kind regards,

Rabab

Last edited by Rabab Al hasni; 27 Oct 2019, 11:24.

Phil Bromiley

Join Date: Apr 2014

Posts: 4348
#2

28 Oct 2019, 12:03

You didn't get a quick answer. You'll increase your chances of a useful answer by following the FAQ on asking questions - provide Stata code in code delimiters, readable Stata output, and sample data using dataex.

Part of the problem is that I can't understand what your data looks like. First, if you have 300 firms making multiple decisions, you certainly want firms to be observations. Whether you want the decisions as separate observations under firms is not clear.

If all firms must choose A, B, C, or D, and you have data on all firms, then I don't see how this is a selection model.
FernandoRios

Join Date: Apr 2014

Posts: 2469
#3

28 Oct 2019, 19:08

Hi Rabab
I actually have a different take on your model. If I understand it correctly, you are facing a three-step heckprobit model.
As far as I know, there is no ready-made command to estimate something like this, and manual two-step procedures are not appropriate. You could, however, use the user-written command -cmp- to estimate this model.
The caveat, however, is that it is a very hard model to estimate. I helped a coworker do something like this with some simulated data, and even then the model was difficult to estimate, with lots of problems of convergence.
Perhaps you can look at the basic heckprobit syntax using CMP and adapt it for the three stages you describe.
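As a starting point, the basic two-equation case might look like the sketch below, which pairs the official heckprobit example data with what I believe is the equivalent -cmp- call. It assumes -cmp- and its dependency -ghk2- are installed from SSC, and the dataset and variables are Stata's, not yours.

```stata
* Assumes: ssc install cmp  and  ssc install ghk2  have been run
webuse school, clear

* Official command: probit outcome (private) with probit selection (vote)
heckprobit private years logptax, select(vote = years loginc logptax)

* Same model via cmp; cmp setup defines the $cmp_* indicator macros
cmp setup
* First indicator: private is modeled as probit only where vote == 1
* Second indicator: vote is modeled as probit for every observation
cmp (private = years logptax) (vote = years loginc logptax), ///
    indicators(vote*$cmp_probit $cmp_probit) nolrtest quietly
```

Extending this to three stages would presumably mean adding a third equation with its own indicator expression, so that each later equation enters the likelihood only for observations that passed the earlier stage. I have not written that out here, and it is where the convergence trouble tends to start.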
HTH
Fernando
Rabab Al hasni

Join Date: May 2019

Posts: 70
#4

29 Oct 2019, 09:56

Dear Phil Bromiley and FernandoRios,

Many thanks for your attention to my enquiry and for your prompt replies. I am very grateful for your advice and help with my research analysis.

From my point of view, I do not think selection bias exists in my data, because my survey covers both firms that applied for bank credit and firms that did not (i.e. applicants and non-applicants). Each group consists of two categories, so I have four unordered categories of the dependent variable. In addition, the survey was distributed randomly among the population.

However, to address the argument that selection bias might exist in the firms' decisions, I tried the analysis with the following command:

heckman appl_bnk varlist of firm and owner characteristics, select(dy =
varlist of firm and owner characteristics, employment growth, asset, and cashflow growth) twostep

The coefficient on the inverse Mills ratio (lambda) is positive but not statistically significant (p = 0.266), as the output below shows:

-------------+----------------------------------------------------------------
mills        |
      lambda |   .1477126   .1326721     1.11   0.266      -.11232    .4077451
-------------+----------------------------------------------------------------
         rho |    0.40949
       sigma |   .3607259
------------------------------------------------------------------------------

The result indicates no significant selection bias. Am I right in this interpretation?
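For anyone wanting to reproduce this kind of check, the sketch below pulls the lambda estimate and its standard error from the stored results of a two-step fit and recomputes the z statistic and two-sided p-value. It uses Stata's womenwk example data as a stand-in for my survey, so the numbers will differ from the output above.

```stata
* Illustration only: womenwk is a Stata example dataset, not the firm survey
webuse womenwk, clear
heckman wage educ age, select(married children educ age) twostep

* After twostep, e(lambda) and e(selambda) hold the coefficient on the
* inverse Mills ratio and its standard error
scalar z_lambda = e(lambda) / e(selambda)
scalar p_lambda = 2 * normal(-abs(scalar(z_lambda)))
display "lambda = " e(lambda) "   z = " scalar(z_lambda) "   p = " scalar(p_lambda)
```

As I understand it, an insignificant lambda is consistent with no detectable selection on unobservables, but it does not prove its absence, and the test depends on the selection equation being specified sensibly.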

Now, if the Mills ratio shows no bias, can I use multinomial logit regression to estimate the determinants of the four decisions on accessing bank credit? I ask because some analysts are concerned about the Independence of Irrelevant Alternatives (IIA) assumption behind multinomial logit results.
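One way I have seen the IIA concern examined is a Hausman-type comparison: fit the multinomial logit on all categories, refit it with one category dropped, and test whether the remaining coefficients change. The sketch below follows that pattern on Stata's sysdsn3 example data; the dataset and variables are stand-ins, not my survey.

```stata
* Illustration only: sysdsn3 is a Stata example dataset (insurance choice)
webuse sysdsn3, clear

* Full model with all outcome categories
mlogit insure age male
estimates store allcats

* Restricted model: drop the "Uninsure" category
mlogit insure age male if insure != "Uninsure":insure
estimates store partial

* Under IIA the coefficients common to both fits should not differ
* systematically
hausman partial allcats, alleqs constant
```

This test can return a negative chi-squared statistic in finite samples, so I would treat its result as suggestive rather than decisive.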

Thank you again for your time.

Best regards,
Rabab
Rabab Al hasni

Join Date: May 2019

Posts: 70
#5

29 Oct 2019, 10:00

Dear FernandoRios,

Thank you for your effort and suggestion. I will try to use the CMP analysis.

Best regards,
Rabab