Dynamic probit model with Wooldridge approach

Rythia Afkar

Join Date: Jun 2014

Posts: 14
#1

Dynamic probit model with Wooldridge approach

02 Jan 2015, 03:10

Hi,

I would like to ask how to specify a dynamic probit model using the Wooldridge (2005) approach. The paper says that the command 'xtprobit' was used. However, in the Stata manual entry for xtprobit, I found only the options for Random Effects (RE) and Population Averaged (PA) models. I cannot find any clue about how to specify the xtprobit command if I want to use the Wooldridge (2005) approach.

Ref: Simple Solutions to the Initial Conditions Problem in Dynamic, Nonlinear Panel Data Models
with Unobserved Heterogeneity (JAE, Wooldridge, 2005).

Thanks in advance for your help.

Best,
Rythia
Stephen Jenkins

Join Date: Apr 2014

Posts: 1433
#2

02 Jan 2015, 03:57

Dynamic random effects probit models that account for initial conditions issues a la Wooldridge (Journal of Applied Econometrics, 2005) are indeed straightforward to implement.

You can use xtprobit or, in Stata 13, meprobit. (The latter allows you to use vce(cluster) for instance.) For example:

Code:

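* Stata 13 or later: mixed-effects probit with a random intercept for each pid
meprobit outcome_t lagged_outcome $predictors $avge_tvar outcome_t_1 if wave > wavefirst || pid:

* The same specification as a random effects probit
xtprobit outcome_t lagged_outcome $predictors $avge_tvar outcome_t_1 if wave > wavefirst, i(pid)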

I am assuming you have panel data (organised in long form) with waves indexed 1, ..., T for a large number of individuals i = 1, ..., N, each identified by personal identifier variable pid.
outcome_t is the binary outcome variable at time t, and lagged_outcome is the variable referring to the value of the outcome at t-1.
$predictors is a global macro containing the names of predictors; $avge_tvar is a global macro containing time-averaged variables (the 'Mundlak'-type variables defined by Wooldridge); outcome_t_1 is the value of the outcome in the first (initial) wave observed for each individual; wave refers to the wave number of the current observation; wavefirst identifies the first wave.
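The variables described above could be constructed along the following lines. This is only a sketch on simulated data: the data-generating values, the single predictor x, and the name xbar are assumptions made for illustration, and the time average is taken over all observed waves (the papers cited later in the thread discuss alternatives).

```stata
* Sketch only: simulated long-form panel; all names and parameter values are illustrative
clear
set seed 12345
set obs 500
generate pid = _n
* individual effect (unobserved in real data)
generate a = rnormal()
expand 6
bysort pid: generate wave = _n
* one time-varying predictor, correlated with the individual effect
generate x = rnormal() + 0.5*a

* Build the binary outcome wave by wave so that it depends on its own lag
generate byte outcome_t = (0.5*a + rnormal() > 0) if wave == 1
forvalues t = 2/6 {
    bysort pid (wave): replace outcome_t = (0.8*outcome_t[_n-1] + 0.5*x + a + rnormal() > 0) if wave == `t'
}

* wavefirst: first wave in which each person is observed
bysort pid (wave): generate wavefirst = wave[1]
* outcome_t_1: outcome in that first wave, copied to all of the person's rows
bysort pid (wave): generate byte outcome_t_1 = outcome_t[1]
* lagged_outcome: outcome one wave earlier (missing in the first wave)
xtset pid wave
generate byte lagged_outcome = L.outcome_t
* time average of the time-varying predictor within person
bysort pid: egen xbar = mean(x)

global predictors x
global avge_tvar xbar
xtprobit outcome_t lagged_outcome $predictors $avge_tvar outcome_t_1 if wave > wavefirst, re
```

Because the data are xtset before estimation, the i(pid) option is not needed in this sketch.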

For a relatively non-technical discussion of different approaches to modelling, including a comparison of different approaches to Initial Conditions (Wooldridge, Heckman, Orme), see:
‘The dynamics of social assistance receipt: measurement and modelling issues, with an application to Britain’, by Lorenzo Cappellari and Stephen P. Jenkins. OECD Social, Employment and Migration Working Paper 67, http://www.oecd.org/dataoecd/30/42/41414013.pdf. [Also available at http://www.iser.essex.ac.uk/pubs/wor...df/2008-34.pdf.]
This includes extensive references to related literature. Among more recent papers, see
"Handling initial conditions and endogenous covariates in dynamic/transition models for binary data with unobserved heterogeneity", Anders Skrondal and Sophia Rabe-Hesketh, Journal of the Royal Statistical Society, Series C (Applied Statistics), 2014, 63, Part 2, pp. 211–223

Rythia: please conform to Forum etiquette, and re-register so that you log in here using your real name (firstname lastname). See the FAQ about this. It's easy to do: hit the Contact Us link at bottom right of screen and place your request. Thank you.
Rythia Afkar

Join Date: Jun 2014

Posts: 14
#3

02 Jan 2015, 05:27

Hi Stephen,

Thanks a lot for your reply! I have placed my request to re-register so that I can use my real name. I look forward to the admin's reply.
I also have two quick questions about $avge_tvar. I understand that it should contain the time-averaged independent variables for each individual, right? (1) What should I do if some independent variables are binary (e.g. 1 if located in a remote area and 0 otherwise)? (2) Should the time average of the lagged outcome also be included in $avge_tvar?

Thank you so much for your reply. It is very helpful!

Best,
Rythia
Stephen Jenkins

Join Date: Apr 2014

Posts: 1433
#4

03 Jan 2015, 07:46

Thanks for re-registering.

There is nothing in the discussions of the longitudinally-averaged variables, as far as I can recall, that refers to the types of variable (binary, categorical, continuous). Jeff Wooldridge may well have a view about this and I would trust him more than me. I do not have access to his graduate text at present to check if there is information in there. (Econometric Analysis of Cross-Section and Panel Data)

Note that the use of longitudinally-averaged variables (in $avge_tvar) is not the formulation proposed in Wooldridge's (2005) article. Look at his section 5.1, and the references to using the (non-redundant) values of variables in each time period. Note also the discussion in the Skrondal and Rabe-Hesketh article that I referred to about use of time-averaged variables (see their "Section 4.3.1.2 Constrained Wooldridge solution") and the remarks, referring to literature, about use of longitudinally-averaged variables being OK only if the panel is sufficiently long. (Luckily, it has been in my own applications.)
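To make the distinction concrete, the two formulations can be written roughly as follows. The notation is an assumption chosen for this sketch, not copied from either paper: y_{i0} corresponds to outcome_t_1, y_{i,t-1} to lagged_outcome, z_{it} to $predictors, z_i = (z_{i1}, ..., z_{iT}) to the values of the predictors in every period, and \bar{z}_i to the time averages in $avge_tvar.

```latex
\begin{align}
P(y_{it} = 1 \mid y_{i,t-1}, \ldots, y_{i0}, z_i, c_i)
  &= \Phi\!\left(z_{it}\gamma + \rho\, y_{i,t-1} + c_i\right), \\
\text{period-specific values:}\quad
c_i &= \alpha_0 + \alpha_1 y_{i0} + z_i \alpha_2 + a_i,
  \qquad a_i \mid y_{i0}, z_i \sim N(0, \sigma_a^2), \\
\text{time-averaged (constrained):}\quad
c_i &= \alpha_0 + \alpha_1 y_{i0} + \bar{z}_i \alpha_2 + a_i .
\end{align}
```

Substituting either expression for c_i into the first line gives an ordinary random effects probit in a_i, which is why xtprobit or meprobit can be used directly.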
Ujjwal

Join Date: Jul 2014

Posts: 56
#5

23 Aug 2015, 00:26

Dear Professor Stephen Jenkins,

I have estimated a mixed-effects ordered probit model using (as you suggested for meprobit)

meoprobit outcome lagged_outcome initial_outcome $predictors $tavge_predictors || pid:

Some of the predictors are continuous, some are categorical. While creating $tavge_predictors, I only considered time-varying continuous variables.

Estimation seems to be OK. But when I try to get the marginal effects with

margins, grand dydx(*)

Stata returns: "default prediction is a function of possibly stochastic quantities other than e(b)", r(498).

My question to you: was that estimation OK? If yes, how could I get the marginal effects for this estimation? Thanks for your help.

Ujjwal Kumar Das, MSc Economics, Leeds University Business School
Stephen Jenkins

Join Date: Apr 2014

Posts: 1433
#6

23 Aug 2015, 02:43

I don't know the answer. I recommend that you go back to first principles. What precisely is the "marginal effect" that you are trying to estimate? Is it the statistic of interest? (Have you read, for example, Wooldridge's materials regarding Average Partial Effects?)
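One possibility that might be worth trying, offered as an untested assumption rather than a recommendation: the error arises because the default prediction involves the predicted random effects, so asking margins for a prediction based on the fixed part only avoids it. The sketch below uses simulated data and illustrative names. Note that fixedonly sets the random effect to zero, which is not the same thing as an average partial effect, and that the exact option names accepted by predict differ across Stata releases (later releases write this as conditional(fixedonly)).

```stata
* Sketch only: simulated data, illustrative names
clear
set seed 2015
set obs 300
generate pid = _n
generate u = rnormal()
expand 5
bysort pid: generate wave = _n
generate x = rnormal()
generate ystar = 0.7*x + u + rnormal()
* three ordered categories coded 1, 2, 3
generate byte outcome = 1 + (ystar > -0.5) + (ystar > 0.5)

meoprobit outcome x || pid:

* Effect of x on Pr(outcome == 3), with the random intercept set to zero,
* so the prediction depends on e(b) alone
margins, dydx(x) predict(pr outcome(3) fixedonly)
```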

PS please note the Forum preference for full names to be used here (firstname lastname). You can get this changed very easily by re-registering. See the FAQ about this. Thank you.
Danilo Bolano

Join Date: Aug 2016

Posts: 2
#7

28 Aug 2016, 21:51

Hi,

Sorry to come back to this a year later.

Stephen, I was wondering why we have to specify 'if wave > wavefirst' in the xtprobit command. Since the lagged dependent variable is included, isn't the first observation for every unit dropped from the likelihood automatically because its lag is missing?

I also have an additional question about data points. I have read somewhere that this kind of analysis needs a minimum of three observations per unit (e.g., one for the 'outcome', one for the lag and one for the initial condition?). Is that right? From a technical point of view, I think 2 is enough. Perhaps having at least 3 obs is more interesting for interpreting the results?
Thanks and cheers, Danilo
Stephen Jenkins

Join Date: Apr 2014

Posts: 1433
#8

29 Aug 2016, 01:23

Danilo: I can't recall the answer to Q1. But I think it's because "wavefirst" was not always equal to one in my data set-up. I don't know the answer to Q2
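On Q1, a small check on simulated data may help. It is only a sketch with illustrative names, and it assumes a balanced panel without gaps: in that case the lag is missing in each person's first wave and nowhere else, so estimation commands drop exactly those rows whether or not the if condition is given.

```stata
* Sketch only: simulated balanced panel without gaps; names are illustrative
clear
set seed 42
set obs 200
generate pid = _n
expand 4
bysort pid: generate wave = _n
generate byte y = runiform() < 0.5
xtset pid wave
generate byte lag_y = L.y
bysort pid (wave): generate wavefirst = wave[1]

* The lag is missing in the first wave of each person ...
assert missing(lag_y) if wave == wavefirst
* ... and observed in every later wave
assert !missing(lag_y) if wave > wavefirst
```

With gaps between waves the picture changes: L.y is also missing after a gap, whereas y[_n-1] within pid would pick up the previous observed row, so both the explicit condition and the way the lag is built can matter.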
Danilo Bolano

Join Date: Aug 2016

Posts: 2
#9

29 Aug 2016, 17:18

Thanks!
Shadrack Mutembereza

Join Date: Jun 2017

Posts: 25
#10

16 Jun 2017, 08:52

I am working on a similar case and have set my data to long format.
However, I run into a collinearity problem. I believe it's because I used the initial values as they are observed, and they are the same as the value of the lagged variable in the second wave. My panel has 4 waves.
So I read further and found that I should estimate the initial values using the attached document. I first estimate model (1), then predict the residuals, and then estimate model (3) using the predicted residuals. After that, I estimate model (5) using the predicted Yhat from model (3).
However, I don't know how to predict the residuals after xt estimations.
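On the collinearity point, a quick check on simulated data (a sketch with illustrative names, not a diagnosis of the actual data set) shows that the initial value and the lagged outcome coincide in wave 2 by construction, but need not coincide once waves 3 and 4 are included. Perfect collinearity would therefore be expected mainly when the estimation sample has only one wave after the initial one, or when the outcome hardly ever changes.

```stata
* Sketch only: simulated 4-wave panel; names are illustrative
clear
set seed 2017
set obs 300
generate pid = _n
expand 4
bysort pid: generate wave = _n
generate byte outcome_t = runiform() < 0.5
bysort pid (wave): generate byte outcome_t_1 = outcome_t[1]
bysort pid (wave): generate byte lagged_outcome = outcome_t[_n-1]

* In wave 2 the lag is the initial value by construction
assert lagged_outcome == outcome_t_1 if wave == 2

* Over waves 2 to 4 together the two variables differ on some rows
count if lagged_outcome != outcome_t_1 & wave > 1
assert r(N) > 0
correlate lagged_outcome outcome_t_1 if wave > 1
```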

A further question: I don't know whether this is the logic behind Wooldridge (2005).

Thanks

Attached Files

Wooldridge 2017.docx (12.7 KB, 1 view)
Shadrack Mutembereza

Join Date: Jun 2017

Posts: 25
#11

16 Jun 2017, 10:35

@Stephen Jenkins
I was hoping to send the above query to you, but I don't know whether it will reach you.
kolpo kotha

Join Date: Mar 2018

Posts: 22
#12

06 Mar 2018, 16:15

Hi Stephen, Can you please give the code for generating:
outcome_t_1
Raffaele Grotti

Join Date: Feb 2015

Posts: 54
#13

06 Mar 2018, 16:55

Hi Kolpo,
you might be interested in - xtpdyn - available from the SSC archive for estimating dynamic random effects probit models as proposed by Rabe-Hesketh and Skrondal (2013).

Best,
Raffaele

Rabe-Hesketh, S., Skrondal, A. 2013. Avoiding biased versions of Wooldridge’s simple solution to the initial conditions problem. Economics Letters 120: 2, 346–349.
Alex Mai

Join Date: May 2016

Posts: 213
#14

24 Apr 2018, 05:48

Originally posted by Stephen Jenkins

Dynamic random effects probit models that account for initial conditions issues a la Wooldridge (Journal of Applied Econometrics, 2005) are indeed straightforward to implement.

Code:

meprobit outcome_t lagged_outcome $predictors $avge_tvar outcome_t_1 if wave > wavefirst || pid:
xtprobit outcome_t lagged_outcome $predictors $avge_tvar outcome_t_1 if wave > wavefirst, i(pid)

Dear Stephen,

May I ask three questions about the Wooldridge approach for the dynamic probit model? Thank you!

Q1: In a summer school lecture note, Wooldridge suggested a Stata command for the dynamic probit with heterogeneity such as, for example:

Code:

xtprobit y y_1 y1 xa xb xc xa2-xa18 xb2-xb18 xc2-xc18, re

where xa2-xa18 are the values of the regressor xa in periods 2-18, and likewise for xb2-xb18 and xc2-xc18.
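For readers wondering how such period-specific variables can be built from long-form data, here is a minimal sketch. It uses a simulated 4-wave panel standing in for the 18-wave case, and all names and data-generating values are assumptions for illustration; the toy outcome has no true state dependence.

```stata
* Sketch only: simulated 4-wave panel; names are illustrative
clear
set seed 2018
set obs 400
generate pid = _n
generate c = rnormal()
expand 4
bysort pid: generate wave = _n
generate xa = rnormal() + 0.5*c
generate byte y = (0.5*xa + c + rnormal() > 0)

* y1: initial value of the outcome; y_1: outcome one wave earlier
bysort pid (wave): generate byte y1 = y[1]
xtset pid wave
generate byte y_1 = L.y

* Copy the wave-t value of xa to every row of the same person
forvalues t = 2/4 {
    bysort pid: egen xa`t' = max(cond(wave == `t', xa, .))
}

xtprobit y y_1 y1 xa xa2-xa4, re
```

One thing this construction makes visible: if a regressor does not vary over time within person, its period-specific copies are identical to one another, and Stata will omit them as collinear. Whether that is what happened in the case described here cannot be told from the post.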

I have tried this method, but all of x2-x18 are omitted due to collinearity, and Stata stopped working while it was omitting x2-x18, without any feedback. Do you think this is because the time dimension of the panel is too large (18 in my case)?

"use of longitudinally-averaged variables being OK only if the panel is sufficiently long"

You mentioned that if the time dimension of the panel is long, then time-averaged variables are fine. So do you think that for a long panel it is better to use time-averaged variables rather than the Wooldridge-type loop variables (e.g. x2-x18)? My concern is that for a long panel there will be too many Wooldridge-type loop variables as regressors in the equation, and there may be collinearity, as in my case.

Q2: Wooldridge suggested that Arellano and Bond GMM estimation of the dynamic linear model is a good starting point for the dynamic probit model, and he showed that the results generated by the two approaches are similar.

So do you think it makes sense to estimate a dynamic binary outcome model by both Wooldridge's dynamic probit model and Arellano and Bond GMM, and then report both sets of results in empirical research?

In my case, the result of Arellano and Bond GMM is very similar to that of static probit estimation, but very different from the Wooldridge dynamic probit, for which very few variables are statistically significant.

Q3: In empirical research, is it necessary to report the coefficients of all the time-averaged variables?

Many thanks!

Last edited by Alex Mai; 24 Apr 2018, 06:44.
Stephen Jenkins

Join Date: Apr 2014

Posts: 1433
#15

24 Apr 2018, 10:24

Q1: I have no idea why you are having a problem implementing Wooldridge's suggestion with your data. (And, sorry, I don't have time to investigate further.)

Q2. Mark Stewart, Journal of Applied Econometrics 22: 511–531 (2007) applies both types of model that you cite. His discussion is likely useful for guiding what you do.

Q3. No comment. It depends on context, and the "story" you're trying to tell, and the constraints of particular journals (e.g. whether appendices OK or not).

More generally, it appears that a reading of key papers in the applied literature would provide useful guidance to you. Mark Stewart's paper is an example; another is Martin Biewen's paper in the same journal in 2009. They look at more complex models than the basic Dynamic Random Effects Probit, but in effect cover related and relevant issues. Rabe-Hesketh and Skrondal have written 2 or 3 papers on the subject of whether to longitudinally-average or not (albeit from a statistician's viewpoint rather than economist's). My work with Dynamic Random Effects Probit models was with a long panel. Raffaele Grotti (#13) has implemented these models with short panels.