  • Dynamic probit model with Wooldridge approach

    Hi,

    I would like to ask how to specify a dynamic probit model using the Wooldridge (2005) approach. The paper says that the command 'xtprobit' was used. However, in the Stata manual entry for xtprobit, I only found options for random-effects (RE) and population-averaged (PA) models. I could not find any indication of how to specify the xtprobit command for the Wooldridge (2005) approach.

    Ref: Simple Solutions to the Initial Conditions Problem in Dynamic, Nonlinear Panel Data Models with Unobserved Heterogeneity (JAE, Wooldridge, 2005).

    Thanks in advance for your help.

    Best,
    Rythia

  • #2
    Dynamic random effects probit models that account for initial conditions issues a la Wooldridge (Journal of Applied Econometrics, 2005) are indeed straightforward to implement.

    You can use xtprobit or, in Stata 13 or later, meprobit. (The latter allows you to use vce(cluster), for instance.) For example:

    Code:
    * xtprobit needs the panel declared first (older versions used the i(pid) option)
    xtset pid wave
    meprobit outcome_t lagged_outcome $predictors $avge_tvar outcome_t_1 if wave > wavefirst || pid:
    xtprobit outcome_t lagged_outcome $predictors $avge_tvar outcome_t_1 if wave > wavefirst, re
    I am assuming you have panel data (organised in long form) with waves indexed 1, ..., T for a large number of individuals i = 1, ..., N, each identified by the personal identifier variable pid.
    outcome_t is the binary outcome variable at time t, and lagged_outcome is the value of the outcome at t-1.
    $predictors is a global macro containing the names of the predictors; $avge_tvar is a global macro containing the time-averaged variables (the 'Mundlak'-type variables defined by Wooldridge); outcome_t_1 is the value of the outcome in the first (initial) wave observed for each individual; wave is the wave number of the current observation; and wavefirst identifies the first wave.
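A minimal sketch of how the variables in these commands can be constructed, on a hypothetical simulated panel (500 individuals, 6 waves, a single regressor x1 and made-up coefficient values; replace the simulation with your own data). The time average m_x1 is computed over the estimation waves only, which is one common choice rather than the only one. meprobit requires Stata 13 or later.

```stata
* --- hypothetical simulated panel (replace with your own data) ---
clear
set seed 2005
set obs 500                                  // hypothetical: 500 individuals
generate pid = _n
generate double a = rnormal()                // unobserved individual effect
expand 6                                     // 6 waves per individual
bysort pid: generate wave = _n
generate double x1 = rnormal() + 0.5*a       // time-varying regressor, correlated with a
xtset pid wave
generate byte outcome_t = (x1 + a + rnormal() > 0) if wave == 1
forvalues t = 2/6 {
    replace outcome_t = (0.5*x1 + 0.8*L.outcome_t + a + rnormal() > 0) if wave == `t'
}

* --- variables used in the commands ---
generate byte lagged_outcome = L.outcome_t                 // outcome at t-1
bysort pid (wave): generate wavefirst = wave[1]             // first wave observed per person
by pid: generate byte outcome_t_1 = outcome_t[1]            // initial-wave outcome, on every row
bysort pid: egen double m_x1 = mean(x1) if wave > wavefirst // Mundlak-type time average
global predictors x1
global avge_tvar m_x1

meprobit outcome_t lagged_outcome $predictors $avge_tvar outcome_t_1 if wave > wavefirst || pid:
xtprobit outcome_t lagged_outcome $predictors $avge_tvar outcome_t_1 if wave > wavefirst, re
```

With several time-varying predictors, the egen line is simply repeated for each of them and the resulting names are added to $avge_tvar.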

    For a relatively non-technical discussion of different approaches to modelling, including a comparison of different approaches to Initial Conditions (Wooldridge, Heckman, Orme), see:
    ‘The dynamics of social assistance receipt: measurement and modelling issues, with an application to Britain’, by Lorenzo Cappellari and Stephen P. Jenkins. OECD Social, Employment and Migration Working Paper 67, http://www.oecd.org/dataoecd/30/42/41414013.pdf. [Also available at http://www.iser.essex.ac.uk/pubs/wor...df/2008-34.pdf.]
    This includes extensive references to related literature. Among more recent papers, see
    "Handling initial conditions and endogenous covariates in dynamic/transition models for binary data with unobserved heterogeneity", Anders Skrondal and Sophia Rabe-Hesketh, Journal of the Royal Statistical Society, Series C (Applied Statistics), 2014, 63, Part 2, pp. 211–223

    Rythia: please conform with Forum etiquette, and re-register so that you log in here using your real name (firstname lastname). See the FAQ about this. It's easy to do: hit the Contact Us link at the bottom right of the screen and place your request. Thank you.




    • #3
      Hi Stephen,

      Thanks a lot for your reply! I have placed my request to re-register so that I can use my real name, and I look forward to the administrators' reply.
      I also have a quick question about $avge_tvar. I understand that it should contain the time-averaged independent variables for each individual, right? (1) What if some of my independent variables are binary (e.g. 1 if located in a remote area and 0 if located in a non-remote area)? (2) Should the time average of the lagged outcome also be included in $avge_tvar?

      Thank you so much for your reply. It is very helpful!

      Best,
      Rythia



      • #4
        Thanks for re-registering.

        There is nothing in the discussions of the longitudinally-averaged variables, as far as I can recall, that refers to the type of variable (binary, categorical, continuous). Jeff Wooldridge may well have a view about this, and I would trust him more than me. I do not have access to his graduate text (Econometric Analysis of Cross Section and Panel Data) at present to check whether there is information in there.
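Whatever the literature says, the mechanics of averaging a 0/1 regressor are easy to see. In this hypothetical sketch (simulated data; remote and urban are made-up dummies), the average of a time-varying dummy is the share of waves in which it equals 1, while the average of a time-invariant dummy just reproduces the dummy, so it carries no extra information and would be perfectly collinear with it.

```stata
clear
set seed 1
set obs 200                                  // hypothetical: 200 individuals
generate pid = _n
generate byte remote = runiform() < 0.3      // time-invariant dummy (hypothetical)
expand 4                                     // 4 waves per individual
bysort pid: generate wave = _n
generate byte urban = runiform() < 0.5       // time-varying dummy (hypothetical)

bysort pid: egen double m_remote = mean(remote)
bysort pid: egen double m_urban = mean(urban)

assert m_remote == remote                    // averaging a constant dummy returns the dummy itself
tabulate m_urban                             // share of waves with urban == 1: 0, .25, .5, .75 or 1
```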

        Note that the use of longitudinally-averaged variables (in $avge_tvar) is not the formulation proposed by Wooldridge's (2005) article. Look at his section 5.1, and the references to using the (non-redundant) values of variables in each time period. Note also the discussion in the Skrondal and Rabe-Hesketh article that I referred to about use of time-averaged variables (see their "Section 4.3.1.2 Constrained Wooldridge solution") and the remarks, referring to literature, about use of longitudinally-averaged variables being OK only if the panel is sufficiently long. (Luckily, it has been in my own applications.)
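A hypothetical sketch contrasting the two formulations on a simulated panel (6 waves, one regressor x1, made-up coefficients): the first model uses the time average m_x1, and the second uses the person's value of x1 in each estimation wave (x1_2 to x1_6) as separate time-constant regressors. It illustrates the mechanics only; which period values belong in the conditioning set should be checked against Wooldridge (2005, section 5.1).

```stata
* --- hypothetical simulated panel ---
clear
set seed 2005
set obs 500
generate pid = _n
generate double a = rnormal()
expand 6
bysort pid: generate wave = _n
generate double x1 = rnormal() + 0.5*a
xtset pid wave
generate byte outcome_t = (x1 + a + rnormal() > 0) if wave == 1
forvalues t = 2/6 {
    replace outcome_t = (0.5*x1 + 0.8*L.outcome_t + a + rnormal() > 0) if wave == `t'
}
generate byte lagged_outcome = L.outcome_t
bysort pid (wave): generate wavefirst = wave[1]
by pid: generate byte outcome_t_1 = outcome_t[1]
bysort pid: egen double m_x1 = mean(x1) if wave > wavefirst

* per-period values of x1 for waves 2-6, copied to every row of the person
forvalues t = 2/6 {
    bysort pid: egen double x1_`t' = max(cond(wave == `t', x1, .))
}

* constrained version: time average only
xtprobit outcome_t lagged_outcome x1 m_x1 outcome_t_1 if wave > wavefirst, re
estimates store averaged
* unconstrained version: one coefficient per period value
xtprobit outcome_t lagged_outcome x1 x1_2-x1_6 outcome_t_1 if wave > wavefirst, re
estimates store perperiod
estimates table averaged perperiod, b se
```

With T waves the unconstrained version adds T-1 regressors per time-varying variable, which is why the averaged version becomes attractive in long panels.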



        • #5
          Dear Professor Stephen Jenkins,

          I have estimated a mixed-effects ordered probit model, following your suggestion for meprobit, using

          meoprobit outcome lagged_outcome initial_outcome $predictors $tavge_predictors || pid:

          Some of the predictors are continuous and some are categorical. When creating $tavge_predictors, I only included the time-varying continuous variables.

          The estimation seems to be OK. But when I try to get the marginal effects with

          margins, grand dydx(*)

          Stata returns the error "default prediction is a function of possibly stochastic quantities other than e(b)", r(498).

          My questions to you are: was that estimation OK? If yes, how can I get the marginal effects for this estimation? Thanks for your help.

          Ujjwal Kumar Das, MSc Economics, Leeds University Business School



          • #6
            I don't know the answer. I recommend that you go back to first principles. What precisely is the "marginal effect" that you are trying to estimate? Is it the statistic of interest? (Have you read, for example, Wooldridge's materials regarding Average Partial Effects?)
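For the binary case (xtprobit, not the ordered model in #5), one version of the average partial effect that Wooldridge discusses can be computed by hand: the random-effects probit coefficients are scaled by 1/sqrt(1 + sigma_u^2), which averages out the normal random effect, and the difference in predicted probabilities with the lag set to 1 and to 0 is averaged over the estimation sample. This is a hypothetical sketch on simulated data (made-up names and coefficients); standard errors (e.g. by bootstrap or the delta method) are not covered.

```stata
* --- hypothetical simulated panel ---
clear
set seed 2005
set obs 500
generate pid = _n
generate double a = rnormal()
expand 6
bysort pid: generate wave = _n
generate double x1 = rnormal() + 0.5*a
xtset pid wave
generate byte outcome_t = (x1 + a + rnormal() > 0) if wave == 1
forvalues t = 2/6 {
    replace outcome_t = (0.5*x1 + 0.8*L.outcome_t + a + rnormal() > 0) if wave == `t'
}
generate byte lagged_outcome = L.outcome_t
bysort pid (wave): generate wavefirst = wave[1]
by pid: generate byte outcome_t_1 = outcome_t[1]
bysort pid: egen double m_x1 = mean(x1) if wave > wavefirst

xtprobit outcome_t lagged_outcome x1 m_x1 outcome_t_1 if wave > wavefirst, re

* scale factor that integrates out the normal random effect
scalar s_a = 1/sqrt(1 + e(sigma_u)^2)
predict double xb if e(sample), xb                            // full linear index
generate double xb0 = xb - _b[lagged_outcome]*lagged_outcome  // index with the lag set to 0
generate double pe = normal((xb0 + _b[lagged_outcome])*scalar(s_a)) - normal(xb0*scalar(s_a))
summarize pe, meanonly
display "APE of lagged_outcome (sketch): " %6.4f r(mean)
```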

            PS: please note the Forum preference for full names to be used here (firstname lastname). You can get this changed very easily by re-registering. See the FAQ about this. Thank you.



            • #7
              Hi,

              Sorry to come back to this a year later.

              Stephen, I was wondering why we have to specify 'if wave > wavefirst' in the xtprobit command. Since the lagged dependent variable is included, isn't the first observation of each unit automatically dropped from the likelihood because its lag is missing?

              I also have a question about data points. I have read somewhere that for this kind of analysis we need a minimum of three observations per unit (one for the outcome, one for the lag and one for the initial condition). Is that right? From a technical point of view, I think two are enough. Perhaps having at least three observations makes the results more interesting to interpret?
              Thanks and cheers,
              Danilo
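One way to see whether the 'if wave > wavefirst' qualifier changes anything in a given data set is to fit the model with and without it and compare the number of observations used. The hypothetical sketch below (simulated balanced panel, made-up names, a stripped-down regressor list) does that and also counts how many estimation-sample waves each person contributes. Note that in a person's first estimation wave the lagged outcome is by construction the initial outcome, so if every person had only two waves, the two regressors would be identical in the estimation sample.

```stata
* --- hypothetical simulated panel ---
clear
set seed 2005
set obs 500
generate pid = _n
generate double a = rnormal()
expand 6
bysort pid: generate wave = _n
generate double x1 = rnormal() + 0.5*a
xtset pid wave
generate byte outcome_t = (x1 + a + rnormal() > 0) if wave == 1
forvalues t = 2/6 {
    replace outcome_t = (0.5*x1 + 0.8*L.outcome_t + a + rnormal() > 0) if wave == `t'
}
generate byte lagged_outcome = L.outcome_t
bysort pid (wave): generate wavefirst = wave[1]
by pid: generate byte outcome_t_1 = outcome_t[1]

* same model with and without the qualifier
quietly xtprobit outcome_t lagged_outcome x1 outcome_t_1, re
scalar N_all = e(N)
quietly xtprobit outcome_t lagged_outcome x1 outcome_t_1 if wave > wavefirst, re
display "N without if: " scalar(N_all) "   N with if: " e(N)

* number of estimation-sample waves per person
generate byte insample = e(sample)
bysort pid: egen n_est = total(insample)
tabulate n_est
```

Here the two counts agree because the lag is missing in wave 1; in other set-ups (for example, if wavefirst varies across people or the lag is built differently) they may not.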



              • #8
                Danilo: I can't recall the answer to Q1, but I think it's because "wavefirst" was not always equal to one in my data set-up. I don't know the answer to Q2.



                • #9
                  Thanks!



                  • #10


                    I am working on a similar case and have my data in long format.
                    However, I run into a collinearity problem. I believe it is because I used the initial values as observed, and they are the same as the value of the lagged variable in the second wave. My panel has 4 waves.
                    So I read further and found that I should estimate the initial values using the approach in the attached file. I first estimate model (1), then predict the residuals, and then estimate model (3) using the predicted residuals. After that, I estimate model (5) using the predicted Yhat from model (3).
                    However, I don't know how to predict the residuals after xt estimation commands.

                    A further question: is this the logic behind Wooldridge (2005)?

                    Thanks
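A quick way to check the diagnosis in this post: in the first estimation wave the lagged outcome equals the initial outcome by construction, but in later waves the two usually differ, so with four waves they should not be perfectly collinear unless something else is going on (for example, the estimation sample ends up containing only wave 2, or the initial-value variable was built from the lag). A hypothetical sketch on a simulated four-wave panel (made-up names and coefficients):

```stata
clear
set seed 4
set obs 500                                  // hypothetical: 500 individuals
generate pid = _n
generate double a = rnormal()
expand 4                                     // 4 waves, as in this post
bysort pid: generate wave = _n
generate double x1 = rnormal() + 0.5*a
xtset pid wave
generate byte outcome_t = (x1 + a + rnormal() > 0) if wave == 1
forvalues t = 2/4 {
    replace outcome_t = (0.5*x1 + 0.8*L.outcome_t + a + rnormal() > 0) if wave == `t'
}
generate byte lagged_outcome = L.outcome_t
bysort pid (wave): generate byte outcome_t_1 = outcome_t[1]

* identical in wave 2 by construction ...
assert lagged_outcome == outcome_t_1 if wave == 2
* ... but they differ for some rows in waves 3 and 4
count if lagged_outcome != outcome_t_1 & wave > 2
* so their correlation over all estimation waves is below 1
correlate lagged_outcome outcome_t_1 if wave > 1
```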




                    • #11
                      @Stephen Jenkins
                      I was hoping to bring the above query to your attention, but I don't think it came to your notice.



                      • #12
                        Hi Stephen, could you please give the code for generating outcome_t_1?
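One possible way to generate outcome_t_1 (not necessarily the code used in #2) is sketched below on hypothetical simulated data. It takes the first wave in which each person's outcome is actually observed, which also covers people whose first observed wave is not wave 1, and copies that wave's outcome to all of the person's rows.

```stata
clear
set seed 12
set obs 300                                  // hypothetical: 300 individuals
generate pid = _n
expand 4                                     // 4 waves per individual
bysort pid: generate wave = _n
generate byte outcome_t = runiform() < 0.4
* hypothetical: for some people the wave-1 outcome is missing
replace outcome_t = . if wave == 1 & runiform() < 0.2

* first wave in which the outcome is observed, for each person
bysort pid: egen wavefirst = min(cond(!missing(outcome_t), wave, .))
* outcome in that wave, copied to every row of the person
bysort pid: egen byte outcome_t_1 = max(cond(wave == wavefirst, outcome_t, .))
list pid wave outcome_t wavefirst outcome_t_1 in 1/8, sepby(pid)
```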



                        • #13
                          Hi Kolpo,
                          you might be interested in xtpdyn, available from the SSC archive, for estimating dynamic random-effects probit models as proposed by Rabe-Hesketh and Skrondal (2013).

                          Best,
                          Raffaele

                          Rabe-Hesketh, S., Skrondal, A. 2013. Avoiding biased versions of Wooldridge’s simple solution to the initial conditions problem. Economics Letters 120: 2, 346–349.
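xtpdyn is community-contributed, so it has to be installed from SSC before use. The two lines below do that (an internet connection is assumed) and open its help file, which documents the command's syntax; that syntax is not reproduced here.

```stata
* install (or update) the community-contributed command from SSC
ssc install xtpdyn, replace
* read the documentation for its syntax and options
help xtpdyn
```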




                          • #14
                            Originally posted by Stephen Jenkins:
                            Dynamic random effects probit models that account for initial conditions issues a la Wooldridge (Journal of Applied Econometrics, 2005) are indeed straightforward to implement.

                            Code:
                            meprobit outcome_t lagged_outcome $predictors $avge_tvar outcome_t_1 if wave > wavefirst || pid:
                            xtprobit outcome_t lagged_outcome $predictors $avge_tvar outcome_t_1 if wave > wavefirst, i(pid)
                            Dear Stephen,

                            May I ask three questions about the Wooldridge approach for dynamic probit model? Thank you!

                            Q1: In a summer school lecture note, Wooldridge suggested the following Stata command for a dynamic probit with heterogeneity, for example:

                            Code:
                            xtprobit y y_1 y1 xa xb xc xa2-xa18 xb2-xb18 xc2-xc18, re
                            where xa2-xa18 are the values of the regressor xa for period 2-18, so are xb2-xb18 and xc2-xc18.

                            I have tried this method, but all of the x2-x18 variables are omitted because of collinearity, and Stata stopped responding while omitting them, without any message. Do you think this is because the time dimension of my panel is too large (18 in my case)?

                            use of longitudinally-averaged variables being OK only if the panel is sufficiently long
                            You mentioned that if the time dimension of the panel is long, then time-averaged variables are fine. So do you think that for a long panel it is better to use time-averaged variables rather than the Wooldridge-type per-period ("loop") variables (e.g. x2-x18)? My concern is that for a long panel there will be too many Wooldridge-type per-period variables as regressors in the equation, and there may be collinearity, as in my case.
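The thread does not diagnose the cause of the problem in this post, but two common sources of trouble with per-period regressors are easy to check: each per-period copy must be constant within person and non-missing on every row, and a regressor with no within-person variation produces copies that are all identical to each other and to the regressor itself, which makes them collinear. A hypothetical sketch (simulated data; x1 varies over time, z does not):

```stata
clear
set seed 18
set obs 300                                  // hypothetical: 300 individuals
generate pid = _n
generate double z = rnormal()                // time-invariant regressor
expand 6                                     // 6 waves per individual
bysort pid: generate wave = _n
generate double x1 = rnormal()               // time-varying regressor
xtset pid wave

* per-period copies for waves 2-6, spread to every row of the person
forvalues t = 2/6 {
    bysort pid: egen double x1_`t' = max(cond(wave == `t', x1, .))
    bysort pid: egen double z_`t' = max(cond(wave == `t', z, .))
}

* check 1: each copy is constant within person and never missing
foreach v of varlist x1_* z_* {
    bysort pid: egen double lo = min(`v')
    bysort pid: egen double hi = max(`v')
    assert lo == hi & !missing(`v')
    drop lo hi
}

* check 2: z has no within-person variation, so z_2 ... z_6 all equal z
xtsum x1 z
assert z_2 == z & z_6 == z
```

A copy built as, say, `generate x1_2 = x1 if wave == 2` fails check 1, because it is missing on every other row of the person.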

                            Q2: Wooldridge suggested that Arellano and Bond GMM estimation of the dynamic linear model is a good starting point for the dynamic probit model, and he showed that the results generated by the two approaches are similar.

                            So do you think it makes sense to estimate a dynamic binary outcome model using both Wooldridge's dynamic probit model and Arellano and Bond GMM, and then report both sets of results in empirical research?

                            In my case, the Arellano and Bond GMM results are very similar to those of a static probit, but very different from those of the Wooldridge dynamic probit, for which very few variables are statistically significant.
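A hypothetical sketch of the linear-probability Arellano-Bond step on a simulated panel (6 waves, one regressor x1, made-up coefficients): xtabond treats the binary outcome as a linear probability model. Its coefficient on the lagged outcome is on the probability scale, so a like-for-like comparison is with an average partial effect from the dynamic probit rather than with the raw probit coefficient.

```stata
* --- hypothetical simulated panel ---
clear
set seed 2005
set obs 500
generate pid = _n
generate double a = rnormal()
expand 6
bysort pid: generate wave = _n
generate double x1 = rnormal() + 0.5*a
xtset pid wave
generate byte outcome_t = (x1 + a + rnormal() > 0) if wave == 1
forvalues t = 2/6 {
    replace outcome_t = (0.5*x1 + 0.8*L.outcome_t + a + rnormal() > 0) if wave == `t'
}

* linear probability model estimated by Arellano-Bond GMM, one lag of the outcome
xtabond outcome_t x1, lags(1) vce(robust)
```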

                            Q3: In empirical research, is it necessary to report the coefficients of all the time-averaged variables?

                            Many thanks!
                            Last edited by Alex Mai; 24 Apr 2018, 06:44.



                            • #15
                              Q1: I have no idea why you are having a problem implementing Wooldridge's suggestion with your data. (And, sorry, I don't have time to investigate further.)

                              Q2. Mark Stewart, Journal of Applied Econometrics 22: 511–531 (2007) applies both types of model that you cite. His discussion is likely useful for guiding what you do.

                              Q3. No comment. It depends on context, and the "story" you're trying to tell, and the constraints of particular journals (e.g. whether appendices OK or not).


                              More generally, it appears that a reading of key papers in the applied literature would provide useful guidance to you. Mark Stewart's paper is an example; another is Martin Biewen's paper in the same journal in 2009. They look at more complex models than the basic Dynamic Random Effects Probit, but in effect cover related and relevant issues. Rabe-Hesketh and Skrondal have written 2 or 3 papers on the subject of whether to longitudinally-average or not (albeit from a statistician's viewpoint rather than economist's). My work with Dynamic Random Effects Probit models was with a long panel. Raffaele Grotti (#13) has implemented these models with short panels.

