  • IV-oprobit using cmp command

    Dear all,

    I am running an IV ordered probit regression using the cmp command. The dependent variable is education level (discrete and ordered), which is regressed on patience and a set of controls. My idea is to instrument patience with parents’ education.

    The cmp command (approach 1) is the following:

    cmp (edulevel=patience `controls') (patience=parentsedu `controls') , ind($cmp_oprobit $cmp_cont) nolr

    I tried to replicate the results manually by using a two-step procedure (approach 2), where I regress patience on its instruments, and use the predicted values of patience in the ordered probit estimation:

    regress patience parentsedu `controls'
    predict patiencehat
    oprobit edulevel patiencehat `controls'

    The results differ substantially. I noticed that with cmp the sample sizes differ between the steps. But even when I keep the sample the same, I do not obtain the same results as with the first approach.
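To rule out sample differences, one option is to define a single estimation sample with Stata's markout before fitting either approach, and then restrict every command to it. The sketch below does this on simulated data rather than the data in this post: a design where the true coefficient is 1, with some instrument values deliberately set to missing. The marker name touse, the seed and the missing-value pattern are arbitrary assumptions, and cmp must be installed from SSC. With the samples aligned, the two approaches should still give clearly different coefficients, which points to the estimator rather than the sample as the source of the gap.

```stata
* Sketch: fix one estimation sample for both approaches.
* Requires the user-written cmp package (ssc install cmp).
clear
set seed 12345
set obs 10000
matrix C = (1, .5 \ .5, 1)
drawnorm e1 e2, corr(C)          // correlated 1st- and 2nd-stage errors
drawnorm z                       // instrument
gen x = z + e1                   // endogenous regressor
gen ystar = x + e2               // latent outcome, true coefficient 1
egen y = cut(ystar), at(-10 -1 0 1 2 10)
replace z = . in 1/500           // hypothetical missing instrument values
* Mark observations with no missing values in any model variable
gen byte touse = 1
markout touse y x z
cmp setup                        // defines $cmp_oprobit, $cmp_cont, ...
cmp (y = x) (x = z) if touse, ind($cmp_oprobit $cmp_cont)
scalar N_cmp = e(N)
* Two-step approach on exactly the same observations
regress x z if touse
predict xhat if touse, xb
oprobit y xhat if touse
scalar N_2s = e(N)
display "cmp N = " N_cmp "   two-step N = " N_2s
```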

    How exactly does cmp fit the model, and why do the results of the two approaches differ? Does the manual implementation of approach 2 make sense, or was it a bad idea in the first place?


    Thank you in advance.

  • #2
    Hi Felix,
    I describe the econometrics of cmp in this paper. The model assumes that the error terms in the two equations follow an underlying bivariate normal distribution, and cmp uses maximum likelihood to model this error process directly.
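For the model in the question, the likelihood implied by this bivariate normal assumption can be sketched as follows. The notation is ours, not taken from the cmp documentation: p is patience, y is edulevel, w the controls, z the instruments plus controls, and the outcome error is normalized to unit variance as in oprobit. Conditional on the first-stage error v_i, the ordered-probit error has mean rho*v_i/sigma and variance 1 - rho^2, so each observation's contribution factors into a normal density times an ordered-probit probability. cmp's internal computations may be organized differently, but under these assumptions this is the likelihood being maximized.

```latex
\begin{aligned}
p_i &= z_i'\gamma + v_i, \\
y_i^{*} &= \beta\, p_i + w_i'\delta + \varepsilon_i,
  \qquad y_i = j \iff \kappa_{j-1} < y_i^{*} \le \kappa_j,
  \quad \kappa_0 = -\infty,\ \kappa_J = +\infty, \\
\begin{pmatrix} v_i \\ \varepsilon_i \end{pmatrix}
  &\sim N\!\left(0,\ \begin{pmatrix} \sigma^2 & \rho\sigma \\ \rho\sigma & 1 \end{pmatrix}\right), \\
L_i &= \frac{1}{\sigma}\,\phi\!\left(\frac{v_i}{\sigma}\right)
  \left[\Phi\!\left(\frac{\kappa_j - \beta p_i - w_i'\delta - \rho v_i/\sigma}{\sqrt{1-\rho^2}}\right)
  - \Phi\!\left(\frac{\kappa_{j-1} - \beta p_i - w_i'\delta - \rho v_i/\sigma}{\sqrt{1-\rho^2}}\right)\right],
  \qquad v_i = p_i - z_i'\gamma .
\end{aligned}
```

The estimates maximize the sum of ln L_i jointly over gamma, beta, delta, the cutpoints kappa, sigma and rho, which is why no separate first-stage prediction is ever plugged in.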

    Your two-stage approach is intuitive, but inconsistent. Unfortunately, I cannot reconstruct the reasoning right now (I should be able to!). Here is a demonstration:

    Code:
    clear
    set seed 12345 // for reproducibility
    set obs 10000
    mat C = (1, .5, 1) // lower triangle of the error correlation matrix
    drawnorm e1 e2, corr(C) cstor(lower) // 1st- and 2nd-stage errors, correlated 0.5
    drawnorm z // instrument
    gen x = z+e1 // instrumented variable
    gen ystar = x+e2 // unobserved 2nd-stage dependent variable
    egen y = cut(ystar), at(-10 -1 0 1 2 10) // censored, observed version of dep var
    oprobit y x // inconsistent (true coef = 1)
    cmp (y=x), ind(5) qui // same estimates as oprobit
    cmp (y=x) (x=z), ind(5 1) qui // consistent
    regress x z
    predict xhat // linear prediction (xb)
    oprobit y xhat // two-stage approach inconsistent too
    Rivers and Vuong is a key paper discussing methods of consistent estimation for the IV-probit setup, which is of course closely related to yours. However, it is a bit out of date, because it was written on the assumption that direct ML estimation, as in cmp, is computationally prohibitive. Computers are a lot faster now, and when ML is practical (and the model is correctly specified), it is the most efficient estimator.
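The plug-in estimate in this demonstration is not just off; it should settle near a predictable value. In the simulation, x = z + e1, so the fitted values are essentially z and the error in the plug-in oprobit becomes e2 + e1, whose variance is 1 + 1 + 2(0.5) = 3. The coefficient on xhat should therefore be close to 1/sqrt(3), about 0.577, rather than 1. A seeded sketch in base Stata (the seed and the scalar name b_plugin are arbitrary):

```stata
* Seeded version of the demonstration above (base Stata only).
clear
set seed 12345
set obs 10000
matrix C = (1, .5 \ .5, 1)
drawnorm e1 e2, corr(C)        // 1st- and 2nd-stage errors, correlated 0.5
drawnorm z                     // instrument
gen x = z + e1                 // endogenous regressor
gen ystar = x + e2             // latent outcome, true coefficient 1
egen y = cut(ystar), at(-10 -1 0 1 2 10)
quietly regress x z
predict xhat, xb
oprobit y xhat                 // the plug-in ("forbidden") regression
scalar b_plugin = _b[xhat]
display "plug-in coefficient: " b_plugin "   predicted limit 1/sqrt(3): " 1/sqrt(3)
```

The cutpoints are rescaled by the same factor, so they should come out at roughly -0.577, 0, 0.577 and 1.155 instead of -1, 0, 1 and 2.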



    • #3
      Just to follow up on David's point: you can easily see the problem in the standard probit case (Rivers and Vuong). If y2 is the endogenous explanatory variable and y2 = z*g2 + v2 is the reduced form, then you are replacing y2 with the right-hand side. The fitted values are essentially z*g2 for parameters g2. But you can't ignore the addition to the error term, a1*v2, where a1 is the coefficient on y2. The effective error using the plug-in approach (sometimes called the "forbidden regression") is e1 + a1*v2, where e1 has a standard normal distribution. Now the error variance is a1^2*sv2^2 + se1^2 = a1^2*sv2^2 + 1, because e1 has unit variance. So the coefficients you estimate are divided by a constant greater than one: you want a1 but you get a1/sqrt(a1^2*sv2^2 + 1). It is very clumsy to undo this; I show how in Chapter 15 of my 2010 MIT Press book. As David said, MLE is relatively easy now, and you should use it.
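The same argument can be written out with the cross term that the expression above leaves aside: endogeneity of y2 means e1 and v2 are generally correlated, so the variance of the composite error picks up 2*a1*Cov(e1, v2). Exogenous regressors are omitted for brevity, and the notation follows the post above.

```latex
\begin{aligned}
y_1^{*} &= a_1 y_2 + e_1, \qquad y_2 = z g_2 + v_2, \\
y_1^{*} &= a_1 (z g_2) + (e_1 + a_1 v_2), \\
\operatorname{Var}(e_1 + a_1 v_2) &= 1 + a_1^2 \sigma_{v_2}^2 + 2 a_1 \operatorname{Cov}(e_1, v_2), \\
\operatorname{plim} \hat a_1^{\,\text{plug-in}} &= \frac{a_1}{\sqrt{1 + a_1^2 \sigma_{v_2}^2 + 2 a_1 \operatorname{Cov}(e_1, v_2)}} .
\end{aligned}
```

With Cov(e1, v2) = 0 this reduces to a1/sqrt(a1^2*sv2^2 + 1). In David's simulation (where his e1 is the first-stage error and e2 the outcome error, so the names are swapped relative to this post), a1 = 1, the first-stage error variance is 1 and the covariance is 0.5, giving 1/sqrt(3), about 0.577. If a1*Cov(e1, v2) is negative enough, the divisor can even fall below one.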

      An alternative is a control function approach to estimate the average partial effects, but there is nothing to be gained over MLE in your setup.
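The control-function approach can be sketched on the same kind of simulated data: keep the first-stage residual and add it to the ordered probit as an extra regressor, in the spirit of Rivers and Vuong. The coefficients then come out on the scale of the remaining outcome error, whose variance is 1 - 0.5^2 = 0.75 in this design, so the raw coefficient on x should be near 1/sqrt(0.75), about 1.155. Assuming the outcome error has unit variance (the usual oprobit normalization), the structural coefficient can be recovered as b_x/sqrt(1 + b_v^2*sv^2), where sv^2 is the first-stage error variance. The seed and scalar names are hypothetical; this sketch does not compute average partial effects, and the second-step standard errors ignore that vhat is estimated (bootstrapping both steps is one common fix).

```stata
* Control-function sketch on a simulated design with true coefficient 1 (base Stata).
clear
set seed 12345
set obs 10000
matrix C = (1, .5 \ .5, 1)
drawnorm e1 e2, corr(C)          // 1st- and 2nd-stage errors, correlated 0.5
drawnorm z                       // instrument
gen x = z + e1                   // endogenous regressor
gen ystar = x + e2               // latent outcome
egen y = cut(ystar), at(-10 -1 0 1 2 10)
* Step 1: first stage; keep the residual as the control function
quietly regress x z
predict vhat, residuals
scalar s2_v = e(rmse)^2          // estimated first-stage error variance
* Step 2: ordered probit with the residual added as a regressor
oprobit y x vhat
scalar b_x = _b[x]
scalar b_v = _b[vhat]
* Undo the scaling, assuming Var(e2) = 1 as in the structural model
scalar beta_cf = b_x / sqrt(1 + b_v^2 * s2_v)
display "raw coefficient on x: " b_x "   rescaled: " beta_cf
```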



      • #4
        Thank you, David and Jeff. Your replies were very helpful and instructive.



        • #5
          Thank you, Jeff! I knew I had read about this in your textbook, but I couldn't find it, and couldn't remember the term. "Spurious regression," I was thinking... Page 236 of the 2002 edition.



          • #6
            Hi,

            I have a similar case, in which my endogenous variable is ordinal (a 6-point scale) and the response variable is also ordinal (a 5-point scale). Could I use the cmp command with ind($cmp_oprobit $cmp_oprobit), or is that generally inconsistent?

            If this is not an appropriate method, could you please suggest an alternative (e.g., plain 2SLS, or treating the endogenous variable as continuous)? I have seen some papers do that, but I am not sure whether it is an appropriate method.

            Thank you in advance.

            Best regards,

            Nikos



            • #7
              Hi, could you please help me with the comment above about the cmp command?

              Thank you in advance. Much appreciated.

              Best,

              Ilias

