  • Modification of the first stage code in xtivreg2

    Hi,

    I have a model that I estimate using xtivreg2. The first stage of the model is a linear probability model (LPM). The predictions of an LPM can fall outside the unit interval, and indeed, when I run the first stage manually and obtain the predictions, I see that some are just below 0 where the corresponding independent variables take values from the upper end of their distributions, which makes perfect sense. I could proceed in four ways: (i) I could replace, in an arbitrary manner, the negative predictions with very small values close to 0, such as 0.001. (ii) I could subtract a sufficiently small amount from the values of the problematic independent variables so that they do not lead to predictions outside the unit interval. (iii) I could drop the observations that lead to negative predictions in the first stage and carry on with xtivreg2. (iv) I could switch to a probit model, but, for certain reasons, I prefer not to use a probit model.

    My first question: which of the first three of these ways of dealing with predictions outside the unit interval would be the most legitimate? I prefer dropping the problematic observations (some 180 observations out of more than 50,000 in my sample, although I also observe that a couple of statistics are a little sensitive to dropping them), since I find the other ways difficult to justify. Is there a most preferred way of dealing with the problem?

    My second question: if I wanted to follow the first way, could I easily step into the relevant line of code in the .ado file of xtivreg2, replace the problematic predictions, and let xtivreg2 run as usual? Or would it be too difficult to deal with the .ado file of xtivreg2? I prefer to rely on xtivreg2 for the robust statistics it produces after the estimation.

    Tunga

  • #2
    Tunga,

    This comes up regularly on Statalist. Short answer:

    - You omitted option (v), "ignore the problem", because in fact it isn't a problem. You will get consistent estimates of the parameter of interest with a linear first stage. The reason is that IV is a limited-information (single-equation) estimation method and the first stage doesn't have to have a structural interpretation. If you have a look at a standard proof of consistency of IV, you'll see that all you need is that the excluded instrument is correlated with the endogenous regressor. You don't have to have modelled the structural relationship correctly - correlation is enough. (Plus all the other requirements, of course.)

    - If you want to get more efficient estimates by making some structural assumptions about the first stage, you have some options open. None of them is in your (i)-(iv) list. One is described in this Statalist post:

    http://www.stata.com/statalist/archi.../msg00337.html

    but you will need to think a bit about how to adapt it to the panel case, and whether the extra assumptions needed are ones you want to make.

    HTH,
    Mark
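
The consistency point in #2 can be checked with a small simulation. The sketch below is not from the thread and is written in Python with NumPy rather than Stata; the function name `simulate_2sls`, the data-generating process, and all coefficient values are illustrative assumptions. It generates a binary endogenous regressor from a nonlinear (probit-type) rule, fits a linear first stage whose fitted values sometimes fall outside the unit interval, and compares the resulting 2SLS slope with the biased OLS slope.

```python
import numpy as np

def simulate_2sls(n=50_000, beta=2.0, seed=0):
    """Cross-sectional toy example (no panel structure); all values are assumed."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)   # excluded instrument
    u = rng.normal(size=n)   # unobserved confounder, enters both equations
    # Binary endogenous regressor: the true first stage is nonlinear in z
    d = (0.8 * z + u + rng.normal(size=n) > 0).astype(float)
    # Outcome equation; beta is the parameter of interest
    y = 1.0 + beta * d + u + rng.normal(size=n)

    # First stage: linear probability model of d on a constant and z
    Z = np.column_stack([np.ones(n), z])
    pi_hat, *_ = np.linalg.lstsq(Z, d, rcond=None)
    d_hat = Z @ pi_hat
    share_outside = np.mean((d_hat < 0) | (d_hat > 1))

    # Second stage: regress y on the unadjusted fitted values
    X_hat = np.column_stack([np.ones(n), d_hat])
    b_2sls, *_ = np.linalg.lstsq(X_hat, y, rcond=None)

    # Naive OLS of y on d, for comparison
    X = np.column_stack([np.ones(n), d])
    b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    return b_2sls[1], b_ols[1], share_outside

if __name__ == "__main__":
    b_2sls, b_ols, share = simulate_2sls()
    print(f"2SLS slope: {b_2sls:.3f}")
    print(f"OLS slope:  {b_ols:.3f}")
    print(f"Share of first-stage predictions outside [0, 1]: {share:.4f}")
```

Under these assumptions the 2SLS slope should land close to the true value of 2 even though a small share of first-stage predictions lies outside [0, 1], while OLS is pulled upward by the confounder. The manual second stage reproduces the 2SLS point estimate only; its standard errors would not be valid, which is one reason to leave inference to a dedicated routine such as xtivreg2.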

    • #3
      Hi Mark,

      Thanks for this quick and informative reply. I will consider mentioning in my paper that in some cases the predictions of the first stage regression are outside the unit interval but no adjustment is carried out since the 2SLS estimator remains consistent. I have already checked the discussion at the link you provided, but I have six instruments in total and I think I am not able to benefit from what is proposed in this discussion.

      Tunga
