Hi,
I have a model that I estimate using xtivreg2. The first stage of the model is a linear probability model (LPM). LPM predictions can fall outside the unit interval, and indeed, when I run the first stage manually and inspect the predictions, some are just below 0. These are observations where the corresponding independent variables take values at the upper ends of their distributions, which makes perfect sense. I could proceed in four ways: (i) I could replace the negative predictions, somewhat arbitrarily, with a small positive value close to 0, such as 0.001. (ii) I could subtract a sufficiently small amount from the values of the problematic independent variables so that they no longer produce predictions outside the unit interval. (iii) I could drop the observations that produce negative first-stage predictions and carry on with xtivreg2. (iv) I could switch to a probit model, but for some reasons I would prefer not to.
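To make options (i) and (iii) concrete, here is a minimal sketch in plain Python rather than Stata. It assumes a single first-stage regressor, no panel fixed effects, and hypothetical toy data (`x`, `d`) in which the binary variable becomes rarer as `x` grows, so the fitted LPM line dips below 0 at the top of the `x` distribution. The helper names are illustrative only and have nothing to do with xtivreg2's internals; option (ii) is not shown.

```python
from typing import List, Tuple


def ols_fit(x: List[float], y: List[float]) -> Tuple[float, float]:
    """OLS of y on a constant and one regressor; returns (intercept, slope)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope


def lpm_fitted(x: List[float], d: List[int]) -> List[float]:
    """First-stage LPM: linear fitted values of the binary variable d on x."""
    a, b = ols_fit(x, d)
    return [a + b * xi for xi in x]


def replace_negative(pred: List[float], floor: float = 0.001) -> List[float]:
    """Option (i): overwrite negative predictions with a small positive constant."""
    return [floor if p < 0 else p for p in pred]


def keep_inside_unit(pred: List[float]) -> List[int]:
    """Option (iii): indices of observations whose prediction lies in [0, 1]."""
    return [i for i, p in enumerate(pred) if 0.0 <= p <= 1.0]


# Hypothetical toy data: d = 1 becomes rarer as x grows
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
d = [1, 1, 0, 1, 1, 0, 0, 0, 0, 0]

pred = lpm_fitted(x, d)
print("LPM fitted values:", [round(p, 3) for p in pred])
print("option (i), last two:", replace_negative(pred)[-2:])
print("option (iii), kept indices:", keep_inside_unit(pred))
```

In this toy example only the two largest values of `x` produce negative predictions, mirroring the pattern described above: option (i) changes their fitted values, while option (iii) removes those observations from the sample altogether.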
My first question: which of the first three of these ways of dealing with predictions outside the unit interval would be the most legitimate? I prefer dropping the problematic observations (some 180 out of more than 50,000 in my sample), since I find the other approaches difficult to justify, although I also observe that a couple of statistics are somewhat sensitive to dropping those 180 observations. Is there a generally preferred way of dealing with this problem?
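One way to quantify the sensitivity mentioned above is to compute the same IV estimate on the full sample and on the sample that excludes observations with first-stage predictions outside [0, 1]. The sketch below assumes the simplest just-identified case (one instrument `z`, one binary endogenous regressor `d`, one outcome `y`, no fixed effects), where the 2SLS slope equals cov(z, y) / cov(z, d). All data are hypothetical, and this does not reproduce xtivreg2's panel transformations or robust statistics.

```python
from typing import List


def cov(a: List[float], b: List[float]) -> float:
    """Sample covariance of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (n - 1)


def lpm_fitted(z: List[float], d: List[int]) -> List[float]:
    """First-stage LPM fitted values of d on a constant and z."""
    slope = cov(z, d) / cov(z, z)
    intercept = sum(d) / len(d) - slope * sum(z) / len(z)
    return [intercept + slope * zi for zi in z]


def iv_estimate(z: List[float], d: List[int], y: List[float]) -> float:
    """Just-identified IV (2SLS) slope of y on d with instrument z."""
    return cov(z, y) / cov(z, d)


# Hypothetical toy data
z = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
d = [1, 1, 0, 1, 1, 0, 0, 0, 0, 0]
y = [3.0, 2.5, 1.0, 2.8, 2.6, 1.2, 0.9, 1.1, 0.7, 0.8]

# Keep only observations whose full-sample first-stage prediction lies in [0, 1]
pred = lpm_fitted(z, d)
keep = [i for i, p in enumerate(pred) if 0.0 <= p <= 1.0]

full = iv_estimate(z, d, y)
trimmed = iv_estimate([z[i] for i in keep], [d[i] for i in keep], [y[i] for i in keep])
print(f"full sample (n={len(z)}): {full:.3f}")
print(f"trimmed sample (n={len(keep)}): {trimmed:.3f}")
```

Note that dropping observations also re-estimates the first stage on the trimmed sample, so any difference between the two estimates reflects both the excluded outcomes and the changed first-stage relationship.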
My second question: if I wanted to follow the first approach, could I simply edit the relevant line of code in xtivreg2's .ado file so that it replaces the problematic predictions, and then let xtivreg2 run as usual? Or would modifying xtivreg2's .ado file be too difficult? I prefer to rely on xtivreg2 for the robust statistics it reports after estimation.
Tunga