Hello!
I would be glad to hear your opinion on this.
My dependent variable (y_var) counts the number of times a certain event occurs for each observation, ranging from 0 to 11. As you can see below, 54% of my sample has y_var equal to 0:
My independent variable of interest (crt) is a categorical variable that records the number of correct answers (0 to 3) on a certain test:
Naturally, since my dependent variable counts the number of times that an individual exhibits the event in the data, I was thinking of exploring this relationship with a negative binomial model or a zero-inflated model.
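For concreteness, here is a minimal sketch of the two count models I have in mind. The specification is an assumption for illustration (crt as the only regressor, the same variable in the inflation equation, robust standard errors), and I have not run it:

```stata
* Sketch only: count-data alternatives for y_var.
* Assumption: crt is the only regressor, entered as a factor variable.

* Negative binomial: allows for overdispersion relative to Poisson
nbreg y_var i.crt, vce(robust) nolog

* Zero-inflated negative binomial: adds a logit equation for excess zeros
* Assumption: the inflation equation also depends on crt
zinb y_var i.crt, inflate(i.crt) vce(robust) nolog
```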
However, I came up with an idea that might allow me to explore this relationship using an interval regression as well. I hope to hear your opinion on this:
I have defined a new dependent variable with 3 categories: category 1 contains all those with exactly 11 events; category 2, all those with between 1 and 10 events; and category 3, all those with 0 events:
Since I know the cut-off values (i.e. 1, 2, 3, … 11), I am able to create the lower (y1) and upper (y2) limits of each of these three categories:
Finally, I set up a regression model and estimated it by interval regression:
If my exercise is correct, I would be able to interpret the coefficients from the regression output directly; for example, having 3 correct answers in the crt test would, on average, decrease the number of events exhibited by about 2.2 relative to having 0 correct answers (the base category).
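To spell out what I understand the interval regression to be assuming (a sketch of the standard setup with a normally distributed latent outcome, not something taken from my output), each of the three categories would contribute to the likelihood as follows, where $\Phi$ is the standard normal CDF:

```latex
\begin{align*}
y_i^* &= x_i'\beta + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma^2) \\
\Pr(\text{category 3: } y_i^* \le 0) &= \Phi\!\left(\frac{0 - x_i'\beta}{\sigma}\right) \\
\Pr(\text{category 2: } 1 \le y_i^* \le 10) &= \Phi\!\left(\frac{10 - x_i'\beta}{\sigma}\right) - \Phi\!\left(\frac{1 - x_i'\beta}{\sigma}\right) \\
\Pr(\text{category 1: } y_i^* \ge 11) &= 1 - \Phi\!\left(\frac{11 - x_i'\beta}{\sigma}\right)
\end{align*}
```

If this is right, the coefficients on i.crt describe shifts in the mean of the latent $y_i^*$, which is why I would like confirmation that reading them as changes in the number of events is appropriate.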
Does the exercise I am proposing, treating my dependent variable as an ordered variable, make sense? If so, would it be reasonable to compare my interval regression results with those I would get from a count data model?
Any further suggestions are very welcome!
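One way I imagine making that comparison is sketched below. It is again under my own assumptions and is not a result: the idea is to set the interval regression coefficients beside average marginal effects from a count model, since the latter are expressed in numbers of events.

```stata
* Sketch only: put both models on the "number of events" scale.

* Interval regression: coefficients are differences in the latent mean vs. crt == 0
intreg y1 y2 i.crt, vce(robust) nolog

* Negative binomial: average discrete change in the predicted count vs. crt == 0
quietly nbreg y_var i.crt, vce(robust)
margins, dydx(crt)
```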
Many thanks!
Code:
tab y_var, m

      y_var |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        111       53.88       53.88
          1 |          5        2.43       56.31
          2 |         31       15.05       71.36
          3 |         11        5.34       76.70
          4 |          7        3.40       80.10
          5 |         18        8.74       88.83
          6 |          8        3.88       92.72
          7 |          3        1.46       94.17
          8 |          2        0.97       95.15
          9 |          5        2.43       97.57
         10 |          2        0.97       98.54
         11 |          3        1.46      100.00
------------+-----------------------------------
      Total |        206      100.00
Code:
tab crt, m

      nº of |
    answers |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |         56       27.18       27.18
          1 |         40       19.42       46.60
          2 |         46       22.33       68.93
          3 |         64       31.07      100.00
------------+-----------------------------------
      Total |        206      100.00
Code:
tab new_yvar, m

   new_yvar |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |          3        1.46        1.46
          2 |         92       44.66       46.12
          3 |        111       53.88      100.00
------------+-----------------------------------
      Total |        206      100.00
Code:
g y1 = .
g y2 = .
replace y1 = . if new_yvar == 3
replace y2 = 0 if new_yvar == 3
replace y1 = 1 if new_yvar == 2
replace y2 = 10 if new_yvar == 2
replace y1 = 11 if new_yvar == 1
replace y2 = . if new_yvar == 1

sum y1 y2

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
          y1 |         95    1.315789     1.75804          1         11
          y2 |        203     4.53202    4.990358          0         10
Code:
intreg y1 y2 i.crt, robust nolog

Interval regression                             Number of obs     =        206
                                                   Uncensored     =          0
                                                   Left-censored  =        111
                                                   Right-censored =          3
                                                   Interval-cens. =         92

                                                Wald chi2(3)      =       5.21
Log pseudolikelihood = -171.53029               Prob > chi2       =     0.1569

------------------------------------------------------------------------------
             |               Robust
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         crt |
          0  |          0  (base)
          1  |  -1.862053   1.273837    -1.46   0.144    -4.358727    .6346216
          2  |  -2.344412   1.287257    -1.82   0.069    -4.867389     .178565
          3  |  -2.226667   1.153804    -1.93   0.054    -4.488082     .034748
             |
       _cons |   1.279063   .8225653     1.55   0.120    -.3331352    2.891262
-------------+----------------------------------------------------------------
    /lnsigma |   1.698164   .0856653    19.82   0.000     1.530263    1.866065
-------------+----------------------------------------------------------------
       sigma |   5.463908   .4680675                      4.619393    6.462817
------------------------------------------------------------------------------