Hello!
I would be glad to hear your opinion on this.
My dependent variable (y_var) counts the number of times a certain event is shown for each observation, ranging from 0 to 11. As the tabulation of y_var shows, 54% of my sample has y_var equal to 0:
My independent variable of interest (crt) is a categorical variable that counts the number of correct answers on a certain test, from 0 to 3:
Naturally, since my dependent variable counts the number of times that an individual exhibits the event in the data, I was thinking of exploring this relationship with a Negative Binomial model or a Zero-Inflated model.
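A minimal sketch of the two count models mentioned above, assuming the dataset with y_var and crt is already in memory. Using crt as the inflation covariate and requesting robust standard errors are assumptions made for illustration only.

```stata
* Sketch only: assumes y_var and crt are already in memory.

* Negative binomial model for the raw count
nbreg y_var i.crt, vce(robust) nolog

* Zero-inflated negative binomial; i.crt in inflate() is an assumption
zinb y_var i.crt, inflate(i.crt) vce(robust) nolog
```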
However, I came up with an idea that might allow me to explore this relationship using an interval regression as well. I hope to hear your opinion on this:
I have defined a new dependent variable with 3 categories: category 1 contains everyone with exactly 11 events; category 2, everyone with between 1 and 10 events; and category 3, everyone with 0 events:
Since I know the cut-off values (i.e. 1, 2, 3, … 11), I am able to create the lower (y1) and upper (y2) limits of each of these three categories:
Finally, I set up a regression model and estimated it by interval regression:
If this exercise is correct, I could interpret the coefficients in the regression output directly; for example, having 3 correct answers on the crt test would, on average, decrease the number of events exhibited by about 2.2 relative to having none.
Does the exercise I am proposing, which treats my dependent variable as an ordered variable, make any sense? If so, would it be reasonable to compare my interval regression results with the results I would get from a model for count data?
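One way such a comparison might be set up is sketched below, assuming y_var, crt, y1 and y2 are already in memory. After intreg, margins reports the linear prediction by default, while after nbreg it reports the predicted number of events, so the two sets of numbers are on different scales.

```stata
* Sketch only: assumes y_var, crt, y1 and y2 are already in memory.

* Interval regression on the three coarsened categories
intreg y1 y2 i.crt, vce(robust) nolog
margins crt            // linear prediction by crt level

* Negative binomial model on the original count
nbreg y_var i.crt, vce(robust) nolog
margins crt            // predicted number of events by crt level
```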
Any further suggestion is very welcome!
Many thanks!
Code:
tab y_var, m

      y_var |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        111       53.88       53.88
          1 |          5        2.43       56.31
          2 |         31       15.05       71.36
          3 |         11        5.34       76.70
          4 |          7        3.40       80.10
          5 |         18        8.74       88.83
          6 |          8        3.88       92.72
          7 |          3        1.46       94.17
          8 |          2        0.97       95.15
          9 |          5        2.43       97.57
         10 |          2        0.97       98.54
         11 |          3        1.46      100.00
------------+-----------------------------------
      Total |        206      100.00
Code:
tab crt, m

      nº of |
    answers |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |         56       27.18       27.18
          1 |         40       19.42       46.60
          2 |         46       22.33       68.93
          3 |         64       31.07      100.00
------------+-----------------------------------
      Total |        206      100.00
Code:
tab new_yvar, m

   new_yvar |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |          3        1.46        1.46
          2 |         92       44.66       46.12
          3 |        111       53.88      100.00
------------+-----------------------------------
      Total |        206      100.00
Code:
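* Bounds for intreg: a missing y1 marks left-censoring, a missing y2 marks right-censoring
g y1 = .
g y2 = .
* Category 3 (0 events): left-censored at 0
replace y1 = . if new_yvar == 3
replace y2 = 0 if new_yvar == 3
* Category 2 (1 to 10 events): interval [1, 10]
replace y1 = 1 if new_yvar == 2
replace y2 = 10 if new_yvar == 2
* Category 1 (11 events): right-censored at 11
replace y1 = 11 if new_yvar == 1
replace y2 = . if new_yvar == 1
sum y1 y2

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
          y1 |         95    1.315789     1.75804          1         11
          y2 |        203     4.53202    4.990358          0         10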
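As a cross-check on the bounds, the sketch below rebuilds y_var from the frequencies tabulated in this post and derives new_yvar, y1 and y2 in one step each. It clears the data in memory, so it is meant to be run separately; the nested cond() calls are just one possible way to write the recoding, not necessarily how new_yvar was originally created.

```stata
* Rebuild y_var from the frequencies tabulated in this post
clear
input byte y_var int freq
 0 111
 1   5
 2  31
 3  11
 4   7
 5  18
 6   8
 7   3
 8   2
 9   5
10   2
11   3
end
expand freq
drop freq

* Three categories: 1 = eleven events, 2 = one to ten events, 3 = zero events
generate byte new_yvar = cond(y_var == 11, 1, cond(y_var == 0, 3, 2))

* Bounds for intreg: missing y1 = left-censored, missing y2 = right-censored
generate y1 = cond(new_yvar == 3, ., cond(new_yvar == 2, 1, 11))
generate y2 = cond(new_yvar == 3, 0, cond(new_yvar == 2, 10, .))

count if missing(y1)                     // left-censored observations
count if missing(y2)                     // right-censored observations
count if !missing(y1) & !missing(y2)     // interval-censored observations
```

The three counts can be compared with the Left-censored, Right-censored and Interval-cens. lines that intreg reports.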
Code:
intreg y1 y2 i.crt, robust nolog

Interval regression                             Number of obs     =        206
                                                   Uncensored     =          0
                                                   Left-censored  =        111
                                                   Right-censored =          3
                                                   Interval-cens. =         92

                                                Wald chi2(3)      =       5.21
Log pseudolikelihood = -171.53029               Prob > chi2       =     0.1569

------------------------------------------------------------------------------
             |               Robust
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         crt |
          0  |          0  (base)
          1  |  -1.862053   1.273837    -1.46   0.144    -4.358727    .6346216
          2  |  -2.344412   1.287257    -1.82   0.069    -4.867389     .178565
          3  |  -2.226667   1.153804    -1.93   0.054    -4.488082     .034748
             |
       _cons |   1.279063   .8225653     1.55   0.120    -.3331352    2.891262
-------------+----------------------------------------------------------------
    /lnsigma |   1.698164   .0856653    19.82   0.000     1.530263    1.866065
-------------+----------------------------------------------------------------
       sigma |   5.463908   .4680675                      4.619393    6.462817
------------------------------------------------------------------------------
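For reference, here is a sketch of what this specification maximizes, assuming the usual interval regression setup with a latent normal outcome $y_i^* = x_i'\beta + \varepsilon_i$, $\varepsilon_i \sim N(0, \sigma^2)$. The symbols $y_i^*$, $x_i$, $\beta$ and $\Phi$ (the standard normal CDF) are introduced here for illustration and do not appear in the output itself.

```latex
\begin{aligned}
\ln L ={}& \sum_{i:\,y_i = 0} \ln \Phi\!\left(\frac{0 - x_i'\beta}{\sigma}\right)
  + \sum_{i:\,1 \le y_i \le 10} \ln\!\left[\Phi\!\left(\frac{10 - x_i'\beta}{\sigma}\right)
  - \Phi\!\left(\frac{1 - x_i'\beta}{\sigma}\right)\right] \\
 &+ \sum_{i:\,y_i = 11} \ln\!\left[1 - \Phi\!\left(\frac{11 - x_i'\beta}{\sigma}\right)\right]
\\[6pt]
\widehat{E}[y^* \mid \text{crt} = 0] ={}& 1.279063 \\
\widehat{E}[y^* \mid \text{crt} = 3] ={}& 1.279063 - 2.226667 = -0.947604
\end{aligned}
```

Under this setup the coefficients are shifts in the mean of the latent $y^*$, which is why the fitted value for crt = 3 can fall below zero even though the observed count cannot.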