When the dependent variable is an integer with a limited range (13 to 47), what model should I use?

David Wong

#1

31 Dec 2017, 18:34

Dear Stata Statisticians,

I have one dependent variable that is an index constructed from 13 items. This dependent variable is financial risk tolerance; it is discrete and ranges from 13 to 47. I have read two relevant posts on the Stata forum:
1) https://www.statalist.org/forums/for...tain-magnitude
(In their case, dependent variable ranges from 0 to 40)

2) https://www.statalist.org/forums/for...ndent-variable
(In their case, dependent variable ranges from 0 to 10)

Accordingly, could I consider my dependent variable as a binomial variable (e.g. -glm- with family(binomial 47)) or just go for OLS? The distribution graph can be seen in the attached file. Thank you so much for your attention.

Best regards,
David
Last edited by David Wong; 31 Dec 2017, 18:36.
Carlo Lazzaro

#2

01 Jan 2018, 01:17

David:
welcome to this forum.
I would also consider -tobit- (if feasible) instead of OLS.

Kind regards,
Carlo
(Stata 19.0)
Nick Cox

#3

01 Jan 2018, 01:44

13 to 47 sounds like the empirical range. 13 is easier to decode, as the sum of 13 values of 1.

Let's suppose that, say, 65 (13 * 5) is in principle the upper limit. Then values of response - 13 could vary from 0 to 52. That is your reference binomial.

Or if it's 52 as the upper limit, vary the answer accordingly.

I can't see how tobit makes sense here. There is no censoring. Values beyond the limits are not just not observed, they are impossible.
David Wong

#4

01 Jan 2018, 06:24

Thank you very much for your reply, Nick. I apologize that I have not provided enough information.

The theoretical range of this index is 13 to 47; the empirical range is 14 to 39. The index was proposed by Grable and Lytton (1999) and comprises 13 items that are not Likert-scale.

For example, item 1 is "In general, how would your best friend describe you as a risk taker?", with these response options:
o A real gambler (1)
o Willing to take risks after completing adequate research (2)
o Cautious (3)
o A real risk avoider (4)
The items do not all share the same scale: 8 items are scored from 1 to 4, and 5 items are scored from 1 to 3. That is why the index ranges from 13 (8 * 1 + 5 * 1) to 47 (8 * 4 + 5 * 3).
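As a quick check of that arithmetic, here is a small sketch in Stata. It uses only the item counts and scoring given in this post, and it also shows the width of the range once the lowest possible score is moved to 0.

```stata
* 8 items scored 1-4 and 5 items scored 1-3, as described in this post

* lowest possible total
display 8*1 + 5*1

* highest possible total
display 8*4 + 5*3

* highest possible total after shifting the scale to start at 0
display (8*4 + 5*3) - (8*1 + 5*1)
```

The three results are 13, 47 and 34.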

Can I still use -glm- with family(binomial 47), or are there other alternatives? Thank you.

Reference:
Grable, J., and R. H. Lytton. 1999. Financial risk tolerance revisited: the development of a risk assessment instrument. Financial Services Review 8(3): 163-181.
Nick Cox

#5

01 Jan 2018, 06:55

Thanks for the extra detail. The histogram seems to imply that the empirical range is about 14 to 38. (The discrete option would make your histogram less ambiguous.)

Whatever you use is an approximation.

If you want to treat the sum of your various scales as if it were like a counted fraction (conventionally a pair of integers, but equivalently a counted fraction), then, as already pointed out, you must shift the scale to start at 0. So the support of your distribution is the integers from 0 to 34.

Matching the marginal distribution of the outcome is not essential in modelling, but there is no gain that I can imagine here in referring your distribution to an inaccurate support.
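A minimal sketch of the shift and of the discrete histogram follows. It runs on made-up data: the variable names risk and risk0, the seed and the uniform draws are assumptions for illustration only, standing in for the real 13-47 index.

```stata
* Illustrative data only: risk stands in for the 13-47 index
clear
set seed 2018
set obs 200
generate risk = runiformint(13, 47)

* shift so that the lowest possible score is 0
generate risk0 = risk - 13

* stop with an error if any shifted value falls outside 0..34
assert inrange(risk0, 0, 34)

* one bar per distinct integer value
histogram risk, discrete
```

With real data, only the generate, assert and histogram lines for the shifted variable would be needed.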
David Wong

#6

01 Jan 2018, 09:25

Thank you for your kind help, Nick. I agree that shifting the scale to begin at 0 makes sense. I just have one final question. When using -glm- in my case after shifting the scale, the code will be:

Code:

glm y x1 ... xk, fam(bin 34) link(logit) robust
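For illustration only, the whole sequence might look like the sketch below on simulated data. The variable names risk, risk0 and x1, the seed, and the coefficient 0.3 are all made up for the example. vce(robust) is the option-style spelling of robust, and family() and link() are the unabbreviated forms of the options above.

```stata
clear
set seed 12345
set obs 500
generate x1 = rnormal()

* simulated index on 13..47: 13 plus a binomial count out of 34
generate risk = 13 + rbinomial(34, invlogit(0.3*x1))

* shift so that the support is the integers 0..34
generate risk0 = risk - 13

* binomial family with denominator 34, logit link, robust standard errors
glm risk0 x1, family(binomial 34) link(logit) vce(robust)
```

With this specification the coefficients are on the logit scale of the proportion risk0/34.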

What reference should I cite for choosing the binomial family and logit link? I guess it might be McCullagh and Nelder (1989), which I found cited in the Stata documentation (p. 878) as follows:

"The canonical reference on GLM is McCullagh and Nelder (1989)... glm obtains results by IRLS, as described in McCullagh and Nelder (1989), or by maximum likelihood using Newton–Raphson..."

Reference:
McCullagh, P., and J. A. Nelder. 1989. Generalized Linear Models. 2nd ed. London: Chapman & Hall/CRC.

Last edited by David Wong; 01 Jan 2018, 09:28.
David Wong

#7

04 Jan 2018, 08:28

Dear Nick and Carlo,

Thank you very much for helping me. I have found out what reference I should use.

Best regards,
David