
  • When the dependent variable is an integer with a limited range (13 to 47), what model should I use?

    Dear Stata Statisticians,

    I have one dependent variable that is an index constructed from 13 items. This dependent variable, financial risk tolerance, is discrete and ranges from 13 to 47. I have read two relevant posts on Statalist:
    1) https://www.statalist.org/forums/for...tain-magnitude
    (In their case, dependent variable ranges from 0 to 40)

    2) https://www.statalist.org/forums/for...ndent-variable
    (In their case, dependent variable ranges from 0 to 10)

    Accordingly, could I treat my dependent variable as binomial (e.g. -glm- with family(binomial 47)), or should I just use OLS? The distribution is shown in the attached graph. Thank you very much for your attention.

    Best regards,
    David
    Last edited by David Wong; 31 Dec 2017, 18:36.

  • #2
    David:
    welcome to this forum.
    I would also consider -tobit- (if feasible) instead of OLS.
    Kind regards,
    Carlo
    (Stata 19.0)


    • #3
      13 to 47 sounds like the empirical range. 13 is easy to decode as the sum of 13 items each scored 1.

      Let's suppose that, say, 65 (13 * 5) is in principle the upper limit. Then values of response - 13 could vary from 0 to 52. That is your reference binomial.

      Or if it's 52 as the upper limit, vary the answer accordingly.

      I can't see how tobit makes sense here. There is no censoring. Values beyond the limits are not just not observed, they are impossible.


      • #4
        In reply to Nick Cox (#3):
        Thank you very much for your reply, Nick. I apologize for not providing enough information.

        The theoretical range of this index is 13 to 47. The empirical range is 14 to 39. The index was proposed by Grable and Lytton (1999) and includes 13 items that are not Likert-scale.

        For example, item 1 is "In general, how would your best friend describe you as a risk taker?
        o A real gambler (1)
        o Willing to take risks after completing adequate research (2)
        o Cautious (3)
        o A real risk avoider (4)
        "
        The scales of the items are not the same: 8 items are scored from 1 to 4, and 5 items are scored from 1 to 3. That is why the index ranges from 13 to 47.
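
        As a quick check of that range, here is a minimal sketch of the arithmetic in Stata, assuming (as described above) 8 items scored 1 to 4 and 5 items scored 1 to 3:

        Code:
        * lowest possible total: every item scored 1
        display 8*1 + 5*1
        * highest possible total: 8 items at 4, 5 items at 3
        display 8*4 + 5*3

        These display 13 and 47, respectively.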

        Can I still use -glm- with family(binomial 47), or are there other alternatives? Thank you.


        Reference:
        Grable J, Lytton RH. Financial risk tolerance revisited: the development of a risk assessment instrument. Financial Services Review. 1999 Dec 31;8(3):163-81.



        • #5
          Thanks for the extra detail. The histogram seems to imply that the empirical range is about 14 to 38. (The option discrete would make your histogram less ambiguous.)

          Whatever you use is an approximation.

          If you want to treat the sum of your various scales as if it were a counted fraction (conventionally given as a pair of integers, but equivalently as a fraction), then, as already pointed out, you must shift the scale to start at 0. So the support of your distribution is the integers from 0 to 34.

          Matching the marginal distribution of the outcome is not essential in modelling, but I can imagine no gain here from referring your distribution to an inaccurate support.
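
          A minimal sketch of that shift, assuming the index is stored in a variable named y (a hypothetical name) and has the theoretical range 13 to 47 described in #4:

          Code:
          * show one bar per integer value of the index
          histogram y, discrete frequency
          * shift the support from 13-47 to 0-34
          generate y0 = y - 13
          * check that every nonmissing value is an integer in 0 to 34
          assert inrange(y0, 0, 34) & y0 == floor(y0) if !missing(y0)

          The shifted variable y0 (also a hypothetical name) could then be referred to a binomial with denominator 34.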


          • #6
            In reply to Nick Cox (#5):
            Thank you for your kind help, Nick. I agree that shifting the scale to begin at 0 makes sense. I have just one final question. When using -glm- in my case after shifting the scale, the code will be:
            Code:
            glm y x1 ... xk, fam(bin 34) link(logit) robust
            What reference should I cite for choosing the binomial family and logit link? I guess it might be McCullagh and Nelder (1989), which I found cited in the Stata documentation (p. 878) as follows:

            "The canonical reference on GLM is McCullagh and Nelder (1989)... glm obtains results by IRLS, as described in McCullagh and Nelder (1989), or by maximum likelihood using Newton–Raphson..."

            Reference:
            McCullagh, P., and J. A. Nelder. 1989. Generalized Linear Models. 2nd ed. London: Chapman & Hall/CRC.
            Last edited by David Wong; 01 Jan 2018, 09:28.


            • #7
              Dear Nick and Carlo,

              Thank you very much for helping me. I have found out what reference I should use.

              Best regards,
              David
