
  • Choosing an appropriate model when dependent variable is discrete but with larger values

    Dear Statisticians,

    I have a question about choosing an appropriate model for a cross-sectional analysis when the dependent variable is discrete but takes larger values. Specifically, my dependent variable is discrete, ranging from 15 to 50.

    If I use OLS, I may violate one of its assumptions, leading to inconsistent estimates. Might an ordered logistic model be a suitable choice? Thank you very much.

    Best regards,
    David

  • #2
    Which assumption of OLS are you worried about violating? If it is normality of residuals you are worried about, bear in mind that OLS regression is actually pretty robust to violations of that assumption. In particular, with large sample sizes the central limit theorem implies that the coefficients will have an asymptotically normal distribution around the usual OLS estimates, so everything works out anyway.

    Other alternatives to consider here are the various count-variable models such as -poisson- and -nbreg-. An ordered logistic model on an outcome running from 15 to 50, if it actually takes on all 36 of those values, sounds like a nightmare!
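
    To make the count-model suggestion concrete, here is a minimal sketch on simulated data. The variables y, x1, x2 and nu, and all parameter values, are assumptions made up for illustration; they are not the poster's data.

    ```stata
    * Simulated data only: y, x1, x2 and nu are hypothetical names
    clear
    set seed 12345
    set obs 500
    generate double x1 = rnormal()
    generate double x2 = rnormal()
    * Gamma multiplier with mean 1 (shape 20, scale 0.05) adds mild overdispersion
    generate double nu = rgamma(20, 0.05)
    * Count outcome with a mean of roughly 30
    generate long y = rpoisson(exp(3.4 + 0.10*x1 - 0.05*x2) * nu)

    * Poisson regression with robust standard errors
    poisson y x1 x2, vce(robust)

    * Negative binomial regression; without vce(robust) the output
    * includes a likelihood-ratio test of alpha = 0
    nbreg y x1 x2
    ```

    If the test of alpha = 0 is not rejected, the Poisson fit is probably adequate; otherwise -nbreg- may be the safer of the two.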


    • #3
      Originally posted by Clyde Schechter View Post
      Which assumption of OLS are you worried about violating? If it is normality of residuals you are worried about, bear in mind that OLS regression is actually pretty robust to violations of that assumption. In particular, with large sample sizes the central limit theorem implies that the coefficients will have an asymptotically normal distribution around the usual OLS estimates, so everything works out anyway.

      Other alternatives to consider here are the various count-variable models such as -poisson- and -nbreg-. An ordered logistic model on an outcome running from 15 to 50, if it actually takes on all 36 of those values, sounds like a nightmare!
      Dear Clyde,

      I am sorry for the delay in replying. I agree with you that -nbreg- may be an alternative choice. I have one follow-up question: what about truncated regression? In my case, the dependent variable is an integer with a lower limit of 13 and an upper limit of 47. Thank you so much for your reply and time.

      Best regards,
      David


      • #4
        I should add that this dependent variable is a risk-aversion indicator, a sum index constructed from 13 Likert-scale questions. Therefore, this variable has a lower limit (13) and an upper limit (47).


        • #5
          David:
          have you considered using -tobit-?
          Kind regards,
          Carlo
          (Stata 19.0)


          • #6
            Clyde (on OLS) and Carlo (on tobit) underlined very interesting aspects.

            Unfortunately, you didn't present information about how your data performed under the OLS model.

            Maybe the range (15 to 50) could be taken as an issue just in terms of (undue) extrapolation, not necessarily truncation.

            Indeed, concerning blood cholesterol, there is no zero value, yet we may use this variable (without qualms) as DV in a linear regression model.

            That happens with a cornucopia of variables.
            Best regards,

            Marcos


            • #7
              Originally posted by Marcos Almeida View Post
              Clyde (on OLS) and Carlo (on tobit) underlined very interesting aspects.

              Unfortunately, you didn't present information about how your data performed under the OLS model.

              Maybe the range (15 to 50) could be taken as an issue just in terms of (undue) extrapolation, not necessarily truncation.

              Indeed, concerning blood cholesterol, there is no zero value, yet we may use this variable (without qualms) as DV in a linear regression model.

              That happens with a cornucopia of variables.
              Code:
              Linear regression                               Number of obs     =        660
                                                              F(12, 647)        =      21.29
                                                              Prob > F          =     0.0000
                                                              R-squared         =     0.2731
                                                              Root MSE          =     4.3987
              
              --------------------------------------------------------------------------------
                             |               Robust
                         FRT |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
              ---------------+----------------------------------------------------------------
                          FL |   .1266445   .0790341     1.60   0.110    -.0285499    .2818389
                         Age |  -1.030976   .1897945    -5.43   0.000    -1.403663    -.658288
                        Male |   1.884994   .3752252     5.02   0.000     1.148187      2.6218
                      Martwo |  -.3720753   .4048684    -0.92   0.358     -1.16709    .4229394
                      No_dep |   .2196351   .1471014     1.49   0.136    -.0692187     .508489
                   Education |   .2508854   .1468261     1.71   0.088    -.0374278    .5391987
                  Employment |   1.611418   .4797209     3.36   0.001     .6694205    2.553416
              Annual_hincome |   .0367363   .2180807     0.17   0.866    -.3914952    .4649677
                   Liq_asset |   .1591343   .0529181     3.01   0.003     .0552224    .2630463
                 Fixed_asset |   .0792854   .0519164     1.53   0.127    -.0226596    .1812304
                  White_race |  -.7922464   .5196254    -1.52   0.128    -1.812602    .2281094
                   M_Expense |  -.1284775   .1891922    -0.68   0.497    -.4999823    .2430274
                       _cons |   25.51912   2.448994    10.42   0.000     20.71018    30.32805
              
              
              
              Tobit regression                                Number of obs     =        660
                                                              F(  12,    648)   =      21.68
                                                              Prob > F          =     0.0000
              Log pseudolikelihood = -1907.6034               Pseudo R2         =     0.0523
              
              --------------------------------------------------------------------------------
                             |               Robust
                         FRT |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
              ---------------+----------------------------------------------------------------
                          FL |   .1266445   .0783112     1.62   0.106    -.0271299    .2804189
                         Age |  -1.030976   .1880585    -5.48   0.000    -1.400253   -.6616979
                        Male |   1.884994    .371793     5.07   0.000     1.154929    2.615058
                      Martwo |  -.3720753   .4011651    -0.93   0.354    -1.159816    .4156653
                      No_dep |   .2196351   .1457559     1.51   0.132    -.0665758     .505846
                   Education |   .2508854   .1454831     1.72   0.085    -.0347899    .5365607
                  Employment |   1.611418    .475333     3.39   0.001     .6780395    2.544797
              Annual_hincome |   .0367363    .216086     0.17   0.865     -.387577    .4610495
                   Liq_asset |   .1591343   .0524341     3.03   0.003     .0561732    .2620955
                 Fixed_asset |   .0792854   .0514416     1.54   0.124    -.0217269    .1802977
                  White_race |  -.7922464   .5148725    -1.54   0.124    -1.803266    .2187735
                   M_Expense |  -.1284775   .1874617    -0.69   0.493    -.4965832    .2396283
                       _cons |   25.51912   2.426593    10.52   0.000     20.75418    30.28405
              ---------------+----------------------------------------------------------------
                      /sigma |   4.355196   .1125045                      4.134279    4.576114
              --------------------------------------------------------------------------------
                           0  left-censored observations
                         660     uncensored observations
                           0 right-censored observations


              Liq_asset and Fixed_asset are my main variables of interest.

              Last edited by David Wong; 14 May 2017, 19:15.


              • #8
                David:
                you reported the output Stata gave you, but we cannot see the code for your commands.
                By the way, which upper and lower limits did you impose in -tobit-?
                Anyway, assuming that your code is correct, there's no substantive difference between the -regress- and -tobit- results; hence, I would go with -regress-.
                Kind regards,
                Carlo
                (Stata 19.0)


                • #9
                  Originally posted by Carlo Lazzaro View Post
                  David:
                  you reported the output Stata gave you, but we cannot see the code for your commands.
                  By the way, which upper and lower limits did you impose in -tobit-?
                  Anyway, assuming that your code is correct, there's no substantive difference between the -regress- and -tobit- results; hence, I would go with -regress-.
                  Thank you, Carlo. I apologize for not uploading the code for my commands.

                  The code for the OLS and tobit regressions is:
                  Code:
                  reg FRT FL Age Male Martwo No_dep Education Employment Annual_hincome Liq_asset Fixed_asset White_race M_Expense,r
                  
                  
                  tobit FRT FL Age Male Martwo No_dep Education Employment Annual_hincome Liq_asset Fixed_asset White_race M_Expense,r ll(13)
                  The dependent variable, FRT, has a lower limit (13) and an upper limit (47).
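
                  Note that the -tobit- command above sets only ll(13), so no upper limit was imposed. For reference, a two-limit version could be sketched as follows. This runs on simulated data: y, x and ystar are hypothetical names, and the limits 13 and 47 are taken from the description above.

                  ```stata
                  * Simulated data only: y, x and ystar are hypothetical names
                  clear
                  set seed 2017
                  set obs 660
                  generate double x = rnormal()
                  * Latent score, rounded to an integer and clamped to the stated limits
                  generate double ystar = 30 + 4*x + rnormal(0, 8)
                  generate int y = min(max(round(ystar), 13), 47)

                  * How many observations sit exactly at each limit?
                  count if y == 13
                  count if y == 47

                  * Two-limit tobit: ll() and ul() set the lower and upper censoring points
                  tobit y x, ll(13) ul(47) vce(robust)
                  ```

                  If both counts are zero, as the censoring summary in the output above suggests for the real data, the tobit and OLS coefficients will coincide.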


                  • #10
                    If you're concerned, then do some diagnostics, for example, what do the residuals look like?
                    Code:
                    help diagnostic plots
                    help regress postestimation plots
                    Linear regression would seem to be a reasonable option when summing as many as 13 ordered-categorical items into a Likert scale, unless they're nearly perfectly correlated.

                    Code:
                    version 14.2
                    
                    clear *
                    set more off
                    set seed 1391759
                    
                    forvalues i = 1/13 {
                        local varlist `varlist' y`i'
                    }
                    tempname Corr
                    matrix define `Corr' = J(14, 14, 0.5) + I(14) * 0.5
                    
                    quietly drawnorm `varlist' x, corr(`Corr') n(100)
                    foreach var of varlist `varlist' {
                        generate byte l`var' = 1
                        forvalues cut = 0.25(0.25)0.75 {
                            quietly replace l`var' = l`var' + 1 if `var' > invnormal(`cut')
                        }
                    }
                    
                    egen double total = rowtotal(ly?)
                    
                    histogram total
                    sleep 1500
                    
                    regress total c.x
                    * Residuals, rather than linear predictions, are what the normality plots should examine
                    predict double res, residuals
                    
                    qnorm res
                    sleep 1500
                    
                    pnorm res
                    sleep 1500
                    
                    rvfplot
                    
                    exit


                    • #11
                      Joseph Coveney gave great advice in terms of checking assumptions and postestimations.

                      That said, particularly in your case, I gather it is "predictable" that -regress- and -tobit- will provide similar results over the same range of the DV.

                      The variables and theme are not part of my field. But I wonder whether what is really puzzling you (rather than finding the "best model") is the fact that fixed assets were non-significant, unlike liquid assets. However, "adjusting" for liquid assets may have rendered fixed assets non-significant, I dare say.
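
                      One way to look into that is to fit the model with and without the adjustment and compare the estimates side by side. A minimal sketch on simulated data follows; liq, fixed and y are hypothetical stand-ins, and the correlation of 0.7 is an assumption chosen only for illustration.

                      ```stata
                      * Simulated data only: liq, fixed and y are hypothetical names
                      clear
                      set seed 4321
                      matrix C = (1, 0.7 \ 0.7, 1)
                      drawnorm liq fixed, corr(C) n(660)
                      generate double y = 25 + 0.5*liq + 0.2*fixed + rnormal(0, 4)

                      * How strongly are the two predictors related?
                      correlate liq fixed

                      * Fit with and without the adjustment, then compare
                      regress y fixed, vce(robust)
                      estimates store unadjusted
                      regress y fixed liq, vce(robust)
                      estimates store adjusted
                      estimates table unadjusted adjusted, b(%9.3f) se(%9.3f)
                      ```

                      A coefficient on fixed that shrinks markedly once liq enters the model would be consistent with the two asset measures sharing much of their information.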
                      Best regards,

                      Marcos


                      • #12
                        David:
                        thanks for providing further details.
                        As an aside to others' helpful advice: comparing your code with your -tobit- output, it does not seem that any left- or right-censored observations were found in your data. If that result is in line with what you're after, there's no point in using -tobit- and you can go with -regress- with no further concerns.
                        Kind regards,
                        Carlo
                        (Stata 19.0)


                        • #13
                          My prior prejudice is that Tobit is oversold in this context. It's hard to see how, in principle, linearity is consistent with bounded responses.

                          What are the bounds of the outcome in principle (not in practice)? I guess 13 and 65 (presumably 13 items, each scored from 1 to 5).

                          The values seem curiously labile from post to post in this thread: I see 13 and 15 for the lower limit, and 47 and 50 for the upper.

                          If the bounds are indeed 13 and 65, then I'd rescale to (scale - 13) / 52 (i.e. bounds [0, 1]) and then apply a logit link and robust standard errors.
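
                          A minimal sketch of that suggestion, assuming the bounds really are 13 and 65. The data are simulated; score, frac and x are hypothetical names, and -glm- is one way (not necessarily the only one) to fit a logit link with robust standard errors.

                          ```stata
                          * Simulated data only: score, frac and x are hypothetical names
                          clear
                          set seed 99
                          set obs 660
                          generate double x = rnormal()
                          * Bounded integer score on 13..65
                          generate int score = 13 + rbinomial(52, invlogit(-0.3 + 0.4*x))

                          * Rescale to [0, 1]
                          generate double frac = (score - 13) / 52

                          * Logit link with robust standard errors
                          glm frac x, family(binomial) link(logit) vce(robust)

                          * Average marginal effect of x on the rescaled outcome
                          margins, dydx(x)
                          ```

                          Stata will note that frac has noninteger values; that is expected here. Multiplying the marginal effect by 52 returns it to the original units of the score.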


                          • #14
                            Originally posted by Carlo Lazzaro View Post
                            David:
                            thanks for providing further details.
                            As an aside to others' helpful advice: comparing your code with your -tobit- output, it does not seem that any left- or right-censored observations were found in your data. If that result is in line with what you're after, there's no point in using -tobit- and you can go with -regress- with no further concerns.
                            Thank you so much for your reply; my next steps are now very clear.


                            • #15
                              Originally posted by Marcos Almeida View Post
                              Joseph Coveney gave great advice in terms of checking assumptions and postestimations.

                              That said, particularly in your case, I gather it is "predictable" that -regress- and -tobit- will provide similar results over the same range of the DV.

                              The variables and theme are not part of my field. But I wonder whether what is really puzzling you (rather than finding the "best model") is the fact that fixed assets were non-significant, unlike liquid assets. However, "adjusting" for liquid assets may have rendered fixed assets non-significant, I dare say.
                              Dear Marcos,

                              Thank you very much for your reply. Since the coefficient on liquid assets is significant, I can still tell a story in my paper. At the same time, I can also report, as a negative finding, that fixed assets do not explain the variance of financial risk tolerance. The reason I posted here was to find a way to run a robustness check on the OLS results. I thought that using alternative regressions might be a good first choice, although I am very new to the options for robustness checks.
