
  • How to interpret results of regression with Box Cox transformed Y

    Hi everyone,

    Could you please share your opinion on how I should interpret and explain the results of the following multiple regression analysis?

    Dependent variable: bcY, the Box-Cox transformed outcome. The original outcome is a 7-point Likert item that is treated as a continuous variable in this study.

    Independent variables:
    Cntrd-IV(1) to (7) which are mean-centered values
    Group: belongingness to group 1 or group 2
    Cntrd-IV(1) to (6) × Group: interaction terms
    bcY                      Coef.  St.Err.  t-value  p-value [95% Conf Interval]  Sig
    Cntrd-IV(1)              -.015     .006    -2.32     .021     -.027     -.002  **
    Cntrd-IV(2)               .031     .008     3.93        0      .016      .047  ***
    Cntrd-IV(3)               .003     .008     0.34     .734     -.013      .019
    Cntrd-IV(4)               .015     .009     1.67     .096     -.003      .034  *
    Cntrd-IV(5)               -.01     .006    -1.60     .109     -.023      .002
    Cntrd-IV(6)                  0     .008     0.01     .994     -.017      .017
    Cntrd-IV(7)              -.005     .006    -0.77     .443     -.016      .007
    Group1 (base)                0        .        .        .         .         .
    Group2                    .166     .173     0.96     .336     -.173      .506
    Cntrd-IV(1) × Group       .011     .011     1.00      .32     -.011      .034
    Cntrd-IV(2) × Group      -.014     .013    -1.15      .25     -.039       .01
    Cntrd-IV(3) × Group       .023     .013     1.74     .082     -.003      .048  *
    Cntrd-IV(4) × Group      -.012     .013    -0.88     .377     -.037      .014
    Cntrd-IV(5) × Group      -.004      .01    -0.37     .708     -.023      .016
    Cntrd-IV(6) × Group      -.016     .012    -1.33     .185      -.04      .008
    Constant                 3.538     .111    31.77        0      3.32     3.757  ***

    Mean dependent var   3.644      SD dependent var      1.983
    R-squared            0.141      Number of obs         508
    F-test               6.610      Prob > F              0.000
    Akaike crit. (AIC)   2089.367   Bayesian crit. (BIC)  2152.824
    *** p<.01, ** p<.05, * p<.1

    Looking forward to hearing your opinions.
    Thanks

  • #2
    Is this based on Stata output? Its flavour is more what I might expect from SPSS, and it doesn't even show which transformation Box-Cox suggests.



    • #3
      Is this for your own research or are you critiquing someone else’s? If Y is on a Likert scale then I would use ordered logit or probit. Even with a continuous variable the Box-Cox transformation is difficult to interpret. I have a 1989 article in the International Economic Review that discusses the problem and offers alternatives. But in your case, an ordered model seems appropriate.
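
      A minimal Stata sketch of the kind of ordered model suggested here. The variable names are hypothetical and do not appear in the thread: y is the original ordinal response, c_iv1 to c_iv7 are the mean-centered predictors, and group is coded 1/2. It also assumes the top response category is coded 6; adjust outcome() to your own coding.

      ```stata
      * Run from a do-file (/// is a line-continuation marker).
      * Hypothetical variable names, not taken from the thread:
      *   y            original ordinal response
      *   c_iv1-c_iv7  mean-centered predictors
      *   group        1 or 2
      ologit y c.c_iv1##i.group c.c_iv2##i.group c.c_iv3##i.group ///
               c.c_iv4##i.group c.c_iv5##i.group c.c_iv6##i.group ///
               c.c_iv7

      * Average marginal effect of one predictor on the probability of
      * the top category (assumed to be coded 6), at each level of group
      margins group, dydx(c_iv2) predict(outcome(6))
      ```

      Reporting effects on predicted probabilities avoids having to explain coefficients on a transformed scale; oprobit could be substituted for ologit in the same way.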



      • #4
        Maybe this article: Wooldridge, Jeffrey M., “Some Alternatives to the Box-Cox Regression Model,” International Economic Review 33, 935-955, November 1992.



        • #5
          There would be widespread reluctance to transform a Likert item. I would add that Box-Cox is about powers and logarithms, which all seem wrong anyway.

          If any transform makes sense, it might be one that pulls out the tails relative to the middle, which Box-Cox doesn't (usually) do. But much depends on the details. A Likert item might elicit strong answers and a bimodal distribution -- https://www.creativereview.co.uk/you...it-or-hate-it/ -- and such a distribution is unlikely to benefit from a transformation either.



          • #6
            Originally posted by Chen Samulsion
            Maybe this article: Wooldridge, Jeffrey M., “Some Alternatives to the Box-Cox Regression Model,” International Economic Review 33, 935-955, November 1992.
            Ooops. Thanks! I was thinking of the year of the working paper.



            • #7
              Originally posted by Nick Cox
              Is this based on Stata output? Its flavour is more what I might expect from SPSS, and it doesn't even show which transformation Box-Cox suggests.
              Yes, it is based on Stata output. I used asdoc to transfer the output to a Word document.

              As a separate note, the following is the output of the Box-Cox transformation on Y.

              . bcskew0 bcy = y, level(95)

                     Transform |         L     [95% conf. interval]      Skewness
              -----------------+--------------------------------------------------
                     (y^L-1)/L |  1.237275      .999272    1.513921     -.0000147
              (96 missing values generated)
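
              Written out, the transform reported by bcskew0 and its local slope are as follows. This is only a sketch: it assumes the regression is linear in the transformed outcome, uses the point estimate of L from the output above, and ignores the retransformation problem (the mean of y is not the back-transform of the mean of bcY). The symbols T, beta_j and x_j are notation introduced here for illustration.

              ```latex
              \[
              T(y) = \frac{y^{L} - 1}{L}, \qquad L \approx 1.237, \qquad y > 0
              \]
              \[
              \frac{dT}{dy} = y^{L-1}
              \quad\Longrightarrow\quad
              \frac{\partial y}{\partial x_j} \approx \beta_j \, y^{\,1-L}
              \]
              ```

              With L near 1.237, the factor y^(1-L) equals 1 at y = 1 and is roughly 0.65 at y = 6, so a given coefficient on the transformed scale corresponds to a somewhat smaller change in y near the top of the scale than near the bottom.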



              • #8
                Originally posted by Jeff Wooldridge
                Is this for your own research or are you critiquing someone else’s? If Y is on a Likert scale then I would use ordered logit or probit. Even with a continuous variable the Box-Cox transformation is difficult to interpret. I have a 1989 article in the International Economic Review that discusses the problem and offers alternatives. But in your case, an ordered model seems appropriate.
                Yes, it is a supplementary part of my doctoral thesis.
                The objective is to explore the association between the experience of the beautiful (Y) and features of a given city, membership of a group (tourists vs residents), and the interaction between group membership and features of the city.
                To clarify Y further: we asked participants how frequently they have experienced the beautiful in the given city (Never=0, Very Rarely=1, Rarely=2, Occasionally=3, Often=4, Frequently=5, Always=6).

                Originally, I analyzed the data using gologit2. However, my supervisor INSISTED that Y is a continuous variable and did not accept my analyses. When I then applied multiple linear regression (command reg), the assumption of normality of the model's residuals did not hold. My supervisor therefore wanted me to transform the data using the Box-Cox transformation.

                The critical issue now is that, although I know an ordered model is appropriate for my study, I have no chance of graduating unless I interpret the results of the current linear regression of Box-Cox transformed Y on mean-centered Xs.

                Any recommendation, please?
                Thanks
                Last edited by Hakimeh Nasiri; 24 Oct 2021, 20:00.



                • #9
                  Originally posted by Nick Cox
                  There would be widespread reluctance to transform a Likert item. I would add that Box-Cox is about powers and logarithms, which all seem wrong anyway.

                  If any transform makes sense, it might be one that pulls out the tails relative to the middle, which Box-Cox doesn't (usually) do. But much depends on the details. A Likert item might elicit strong answers and a bimodal distribution -- https://www.creativereview.co.uk/you...it-or-hate-it/ -- and such a distribution is unlikely to benefit from a transformation either.
                  I understand. In the current situation, please let's assume that transforming Y using Box-Cox is appropriate. How should we then interpret the regression output?

                  Thanks



                  • #10
                    Originally posted by Chen Samulsion
                    Maybe this article: Wooldridge, Jeffrey M., “Some Alternatives to the Box-Cox Regression Model,” International Economic Review 33, 935-955, November 1992.
                    Thanks though



                    • #11
                      #7 #8 #9 Your supervisor can INSIST away but your outcome variable isn't convincing as a continuous variable to me. It's discrete and even the use of successive integer scores is a convention. It's on an ordinal scale in anybody's book. That does not stop people applying linear regression to such outcomes and sometimes that works well enough if roughly, but you want -- or rather are being told -- to go further in the same dubious direction. As already flagged there are many models specifically designed for such ordinal outcomes.

                      You have problems on several levels, unfortunately.

                      1. bcskew0 insists on its argument being strictly positive. This is explicit in its help. So your zeros are ignored. Note the message you got about missing values. Surely, your model is no use if it excludes the zeros. Perhaps your supervisor just hasn't thought this through. Conversely if you insist that Never means zero, you're stuck in that position absolutely.

                      2. You could add 1 to make scores 1 to 7 but that's another level of arbitrariness. Equally, what except convention stops you thinking that the scale runs from -3 to 3? There is something fishy about applying a transformation that can only be computed at all with entirely arbitrary re-scaling.

                      3. Even with the game you're playing a power of 1.237 comes with a confidence interval that (just) includes 1, indicating that the results are consistent with no transformation at all. Personally I am happy with powers like 0.5 and 2 when there are independent grounds for using them (including fitting a quadratic in predictors, which includes a transformation too) but a power of 1.237 is pretty hard to defend.

                      4. The assumption of normally distributed residuals (strictly, errors) is the least important assumption of linear regression. I need only mention Jeff's introductory text as one of many places explaining that.

                      You are in a difficult situation, but so are we. I can't encourage you to proceed with what makes essentially no sense statistically.
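
                      A quick way to check point 1 on your own data, sketched with the variable names from the bcskew0 call in #7 (y for the original item, bcy for the transformed variable). It assumes both variables are still in memory; the second count will also include any observations where y itself is missing.

                      ```stata
                      * y and bcy are the names used in the bcskew0 call in #7
                      count if y == 0              // zeros, which bcskew0 cannot transform
                      count if missing(bcy)        // compare with "96 missing values generated"
                      tabulate y, missing          // full distribution of the item, zeros included
                      ```

                      If the first two counts agree, every "Never" response has been dropped from any regression that uses bcy as the outcome.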
                      Last edited by Nick Cox; 25 Oct 2021, 02:35.

