
  • Adequate variable transformation

    Hi, everyone!
    I would appreciate advice on whether and how I should transform the regressor x1, given the data below. I am estimating a model with panel data using the xtreg command. The variable is clearly highly right-skewed, and since some values are equal to zero, I cannot simply take the logarithm. This is the extent of my knowledge, as I am not an econometrician.
    Thank you for your advice!

    variable: x1
    observations: 584
    mean: 0.1398879
    std. dev.: 1.84801
    min: 0
    max: 39.01594
    Last edited by Jovana Ju; 14 Jun 2025, 09:55.

  • #2
    First of all, are you sure you need to transform? A highly skewed distribution does not, by itself, mean that you must transform a variable.

    Assuming you really do need to do something about it, there are other transformations available that tolerate zeroes or even negatives, such as cube root and asinh(). But these raise problems of interpretation, because it is not straightforward to back-transform the results of a regression.

    What is often the best way to approach this is to use -xtpoisson- with robust vce instead. It is, in a sense, the equivalent of a log transformation, but it tolerates zeroes and even negatives. Instead of estimating E(log(y) | x) it estimates log(E(y)|x). Even when y itself may be 0 or negative, if it is reasonable to assume that E(y) will be positive, this works.
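
    A minimal sketch of that suggestion, not a tested specification. The names y, id, and year are placeholders that do not appear in #1, the fixed-effects estimator is an assumption, and the sketch assumes the outcome y is nonnegative. Note that this changes how the outcome enters the model; x1 itself is left untransformed.

    Code:
    * y, id, and year are hypothetical names; substitute your own variables
    xtset id year
    * fixed-effects Poisson with robust standard errors
    xtpoisson y x1, fe vce(robust)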



    • #3
      I was considering applying a transformation because those few extremely large values of X are affecting the results I'm getting. I'm not sure if I'm right.
      Last edited by Jovana Ju; 14 Jun 2025, 12:23.



      • #4
        There is no requirement that regressors be even symmetrically distributed: such a requirement would rule out most possible predictors that are (0, 1) indicators.

        The practical problem for us is that #1 gives us no idea whether you're better off with a transformed version of x1 -- which is just something you need to try.

        The central issue, as I see it, is functional form: whether, say, the square root or cube root of that regressor works better within y = Xb.

        The question transcends econometrics, so that even non-econometricians such as Clyde Schechter and myself can feel bold enough to comment.



        • #5
          Square root and cube root are what I would try.

          Please show the results of

          Code:
          su x1, detail
          and also

          Code:
          quantile x1
          and post the resulting graph as a .png
          Last edited by Nick Cox; 14 Jun 2025, 12:44.



          • #6
            Thank you so much!




            • #7
              Code:
              su x1, detail
              
                                           x1
              -------------------------------------------------------------
                    Percentiles      Smallest
               1%            0              0
               5%            0              0
              10%     2.20e-07              0       Obs                 584
              25%     .0000131              0       Sum of wgt.         584
              
              50%     .0002568                      Mean           .1398879
                                      Largest       Std. dev.       1.84801
              75%     .0027292       6.286705
              90%     .0096147       7.679822       Variance        3.41514
              95%     .0143426       18.85075       Skewness       17.96726
              99%     2.890973       39.01594       Kurtosis        355.293



              • #8
                Code:
                * Example generated by -dataex-. For more info, type help dataex
                clear
                input float p double x1
                        .01         0
                        .05         0
                         .1 2.200e-07
                        .25  .0000131
                         .5  .0002568
                        .75  .0027292
                         .9  .0096147
                        .95  .0143426
                        .99  2.890973
                end
                
                gen sqrt_x1 = sqrt(x1)
                
                gen curt_x1 = x1^(1/3)
                
                * (p75 - 2 p50 + p25) / (p75 - p25)
                
                foreach v of var x1-curt_x1 {
                    di "`v'{col 12}" %04.3f  (`v'[6] - 2 * `v'[5] + `v'[4]) / (`v'[6] - `v'[4])
                }
                
                * (p90 - 2 p50 + p10) / (p90 - p10)
                
                foreach v of var x1-curt_x1 {
                    di "`v'{col 12}" %04.3f  (`v'[7] - 2 * `v'[5] + `v'[3]) / (`v'[7] - `v'[3])
                }
                Code:
                * (p75 - 2 p50 + p25) / (p75 - p25)
                
                x1         0.821
                sqrt_x1    0.490
                curt_x1    0.312
                
                
                * (p90 - 2 p50 + p10) / (p90 - p10) 
                
                x1         0.947
                sqrt_x1    0.681
                curt_x1    0.443
                Last edited by Nick Cox; 15 Jun 2025, 02:07.



                • #9
                  Thanks for the output. See Section 7 of https://journals.sagepub.com/doi/pdf...6867X211063415 for remarks on the skewness measures used in the previous post. The goal is getting closer to linearity, not symmetry, although square root and, even more, cube root will reduce skewness and pull in outliers.

                  See also Section 3.3 of https://journals.sagepub.com/doi/pdf...6867X241276114 and its references for various remarks on transformations.

                  I would compare your model fits using

                  the original version of the regressor

                  its square root

                  its cube root.

                  I would not mess about with (e.g.) log(x1 + 1).

                  What would be the interpretation here? Well, what would be the interpretation if you had been able to use the logarithm? Mostly that the transformation seems to be needed to get closer to the form of the relationship. People in various fields prefer a story in terms of some theory, usually some rationalization thought up once it's seen that a transformation helps.
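
                  The comparison of the three versions of the regressor could be sketched roughly as follows. This is an illustration only: y, id, and year are placeholder names not given in the thread, and the choice of fixed effects with clustered standard errors is an assumption. The within R-squared is at best a rough guide; plots of residuals against each version of the regressor would be a useful complement.

                  Code:
                  * y, id, and year are hypothetical names; substitute your own variables
                  xtset id year

                  gen sqrt_x1 = sqrt(x1)
                  gen curt_x1 = x1^(1/3)

                  * fit the same model with each version of the regressor and store the results
                  foreach v of varlist x1 sqrt_x1 curt_x1 {
                      xtreg y `v', fe vce(cluster id)
                      estimates store m_`v'
                  }

                  * coefficients, standard errors, N, and within R-squared side by side
                  estimates table m_x1 m_sqrt_x1 m_curt_x1, b(%9.4f) se stats(N r2_w)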



                  • #10
                    One of the best ways to think about a transformation is to draw a graph. Here is a plot of square root and cube root over the range of your regressor with extreme points highlighted. You can get a sense of how far outliers are pulled in, relatively.

                    Code:
                    * Example generated by -dataex-. For more info, type help dataex
                    clear
                    input float x1
                           0
                    6.286705
                    7.679822
                    18.85075
                    39.01594
                    end
                    
                    gen sqrt_x1 = sqrt(x1)
                    
                    gen curt_x1 = x1^(1/3)
                    
                    levelsof x1 if x1 > 0, local(x)
                    
                    levelsof sqrt_x1 if x1 > 0, local(T1)
                    
                    levelsof curt_x1 if x1 > 0, local(T2)
                    
                    twoway function sqrt(x), range(x1) ///
                    || scatter sqrt_x1 x1, ytitle(square root) xtitle(original) xli(`x') yli(`T1') legend(off) name(G1, replace)
                    
                    twoway function x^(1/3), range(x1) ///
                    || scatter curt_x1 x1, ytitle(cube root) xtitle(original) xli(`x') yli(`T2') legend(off) yla(0/3) name(G2, replace)
                    
                    graph combine G1 G2

                    [Image: trans.png, the combined graph of square root and cube root against the original values]



                    • #11
                      Nick, thank you very much for taking the time to look into my issue! I will try both the square root and the cube root transformations. And of course, I will now carefully go through the texts you referred me to.
