
  • Highly left-skewed variable

    Hello,

    I want to run a panel logistic regression with a binary dependent variable Y (0/1), an explanatory variable X, and some control variables.

    X is an index scaled between 0 and 1, and it is highly left-skewed: most observations are near 1 (see the histogram below).

    Is it fine to include X in the regression untransformed, or do I need a different regression model (ordered logit or probit)?


    Thanks for your consideration.

    [Image: histogram of X (X.JPG)]



  • #2
    It could be fine. We can't tell without seeing results. I know no reason why particular marginal distributions for predictors should be sought or shunned. You wouldn't, I guess, blink twice at a binary predictor with many fewer zeros than ones. That's just the way the data are.

    The key is whether the relationship is closer to being linear on the logit scale with the data as they arrive or with a transformed version.
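    One rough way to look at this, sketched here in plain Python rather than Stata, is to sort the observations by X, split them into bins of roughly equal size, and compute an empirical logit of Y in each bin. If those points lie near a straight line against mean X, the untransformed X is plausibly adequate; if a transformed X straightens them, that version may fit better. The function name, the bin count, and the 0.5 continuity correction are illustrative assumptions, and the sketch ignores the panel structure, so it is a descriptive check only, not a substitute for the fitted model.

```python
import math

def binned_empirical_logits(x, y, n_bins=5):
    """Rough linearity check on the logit scale (hypothetical helper).

    Sorts (x, y) pairs by x, splits them into n_bins bins of roughly
    equal count, and returns a list of (mean x, empirical logit of y)
    per bin. The empirical logit log((k + 0.5) / (m - k + 0.5)) adds
    0.5 to both counts so bins with all 0s or all 1s stay finite.
    Ignores the panel (id) structure entirely.
    """
    pairs = sorted(zip(x, y))
    n = len(pairs)
    out = []
    for b in range(n_bins):
        lo = b * n // n_bins
        hi = (b + 1) * n // n_bins
        chunk = pairs[lo:hi]
        if not chunk:
            continue
        m = len(chunk)
        k = sum(yy for _, yy in chunk)  # number of y == 1 in this bin
        mean_x = sum(xx for xx, _ in chunk) / m
        elogit = math.log((k + 0.5) / (m - k + 0.5))
        out.append((mean_x, elogit))
    return out
```

    Plotting the returned pairs for X and again for a transformed X gives a quick visual comparison of which is closer to linear.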

    To discuss possible transformations, we need to know whether the data contain exact zeros or exact ones.

    I can't imagine why skewness of a predictor would make you consider a different model! The character of the outcome or response (you say "dependent variable") is what drives that.
    Last edited by Nick Cox; 24 Dec 2019, 06:41.



    • #3
      Hi Nick,

      I constructed X as an index (lying between 0 and 1) and have not transformed it. If I transform X, the results will be hard to interpret. X contains 2% exact zeros and 11% exact ones, and most observations are near 1.

      Results: in the model below, X has no significant effect. However, if I restrict the regression to the subsample where X > 0.8, I find a significant effect on Y (the expected result).

      What would you suggest?

      Code:
      xtlogit y x x1 x2 x3
      
      Fitting comparison model:
      
      Iteration 0:   log likelihood = -491.28341  
      Iteration 1:   log likelihood = -463.94626  
      Iteration 2:   log likelihood = -459.80387  
      Iteration 3:   log likelihood = -459.79081  
      Iteration 4:   log likelihood =  -459.7908  
      
      Fitting full model:
      
      tau =  0.0     log likelihood =  -459.7908
      tau =  0.1     log likelihood = -459.74113
      tau =  0.2     log likelihood = -460.73544
      
      Iteration 0:   log likelihood = -459.74113  
      Iteration 1:   log likelihood = -459.55156  
      Iteration 2:   log likelihood = -459.55051  
      Iteration 3:   log likelihood = -459.55051  
      
      Random-effects logistic regression              Number of obs      =      3367
      Group variable: id                              Number of groups   =       119
      
      Random effects u_i ~ Gaussian                   Obs per group: min =         5
                                                                     avg =      28.3
                                                                     max =        48
      
      Integration method: mvaghermite                 Integration points =        12
      
                                                      Wald chi2(4)       =     76.54
      Log likelihood  = -459.55051                    Prob > chi2        =    0.0000
      
      ------------------------------------------------------------------------------
                cc |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
                 x |   .1425727   .4331502     0.33   0.742    -.7063862    .9915315
                   |
                x1 |   1.600418   .3819417     4.19   0.000     .8518255     2.34901
                   |
                x2 |   2.132501   .3147175     6.78   0.000     1.515666    2.749335
                   |
                x3 |  -.4656932   .1116828    -4.17   0.000    -.6845874   -.2467989
             _cons |  -1.845622   .5727893    -3.22   0.001    -2.968269    -.722976
      -------------+----------------------------------------------------------------
          /lnsig2u |  -2.408402   1.579571                     -5.504305    .6875006
      -------------+----------------------------------------------------------------
           sigma_u |   .2999315   .2368816                      .0637904    1.410226
               rho |   .0266164   .0409235                      .0012354    .3767545
      ------------------------------------------------------------------------------
      Likelihood-ratio test of rho=0: chibar2(01) =     0.48 Prob >= chibar2 = 0.244
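      The random-effects rows of this output can be reproduced by hand, which may help when reading it. xtlogit reports /lnsig2u = ln(sigma_u^2), so sigma_u = exp(lnsig2u/2), and rho = sigma_u^2/(sigma_u^2 + pi^2/3), where pi^2/3 is the variance of the standard logistic distribution. Exponentiating a coefficient gives an odds ratio; because X runs from 0 to 1, exp(.1425727), about 1.15, compares the odds at X = 1 with those at X = 0, holding the controls and the random effect fixed. The Python sketch below (function name hypothetical) only does that arithmetic with the numbers printed above.

```python
import math

def re_logit_summary(lnsig2u, coef):
    """Recompute sigma_u and rho of a random-effects logit from its
    ln(sigma_u^2) parameter, plus the odds ratio for one coefficient."""
    sigma_u = math.exp(lnsig2u / 2)   # lnsig2u = ln(sigma_u^2)
    resid_var = math.pi ** 2 / 3      # variance of the standard logistic distribution
    rho = sigma_u ** 2 / (sigma_u ** 2 + resid_var)  # share of latent variance due to u_i
    odds_ratio = math.exp(coef)       # multiplicative change in odds per 1-unit change in x
    return sigma_u, rho, odds_ratio

# Values copied from the xtlogit output above.
sigma_u, rho, odds_ratio = re_logit_summary(-2.408402, 0.1425727)
print(round(sigma_u, 4), round(rho, 4), round(odds_ratio, 3))
```

      The 95% interval for the odds ratio, exp(-.7063862) to exp(.9915315), is roughly 0.49 to 2.70. It spans 1, consistent with the non-significant z test for x.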



      • #4
        It's hard to know what to advise. With nothing else said, x does not have a role at conventional significance levels, so worrying about its skewness may be neither here nor there. But what are x1, x2, and x3, and do they have any relation to x? Why use such cryptic variable names anyway?

        Exact zeros and exact ones rule out a logit transformation, but a simple power transformation respects the bounds and may improve fit as well as skewness (again, the last is not itself a problem).
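        A small Python sketch of the point about bounds (function names are hypothetical; in Stata the transformation itself is just something like `generate x_sq = x^2`): for any power p > 0, x^p maps [0, 1] onto [0, 1] and leaves exact zeros and ones where they are, whereas the logit log(x/(1 - x)) is undefined at both ends. For left-skewed data bunched near 1, a power above 1 stretches the values near 1 apart and typically reduces the skewness.

```python
import math

def power_transform(x, p):
    """Return [v**p for v in x]. For p > 0 this maps [0, 1] onto [0, 1],
    keeps exact 0s at 0 and exact 1s at 1, and preserves order."""
    if p <= 0:
        raise ValueError("p must be positive")
    return [v ** p for v in x]

def logit(v):
    """Log-odds log(v / (1 - v)); undefined at v = 0 and v = 1."""
    if v <= 0 or v >= 1:
        raise ValueError("logit is undefined outside the open interval (0, 1)")
    return math.log(v / (1 - v))

def skewness(values):
    """Moment-based sample skewness m3 / m2**1.5 (negative means left-skewed)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5
```

        For example, for the made-up values [0, .3, .6, .8, .9, .95, .97, .99, 1, 1], squaring moves the moment skewness from about -1.27 to about -0.81, while logit() rejects the first and last values. Whether fit improves is a separate question, answered by comparing models fitted with x and with x^p.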



        • #5
          Thank you for the suggestion.

          The control variables x1, x2, and x3 are not related to X.

