Highly Left Skewed variable

hari venkatesh

Join Date: Feb 2019

Posts: 28
#1

24 Dec 2019, 05:50

Hello,

I want to run a panel logistic regression with a binary dependent variable Y (0, 1), an explanatory variable X, and other control variables.

Here X is an index variable scaled between 0 and 1, and it is highly left-skewed (most of the observations are close to 1; see the histogram below).

Is it fine to include the original X variable in the regression without transformation? Or do I need to use a different regression model (ordered logit or probit)?

Thanks for your consideration.
Tags: categorical, graph, logistic regression, logit, panel data
Nick Cox

Join Date: Mar 2014

Posts: 35585
#2

24 Dec 2019, 06:05

It could be fine. We can't tell without seeing results. I know no reason why particular marginal distributions for predictors should be sought or shunned. You wouldn't, I guess, blink twice at a binary predictor with many fewer zeros than ones. That's just the way the data are.

The key is whether the data are closer or further away from being linear on logit scale with the data as they arrive or with a transformed version.
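
One way to look at this, sketched on simulated data rather than the data in this thread: fit the model with x as it arrives, add a squared term to see whether curvature on the logit scale is detectable, and inspect a lowess smooth of y on x drawn on the logit scale. Every variable name, sample size and coefficient below is made up for illustration.

```stata
* Simulated stand-in data: all names and parameter values are assumptions
clear
set seed 2019
set obs 3000
generate id = ceil(_n/30)                 // 100 panels of 30 observations
bysort id: generate t = _n
bysort id: generate u = rnormal(0, 0.5) if _n == 1
by id: replace u = u[1]                   // panel-level random intercept
generate x = rbeta(5, 1)                  // left-skewed, most values near 1
generate y = runiform() < invlogit(-1 + 1.5*x + u)
xtset id t

* x as it arrives
xtlogit y x, re

* add a squared term; a clearly nonzero coefficient suggests curvature
xtlogit y c.x##c.x, re
test c.x#c.x

* smooth of y against x, shown on the logit scale
lowess y x, logit
```

If the squared term is negligible and the smooth looks roughly straight, the untransformed predictor is probably adequate, however skewed its histogram.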

To discuss possible transformations, we need to know whether the data contain exact zeros or exact ones.

I can't imagine why skewness of a predictor would make you consider a different model! The character of the outcome or response (you say "dependent variable") is what drives that.

Last edited by Nick Cox; 24 Dec 2019, 06:41.

hari venkatesh

Join Date: Feb 2019
Posts: 28
#3

24 Dec 2019, 06:51

Hi Nick,

Variable X was constructed as an index (it lies between 0 and 1), and I have not transformed it. If I transform X, the results will be hard to interpret. About 2% of the observations on X are exact zeros and 11% are exact ones, and most of the observations are close to one.

Results: X has an insignificant effect in the model below. However, if I restrict the regression to the subsample where X > 0.8, I find a significant effect on Y, as expected.

What would you suggest?

Code:

xtlogit y  x x1 x2  x3 

Fitting comparison model:

Iteration 0:   log likelihood = -491.28341  
Iteration 1:   log likelihood = -463.94626  
Iteration 2:   log likelihood = -459.80387  
Iteration 3:   log likelihood = -459.79081  
Iteration 4:   log likelihood =  -459.7908  

Fitting full model:

tau =  0.0     log likelihood =  -459.7908
tau =  0.1     log likelihood = -459.74113
tau =  0.2     log likelihood = -460.73544

Iteration 0:   log likelihood = -459.74113  
Iteration 1:   log likelihood = -459.55156  
Iteration 2:   log likelihood = -459.55051  
Iteration 3:   log likelihood = -459.55051  

Random-effects logistic regression              Number of obs      =      3367
Group variable: id                              Number of groups   =       119

Random effects u_i ~ Gaussian                   Obs per group: min =         5
                                                               avg =      28.3
                                                               max =        48

Integration method: mvaghermite                 Integration points =        12

                                                Wald chi2(4)       =     76.54
Log likelihood  = -459.55051                    Prob > chi2        =    0.0000

------------------------------------------------------------------------------
          cc |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .1425727   .4331502     0.33   0.742    -.7063862    .9915315
             |
          x1 |   1.600418   .3819417     4.19   0.000     .8518255     2.34901
             |
          x2 |   2.132501   .3147175     6.78   0.000     1.515666    2.749335
             |
          x3 |  -.4656932   .1116828    -4.17   0.000    -.6845874   -.2467989
       _cons |  -1.845622   .5727893    -3.22   0.001    -2.968269    -.722976
-------------+----------------------------------------------------------------
    /lnsig2u |  -2.408402   1.579571                     -5.504305    .6875006
-------------+----------------------------------------------------------------
     sigma_u |   .2999315   .2368816                      .0637904    1.410226
         rho |   .0266164   .0409235                      .0012354    .3767545
------------------------------------------------------------------------------
Likelihood-ratio test of rho=0: chibar2(01) =     0.48 Prob >= chibar2 = 0.244

Nick Cox

Join Date: Mar 2014

Posts: 35585
#4

24 Dec 2019, 08:04

It's hard to know what to advise. With nothing else said, x does not have a role at conventional significance levels, so worrying about its skewness may be neither here nor there. But what are x1 x2 x3 and do they have any relation to x? Why use such cryptic variable names anyway?

Exact zeros and exact ones rule out a logit transformation, but a simple power transformation respects the bounds and may improve fit as well as skewness (again, the last is not itself a problem).
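
A small illustration of the point about bounds, using Stata's built-in logit() function on five made-up values: logit(x) = ln(x/(1 - x)) is undefined at 0 and at 1, so those observations come out missing, whereas a power maps [0, 1] onto [0, 1]. The cube is only an example; powers above 1 are the usual direction to try when the skew is to the left.

```stata
* Five illustrative values including both bounds (not the thread's data)
clear
set obs 5
generate x = (_n - 1)/4          // 0, .25, .5, .75, 1
generate logit_x = logit(x)      // missing where x is exactly 0 or 1
generate x_cubed = x^3           // stays within [0, 1]
list x logit_x x_cubed
```

With 2% exact zeros and 11% exact ones, a logit transformation would therefore silently drop about 13% of the observations from any model using it.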
hari venkatesh

Join Date: Feb 2019

Posts: 28
#5

24 Dec 2019, 08:25

Thank you for the suggestion.

The control variables x1, x2 and x3 are not related to X.