
  • Transforming Independent Variables

    Hello,

    Good morning. I am looking for guidance on whether, and how, to transform a set of independent variables that I will be using in a logit model. Most of these variables, particularly the proportions and densities, have true, meaningful zeros. A log transformation is tempting, but it cannot handle the zeros.

    Thanks for any assistance.

    cheers, Cy

    Code:
                          Age at interview
    -------------------------------------------------------------
          Percentiles      Smallest
     1%           16             15
     5%           19             15
    10%           21             16       Obs                 595
    25%           25             16       Sum of Wgt.         595
    
    50%           30                      Mean           30.61513
                            Largest       Std. Dev.       8.64633
    75%           35             65
    90%           41             65       Variance       74.75903
    95%           46             72       Skewness       1.217693
    99%           61             74       Kurtosis       5.868338
    
                  Highest grade in school completed
    -------------------------------------------------------------
          Percentiles      Smallest
     1%            7              2
     5%            8              5
    10%            9              5       Obs                 588
    25%           11              7       Sum of Wgt.         588
    
    50%           12                      Mean           11.87075
                            Largest       Std. Dev.      2.198876
    75%           13             18
    90%           14             18       Variance       4.835054
    95%           16             20       Skewness       .1317919
    99%           18             22       Kurtosis       4.874681
    
               Score out of 11 on KAB qnnaire section
    -------------------------------------------------------------
          Percentiles      Smallest
     1%            8              7
     5%            9              7
    10%            9              7       Obs                 422
    25%           10              7       Sum of Wgt.         422
    
    50%           10                      Mean           10.13507
                            Largest       Std. Dev.      .8599024
    75%           11             11
    90%           11             11       Variance       .7394322
    95%           11             11       Skewness      -.8235116
    99%           11             11       Kurtosis       3.482594
    
                           SH:Density_Know
    -------------------------------------------------------------
          Percentiles      Smallest
     1%            0              0
     5%            0              0
    10%         .014              0       Obs                 562
    25%         .092              0       Sum of Wgt.         562
    
    50%           .2                      Mean           .2290338
                            Largest       Std. Dev.      .1633596
    75%         .356             .5
    90%           .5             .5       Variance       .0266864
    95%           .5             .5       Skewness        .296762
    99%           .5             .5       Kurtosis       1.866217
    
                           SH:Density_Drug
    -------------------------------------------------------------
          Percentiles      Smallest
     1%            0              0
     5%            0              0
    10%            0              0       Obs                 562
    25%            0              0       Sum of Wgt.         562
    
    50%         .027                      Mean           .0689609
                            Largest       Std. Dev.      .0986219
    75%         .107             .5
    90%         .186             .5       Variance       .0097263
    95%         .286             .5       Skewness       2.062268
    99%           .5             .5       Kurtosis       7.724059
    
                           SH:Density_Sex
    -------------------------------------------------------------
          Percentiles      Smallest
     1%            0              0
     5%            0              0
    10%            0              0       Obs                 562
    25%            0              0       Sum of Wgt.         562
    
    50%         .005                      Mean           .0140801
                            Largest       Std. Dev.      .0253859
    75%         .018           .107
    90%         .036           .167       Variance       .0006444
    95%         .053           .167       Skewness       5.274151
    99%         .107           .333       Kurtosis       52.14508
    
                       
                        mean size ego network
    -------------------------------------------------------------
          Percentiles      Smallest
     1%            1              1
     5%            2              1
    10%            3              1       Obs                 589
    25%            6              1       Sum of Wgt.         589
    
    50%            9                      Mean           11.11205
                            Largest       Std. Dev.      8.034429
    75%           15             40
    90%           21             44       Variance       64.55205
    95%           27             54       Skewness       1.737934
    99%           38             66       Kurtosis        8.57792
    
                   proportion of gender homophily
    -------------------------------------------------------------
          Percentiles      Smallest
     1%            0              0
     5%            0              0
    10%            0              0       Obs                 589
    25%           .2              0       Sum of Wgt.         589
    
    50%     .3846154                      Mean           .3687123
                            Largest       Std. Dev.      .2372442
    75%     .5384615              1
    90%     .6666667              1       Variance       .0562848
    95%     .7307692              1       Skewness      -.0105412
    99%         .875              1       Kurtosis       2.254534
    
                  proportion of behavcat homophily
    -------------------------------------------------------------
          Percentiles      Smallest
     1%            0              0
     5%            0              0
    10%            0              0       Obs                 589
    25%            0              0       Sum of Wgt.         589
    
    50%            0                      Mean           .0959932
                            Largest       Std. Dev.      .1599062
    75%          .15             .8
    90%     .3333333              1       Variance         .02557
    95%           .4              1       Skewness       2.299208
    99%          .75              1       Kurtosis       9.709863
    
                    proportion of race homophily
    -------------------------------------------------------------
          Percentiles      Smallest
     1%            0              0
     5%            0              0
    10%     .1111111              0       Obs                 589
    25%     .4545455              0       Sum of Wgt.         589
    
    50%          .75                      Mean           .6676776
                            Largest       Std. Dev.      .3243566
    75%            1              1
    90%            1              1       Variance       .1052072
    95%            1              1       Skewness       -.756955
    99%            1              1       Kurtosis       2.355689
    
                    mean absolute age difference
    -------------------------------------------------------------
          Percentiles      Smallest
     1%          1.5              0
     5%            3              0
    10%     3.851852              0       Obs                 589
    25%         5.75              0       Sum of Wgt.         589
    
    50%        8.875                      Mean           9.974443
                            Largest       Std. Dev.      6.615875
    75%     12.55556       40.63636
    90%           17             43       Variance        43.7698
    95%           20             54       Skewness       3.008487
    99%     36.33333             72       Kurtosis       21.36231
    
                   Number of sexual contacts (est)
    -------------------------------------------------------------
          Percentiles      Smallest
     1%            0              0
     5%            1              0
    10%            1              0       Obs                 549
    25%            1              0       Sum of Wgt.         549
    
    50%            2                      Mean           60.49909
                            Largest       Std. Dev.      337.1342
    75%            6           2001
    90%           51           2700       Variance       113659.5
    95%          200           3001       Skewness       10.48025
    99%         1802           5401       Kurtosis       138.0463

  • #2
    Square roots, cube roots and neglog, that is sign(x) log(1 + |x|), are all defined at zero and can be used for right-skewed distributions with possible outliers. However, transformation is better aimed at handling nonlinearity and/or outliers than at attaining symmetric marginal distributions. (There is a myth in many corners that predictors should be normally distributed.)
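
    As a rough sketch of how these three behave at zero (written in Python with NumPy rather than Stata, using made-up values that only loosely resemble a skewed count variable; the function name `neglog` and the array `contacts` are illustrative assumptions):

```python
import numpy as np

def neglog(x):
    """Return sign(x) * log(1 + |x|), which is defined at zero and maps 0 to 0."""
    x = np.asarray(x, dtype=float)
    return np.sign(x) * np.log1p(np.abs(x))

# Hypothetical right-skewed values that include true zeros (not the poster's data)
contacts = np.array([0, 1, 2, 6, 51, 200, 1802], dtype=float)

sqrt_t = np.sqrt(contacts)     # square root: zero stays zero
cbrt_t = np.cbrt(contacts)     # cube root: zero stays zero, pulls in the tail more
neglog_t = neglog(contacts)    # neglog: zero stays zero, close to log(x) for large x

for name, values in [("sqrt", sqrt_t), ("cbrt", cbrt_t), ("neglog", neglog_t)]:
    print(name, np.round(values, 3))
```

    All three keep zeros at zero and preserve order, so the choice among them is about how strongly the upper tail should be pulled in, not about the zeros.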


    • #3
      You need to think hard about your variables before making a bunch of arbitrary transformations. Transforming independent variables changes the model being estimated and can make it harder to interpret.

      Glancing at your descriptive statistics, it is not obvious why you'd want to transform most of your variables. For others, some serious consideration of the relations might be worthwhile. For example, with a variable like "proportion of behavcat homophily" you might wonder whether the relation differs between the zeros and the non-zero values. On the other hand, sexual contacts does look as though a few observations with extremely high values could have a disproportionate influence on the estimation. How to handle such observations (often termed outliers) is a source of substantial disagreement among folks who do applied statistical analysis.
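
      One possible way to let the zeros differ from the non-zero values, offered only as a sketch and not as a recommendation for these data, is to enter a zero indicator alongside the proportion itself. The Python below uses hypothetical values, and the names `prop`, `is_zero` and `X` are assumptions for illustration:

```python
import numpy as np

# Hypothetical proportions with many true zeros (not the poster's data)
prop = np.array([0.0, 0.0, 0.0, 0.15, 0.33, 0.4, 0.75, 1.0])

# 1 where the proportion is exactly zero, 0 otherwise
is_zero = (prop == 0).astype(float)

# Design matrix for a logit: intercept, zero indicator, proportion
X = np.column_stack([np.ones_like(prop), is_zero, prop])
print(X)
```

      With these columns, the zeros get their own level through the indicator, and the slope on the proportion is determined by the non-zero observations only.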
