  • Advice on log-transformed variable

    Hello!

    I am attempting to analyse some panel data using fixed time effects.

    I understand that if a variable has negative values, then I need to add a value to it so that the minimum value is over 0, ideally 1.

    If the minimum value for my variable polgov is -0.795, would this code be correct?

    Code:
    sum polgov
    
    gen ln_polgov = ln(polgov + 2)
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float polgov
       .10911667
       .10536623
       .07698596
      .001213237
     -.010889966
    -.0015461458
    -.0009272145
      -.01928202
     -.011525432
     -.002667901
    -.0031931796
     -.011856452
     -.008800886
    -.0015845464
     .0022473426
       .03493692
       .02719913
      .029458277
        .0385677
      .033481546
      .007932514
      .003868169
        .7951179
        .8537826
         .944502
        .5450747
         .326949
        .5163905
        .5661585
         .536807
        .6214488
        .6437451
        .6830839
        .6002622
        .5370712
        .4841655
        .4619964
        .4728099
        .4134577
        .4179388
        .5501199
        .4571055
       1.4858947
        .9762693
        .6410073
          .68331
       .54647344
        .3377216
           .3951
        .2024828
       .19571774
        .1267215
        .2077846
       .20950356
       .21788646
        .1908137
       .29480013
       .18010856
        .2088807
       .15212043
       .14043501
       .20639417
        .1195842
       .12359986
       .13942388
       .13340275
       .13742124
        .3107648
        .2214064
       .25246823
        .2181802
       .23354974
        .2127226
       .22437154
       .24474743
          .31228
       .29674697
        .2750496
        .2989097
        .2783753
       .22762804
        .2396685
        .3495828
        .3508159
       .02182449
      -.02079328
      .020043815
      -.10097527
      -.06281429
       -.1105917
     -.005623568
     .0003969337
       .02193683
     -.015747579
     -.032890383
     -.015334428
       .02742178
      .015209296
       .00904563
    -.0008843911
    end
    Thank you in advance!
    Last edited by Cassie Wright; 04 Jul 2023, 09:32.

  • #2
    Cassie:
    no, it is not.
    More generally, this way you're making up your data, not to mention the issues you will run into when back-transforming, if that is requested.
    I'd keep your variables in their original metric.
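A minimal sketch of what keeping polgov in its original metric could look like in a fixed-effects model with time effects. The thread does not say what the outcome variable is, so using ln(gdp) as the outcome is purely an assumption for illustration, as are the toy numbers and the names id and ln_gdp.

```stata
* Sketch only: the outcome (ln_gdp) and the toy data are assumptions,
* not the model from this thread.
clear
input str10 country int year float polgov double gdp
"A" 2000  .10 1000
"A" 2001 -.01 1100
"A" 2002 -.02 1250
"A" 2003  .03 1300
"B" 2000  .80 1800
"B" 2001  .85 1750
"B" 2002  .55 2100
"B" 2003  .33 2600
"C" 2000 -.10 7600
"C" 2001 -.06 2600
"C" 2002  .02 3300
"C" 2003 -.03 4200
end

* xtset needs a numeric panel identifier
encode country, gen(id)
xtset id year

* Log the strictly positive variable; leave polgov untransformed
gen double ln_gdp = ln(gdp)

* Country fixed effects plus year dummies for the time effects
xtreg ln_gdp polgov i.year, fe
```

Negative values of polgov cause no difficulty here, because a regressor does not need to be positive unless it is logged.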
    Kind regards,
    Carlo
    (Stata 19.0)

    • #3
      Hi Carlo, thank you very much for the heads up. I was wondering if you could answer a couple of questions I have, so that I understand where I have gone wrong.

      Firstly, I'm looking at some notes on log-transformations and this lecture note says this:

      "If the variable has negative values, you need to add a value high enough so the minimum value is over zero (preferable 1). For example, if the lowest value in ‘varX’ is -1, then type:

      gen ln_varX = ln(varX + 2)

      The natural log of 1 is zero."

      Source: https://www.princeton.edu/~otorres/Panel101.pdf p.27

      Would adding the integer 2 in this example be incorrect too, or is it how I have specifically coded it that is wrong?

      Secondly, just to clarify, would you advise that for the purposes of my analysis I create a log-transformed variable despite it having a negative minimum value so that I am able to back transform it later?




      • #4
        I doubt that any researcher would defend this procedure as correct, because it is so arbitrary. Why not add 0.8 or 1.8 or 2.2?
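The arbitrariness can be seen directly with a rough sketch like the following. The toy values are hypothetical (the minimum of -.795 is taken from the first post), and the ln_shift* variable names are made up for illustration.

```stata
* Sketch only: toy data and variable names are hypothetical
clear
input float polgov
-.795
-.1
0
.2
.8
1.5
end

* Log after adding each candidate constant
foreach c in 0.8 1.8 2.2 {
    local s = subinstr("`c'", ".", "_", .)
    gen ln_shift`s' = ln(polgov + `c')
}

* The spread of the logged values depends on the constant chosen
foreach v of varlist ln_shift* {
    quietly summarize `v'
    display "`v': range = " %6.3f (r(max) - r(min))
}
```

Each constant gives a different variable with a different shape, so any coefficient estimated on it would depend on a choice with no substantive justification.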

        The prior question is why transform at all?

        I have more sympathy for the transformation sign(y) * log(1 + |y|), which preserves sign (negative, zero, and positive values map to negative, zero, and positive values respectively), is similar to y when y is at or near zero, and is similar to log(y) for y >> 0 and to -log(-y) for y << 0, but there is still a question of a good rationale.
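For anyone who wants to try it, a minimal sketch of that transformation and its inverse follows. The toy values and the names slog_polgov and back are hypothetical. The inverse is sign(z) * (exp(|z|) - 1).

```stata
* Sketch only: toy data and variable names are hypothetical
clear
input float polgov
-1.5
-.1
0
.1
1.5
end

* Signed log: sign(y) * log(1 + |y|)
gen double slog_polgov = sign(polgov) * log1p(abs(polgov))

* Back-transform: sign(z) * (exp(|z|) - 1)
gen double back = sign(slog_polgov) * (exp(abs(slog_polgov)) - 1)

list polgov slog_polgov back
```

Unlike adding a constant before logging, this needs no arbitrary shift and is defined for every real value, but it still treats zero as a special point.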

        In practice your data are skew relative to zero and regardless of zero, but is zero a special point? And even that transformation, which can be quite strong if any value is a long way from zero, doesn't do much. I applied

        Code:
        transplot qplot polgov , trans(@ sign(@)*log1p(abs(@))) yli(0, lp(solid) lw(vthin))
        where transplot is from SSC and qplot from the Stata Journal, but it doesn't make much difference to your data example. On transplot see also https://www.stata.com/meeting/uk19/slides/uk19_cox.pptx
        [Image: polgov.png]

        • #5
          Originally posted by Nick Cox
          Hi Nick

          Thank you for your explanation. I'm new to handling panel data, and I assumed that I could not do a log transformation without first making sure the minimum value was over 0 (according to the lecture notes, it should be 1). It makes sense that this particular variable can be negative, as it is a governance indicator score. My other variables, such as GDP and trade openness, are all positive. I thought it would make sense to shift the variable so that its minimum was above 0, so that the countries that score negatively on the indicator would still be visible after taking logs. Would you suggest that I just leave the negative values for the governance indicator variables as they are?

          Code:
          * Example generated by -dataex-. For more info, type help dataex
          clear
          input str52 country int year float polgov double(gdp trade)
          "Albania"             1996    .10911667 1009.9771137556281    44.895444716607
          "Albania"             1998    .10536623   813.789396580449   48.1375348189415
          "Albania"             2000    .07698596 1126.6833401071663 63.454073132627954
          "Albania"             2002   .001213237 1425.1242186014215  68.52506822343872
          "Albania"             2003  -.010889966  1846.120120812074  67.02056104276284
          "Albania"             2004 -.0015461458 2373.5812917005464  67.04718730682052
          "Albania"             2005 -.0009272145 2673.7878029603644  70.87234677546716
          "Albania"             2006   -.01928202  2972.743618181689  74.26709023930518
          "Albania"             2007  -.011525432 3595.0383020799686  83.20208010539311
          "Albania"             2008  -.002667901  4370.539711157367  77.45175149628565
          "Albania"             2009 -.0031931796   4114.13404061058  75.09491440080564
          "Albania"             2010  -.011856452 4094.3496988357133  76.54339024478783
          "Albania"             2011  -.008800886  4437.141146311478  81.21856893306597
          "Albania"             2012 -.0015845464 4247.6313562470905  76.51020062109332
          "Albania"             2013  .0022473426  4413.063397389207  75.87371384678971
          "Albania"             2014    .03493692  4578.633208121549  75.40784559648249
          "Albania"             2015    .02719913  3952.803584108458  71.80100633964696
          "Albania"             2016   .029458277   4124.05538986272  74.80986282317099
          "Albania"             2017     .0385677  4531.032206758928   78.1942495319136
          "Albania"             2018   .033481546  5287.660816997309   76.8081841283291
          "Albania"             2019   .007932514  5396.214226909946  76.27919464957633
          "Albania"             2020   .003868169  5343.037703995597  59.82972943216276
          "Algeria"             2000     .7951179 1780.3760706915123   62.8583436360347
          "Algeria"             2002     .8537826  1794.811114231999 61.134171447472596
          "Algeria"             2003      .944502 2117.0482289928673  62.12477302891426
          "Algeria"             2004     .5450747   2624.79523151946   65.7014218464212
          "Algeria"             2005      .326949  3131.328175652422  71.27860095974813
          "Algeria"             2006     .5163905  3500.134609887463  70.73001243525364
          "Algeria"             2007     .5661585  3971.803488282229   71.9381290438053
          "Algeria"             2008      .536807  4946.564017207589  76.68451816528223
          "Algeria"             2009     .6214488   3898.47880576084  71.32433054692379
          "Algeria"             2010     .6437451  4495.921455483428   69.8666612628678
          "Algeria"             2011     .6830839  5473.281801103243  67.47430173234702
          "Algeria"             2012     .6002622  5610.733306103322  65.40497919812694
          "Algeria"             2013     .5370712   5519.77757552373 63.610823671114694
          "Algeria"             2014     .4841655  5516.229463215619 62.414316011088076
          "Algeria"             2015     .4619964  4197.419971018676  59.69512859871836
          "Algeria"             2016     .4728099  3967.200659521072  55.92566787717814
          "Algeria"             2017     .4134577  4134.936098981946  55.32140302145944
          "Algeria"             2018     .4179388  4171.795309035324  58.06549177081901
          "Algeria"             2019     .5501199 4021.9836079660345  51.80958365004973
          "Algeria"             2020     .4571055 3354.1573026511446  45.33051087922498
          "Angola"              2000    1.4858947  556.8842437345603 152.54710945485846
          "Angola"              2002     .9762693  872.6576380458995 105.30159231121831
          "Angola"              2003     .6410073  982.8055896447095 103.90122955137177
          "Angola"              2004       .68331 1254.6961261224296 103.57994705146601
          "Angola"              2005    .54647344 1900.7238165071046 106.59096212952525
          "Angola"              2006     .3377216 2597.9635848427147  94.62515933204774
          "Angola"              2007        .3951  3121.348735206813 108.06006789397833
          "Angola"              2008     .2024828  4081.717497195369   121.364708453698
          "Angola"              2009    .19571774 3123.6988982993735 122.44612620960417
          "Angola"              2010     .1267215 3496.7847960803215 104.12364829307379
          "Angola"              2011     .2077846  4511.153227190339  99.98250633133675
          "Angola"              2012    .20950356  4962.552071900818  91.80009734191142
          "Angola"              2013    .21788646  5101.983876411284  86.81193275879264
          "Angola"              2014     .1908137  5059.080441288163  79.33292278288972
          "Angola"              2015    .29480013  3100.830685305332  62.88851608901922
          "Angola"              2016    .18010856 1709.5155340455294  53.37015806759876
          "Angola"              2017     .2088807  2283.214232557247   52.2568271631211
          "Angola"              2018    .15212043  2487.500995552675  66.37801332633987
          "Angola"              2019    .14043501 2142.2387571285367  57.82953811830357
          "Angola"              2020    .20639417  1502.950754145175  55.37581626579367
          "Antigua and Barbuda" 1996     .1195842  9079.481211920536 202.26524923001398
          "Antigua and Barbuda" 1998    .12359986  10029.47774981066 195.14354626962884
          "Antigua and Barbuda" 2000    .13942388 11010.197460134172  173.3685908927931
          "Antigua and Barbuda" 2002    .13340275 10549.666189280153 163.90534966322997
          "Antigua and Barbuda" 2003    .13742124 10968.892683910295 171.67545312614877
          "Antigua and Barbuda" 2004     .3107648 11650.848477085794 181.15629794585365
          "Antigua and Barbuda" 2005     .2214064  12808.01015366366 175.74945691527878
          "Antigua and Barbuda" 2006    .25246823 14310.686234785375 170.26640517773674
          "Antigua and Barbuda" 2007     .2181802 16006.136110749845 165.31478790785593
          "Antigua and Barbuda" 2008    .23354974 16457.104063258943  165.8092717594933
          "Antigua and Barbuda" 2009     .2127226  14530.59868963529  151.0874448588719
          "Antigua and Barbuda" 2010    .22437154 13404.516016103624 152.84589020116138
          "Antigua and Barbuda" 2011    .24474743  13117.14694089678 152.37561938000144
          "Antigua and Barbuda" 2012       .31228 13686.476585397588 151.52815245103184
          "Antigua and Barbuda" 2013    .29674697 13350.149136672973  156.2113037671909
          "Antigua and Barbuda" 2014     .2750496 14004.811212216295 156.15479450430908
          "Antigua and Barbuda" 2015     .2989097  14861.88270747037 139.78670405395295
          "Antigua and Barbuda" 2016     .2783753  15862.65166274883 136.87293427314776
          "Antigua and Barbuda" 2017    .22762804 16110.312399780018  133.8728945270318
          "Antigua and Barbuda" 2018     .2396685 17514.355863732675 135.16296781817175
          "Antigua and Barbuda" 2019     .3495828  18187.77971171123 140.41946993427788
          "Antigua and Barbuda" 2020     .3508159 15284.772383537815  88.54829059605558
          "Argentina"           1996    .02182449  7690.157002547828  21.50646840572148
          "Argentina"           1998   -.02079328  8250.673174143214  23.35002797306892
          "Argentina"           2000   .020043815 7666.5178342378285 22.622444777734284
          "Argentina"           2002   -.10097527 2579.4887693328405  41.75272435856421
          "Argentina"           2003   -.06281429 3333.1529038899735  40.64474803110855
          "Argentina"           2004    -.1105917  4258.160260608751  40.69264610894695
          "Argentina"           2005  -.005623568  5086.627760731341 40.551270970623804
          "Argentina"           2006  .0003969337  5890.978001697949  40.43347987191512
          "Argentina"           2007    .02193683  7210.595547558988  40.94517061857098
          "Argentina"           2008  -.015747579   8977.50685093365 40.402673379038234
          "Argentina"           2009  -.032890383  8184.389889239906  34.05712690548787
          "Argentina"           2010  -.015334428 10385.964431955526  34.97101326356957
          "Argentina"           2011    .02742178 12848.740476259789  35.20615499996436
          "Argentina"           2012   .015209296 13082.664325571988 30.526542371710804
          "Argentina"           2013    .00904563 13080.254732336658  29.33392900210371
          "Argentina"           2014 -.0008843911  12334.79824538929 28.406793645227452
          end

          • #6
            Dear Cassie Wright,

            Is the variable you want to transform an explanatory variable?

            Best wishes,

            Joao

            • #7
              You have yet to say why you think polgov should be transformed at all.
