Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • log transformation generating missing data with an additional row

    I have time series data. Simply log transformation (ln(XXX)) is working fine but creating an additional row with missing values for all the variables. Other than that, this looks fine.
    Why is Stata creating an additional row with missing values?



  • #2
    On the face of it, there is no indeed no reason why log transformation would create an extra row (meaning, observation) with missing values, so I think you need to show a data example with complete code in which you can show that this is happening.

    Comment


    • #3
      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input int year double(fer rem_curr) long pop double(cpi rem mr edu gdppc urb Populationages65andabove PopulationmilYearbookande hepc) float(lnfr lnrem lnmr lnedu lngdppc lnhepc)
      1974 6.794      16509921.42  75839348               11                .02 150.7               .9  399.5105081305051  9.034 3.05211222973668   76.4  3.309293193717277   1.91604  -3.912023  5.015291 -.10536052  5.99024 1.1967347
      1975 6.737      71286161.01  77854351               12                .08 149.5               .8 373.26037369699947  9.836  3.1096958415959     78 3.9493589743589745 1.9076147  -2.525729  5.007297 -.22314355 5.922276 1.3735533
      1976 6.674      18761274.34  79855318               13                .02   148               .5  384.5095520287145 10.701 3.17544912913627   79.9 6.9687108886107625 1.8982193  -3.912023  4.997212  -.6931472 5.951969 1.9414302
      1977 6.593      78875038.15  81867264               14                .07   146               .6 385.08552032014006  11.63   3.240405733117   81.8  7.568337408312957 1.8860085   -2.65926  4.983607 -.51082563 5.953465 2.0239735
      1978 6.516      115435371.4  83889655               15                .09 143.6         .6448609  402.3855959317121 12.629 3.30230348426156   83.7  9.518637992831541 1.8742607 -2.4079456  4.967031  -.4387206 5.997411 2.2532518
      1979 6.424      171138702.4  85935072               16                .13 140.9               .7 411.66926652023704 13.701 3.35870669520534   85.6 11.099532710280375  1.860041 -2.0402207   4.94805  -.3566749  6.02022  2.406903
      1980 6.323        338666748  88016432               17                .23 137.9               .7  405.2267702595173 14.851 3.40778641997091   87.7 13.697491448118585 1.8441938  -1.469676  4.926529  -.3566749 6.004447  2.617213
      1981 6.239      381052154.5  90303105               18                .23 134.7               .7  423.5371228674734 15.801 3.44259869825406   89.9 14.327474972191322   1.83082  -1.469676   4.90305  -.3566749 6.048641  2.662179
      1982 6.116      526463684.1  92814507               19                 .3 131.4               .7  420.8720045329513 16.212 3.46310519897016   91.4  17.10940919037199 1.8109083 -1.2039728  4.878246  -.3566749 6.042329 2.8396285
      1983 5.891      642408508.3  95335155               20                .33 127.9               .8 425.64657116856836 16.631 3.47853791960866   93.3 17.081136120042874 1.7734258 -1.1086626  4.851249 -.22314355 6.053609  2.837975
      1984 5.727        500747406  97814966               22                .23 124.2               .9 434.78235160430745  17.06 3.49135734505086   95.3 22.551416579223506 1.7451918  -1.469676  4.821893 -.10536052 6.074846  3.115798
      1985 5.541      502471374.5 100253825               23                .21 120.4               .9 438.38247882843933 17.496 3.50087091440152   97.4 23.101848049281312  1.712175 -1.5606477   4.79082 -.10536052 6.083092 3.1399126
      1986   5.3      576282470.7 102635572 24.2800028287658 .23125374225671932 116.6              1.1  446.0802530641697 17.941 3.50793728708405   99.5  27.26874371859296  1.667707 -1.4642397 4.7587495  .09531018 6.100499  3.305741
      1987 5.073      747808898.9 104959291 26.6775793190573 .26706890187931026 112.8              1.1  452.6597569237309 18.395 3.51269284012218  101.7 27.908456243854474 1.6239324 -1.3202486 4.7256165  .09531018 6.115141   3.32893
      1988 4.844      763622131.3 107216008 28.6551258370912 .24855159455867146 108.9              1.1 453.83923688347886 18.859 3.51394543394092  103.9 31.968527430221364 1.5777408  -1.392105   4.69043  .09531018 6.117743  3.464752
      1989 4.676      757979553.2 109433799 30.3874656686181 .22793527080074988   105              1.2 457.25433477770497  19.33 3.51008192633429  106.2 34.811676082862526  1.542443 -1.4786936   4.65396  .18232156  6.12524  3.549953
      1990 4.479      778865600.6 111633717 32.2492201402459 .21634552721763337 101.1              1.2  473.4448210210855 19.811 3.50089256635609  108.6  38.48987108655617 1.4993998 -1.5308785 4.6161103  .18232156 6.160035  3.650395
      1991  4.24      769365722.7 113787109  34.299420491319 .19713012058705054  97.1              1.1 480.67335897608666 20.257 3.48431341198764  111.5  41.05829596412556 1.4445633 -1.6238912 4.5757413  .09531018 6.175188  3.714993
      1992 4.026      911759948.7 115952192 35.5458878143966 .22121378254508725  93.1              1.2  497.3711737353226  20.61  3.4642372412599  113.3  48.66725507502207 1.3927733 -1.5086257 4.5336742  .18232156 6.209337 3.8850064
      1993 3.852       1007375061 118132360 36.6175318517239 .23288056541747312  89.2              1.3  511.1935094212935 20.966 3.44699748888875  115.5  59.85281385281385 1.3485925 -1.4572295  4.490881  .26236427 6.236748 4.0918884
      1994 3.774       1150881226 120343179 38.5632923178032 .24799042990478706  85.2        1.3369002   521.323144726555 21.328 3.43124598694538  117.5 61.719148936170214 1.3281355  -1.394365 4.4450016  .29035366  6.25637 4.1225944
      1995 3.605 1201664266.51586 122600370 42.5344675825385 .23043601968692384  81.3        1.3369002  537.9319466439827 21.693 3.41881553864805  119.3  87.80385582564962 1.2823218  -1.467782  4.398146  .29035366 6.287732 4.4751053
      1996 3.504 1344661208.65883 124862179     43.545566759 .24730791687310255  77.4        1.3050351  552.0771071387666 22.064 3.41280605074175  121.2  90.03300330033004  1.253905  -1.397121 4.3489866  .26622993 6.313688 4.5001764
      1997 3.491 1526499543.77216 127198069 45.8559208091063 .26171025572120077  73.6              1.3  566.2711408476324 22.438 3.41304079304443    123 107.59349593495935 1.2501882 -1.3405173  4.298645  .26236427 6.339073   4.67836
      1998 3.393 1605786652.44555 129622332  49.708844392463  .2492150962339381  69.9        1.4927874  584.4481751740466 22.818  3.4200599917461  124.8 119.09455128205128 1.2217145  -1.389439 4.2470655   .4006451 6.370668  4.779918
      1999 3.304 1806794191.65168 132108065 52.7444123541293  .2593002265737669  66.4             1.32   600.232323162952 23.202 3.43512488596031  126.6 150.49763033175356 1.1951338 -1.3497688  4.195697  .27763173 6.397317  5.013947
      2000 3.218 1967528858.46792 134544304 53.9091441150379 .27126474722340604    63             1.28  620.5604728221052  23.59 3.46140181652951  128.4 172.74143302180684 1.1687601   -1.30466 4.1431346  .24686007 6.430623  5.151796
      2001 3.146 2104551469.14176 136805810 54.9911943003265  .2797446939141972  59.9              1.3  641.2889228762742 24.096 3.49633577880817    130 175.14615384615385 1.1461318  -1.273878 4.0926766  .26236427  6.46348  5.165621
      2002 3.084 2858057770.42652 138933658 56.8238115576626 .36202050605706204    57             1.25  655.6721393622719 24.756 3.53929895087049    132 181.54545454545453 1.1262275 -1.0160544 4.0430512  .22314355 6.485661  5.201506
      2003 3.002 3191664732.05594 140970351 60.0449873584211  .3770619644843826  54.3             1.68   676.826252046427 25.429 3.59350455333156  133.9  185.4368932038835 1.0992787  -.9753457  3.994524   .5187938 6.517415  5.222714
      2004 2.914 3583817227.52373 142902856 64.6009226216381  .3882095578972448  51.7             1.48  702.6563570818726 26.114 3.65582722732927  135.9  212.4944812362031 1.0695267    -.94621  3.945458   .3920421 6.554868  5.358916
      2005 2.808 4314502845.76468 144715786 69.1531029680564  .4311249727458351  49.3            1.545  739.2037183534092 26.809 3.72492189621939  137.8 227.05370101596515 1.0324725  -.8413573  3.897924   .4350239 6.605574  5.425187
      2006 2.712  5427515428.7338 146405973 73.8314909913831  .5021120368458488    47             1.61   779.419570444998 27.517 3.80177383950799  139.8   247.381974248927  .9976864   -.688932 3.8501475   .4762342 6.658549  5.510933
      2007 2.626 6562316321.90849 147989278 80.5553137783967  .5504688089750395  44.8             1.79  825.5082299645373 28.237 3.88738601724917  141.8 278.78702397743297  .9654618   -.596985  3.802208   .5822156 6.715999  5.630448
      2008 2.535 8940611606.37691 149474264 87.7263034206632  .6818219455923263  42.7             1.71  866.4581576213504 28.968 3.98311546121115  143.8  290.3407510431154  .9301937  -.3829867  3.754199  .53649336 6.764414  5.671055
      2009  2.44 10520653005.6389 150873714  92.484115240665  .7539837210941032  40.7             1.59   901.729626251937 29.709 4.09030396109955  146.7  387.9822767552829  .8919981  -.2823845  3.706228    .463734 6.804315  5.960959
      2010  2.34 10850211617.3056 152201981              100  .7128824175623312  38.9 1.77802832192127  943.6642286423604 30.462 4.20925466072201  140.6 477.52489331436703  .8501509  -.3384388  3.660994   .5755051 6.849771  6.168616
      2011 2.262 12071073184.1576 153591076  111.39516515524  .7055268697065441  37.2 1.96605664384263  995.5799471556621 31.225 4.34377354171206  148.7  520.9280430396773  .8162494  -.3488104  3.616309   .6760299 6.903326  6.255612
      2012 2.209 14119627033.9853 155070101 118.321164250415   .769542696880096  35.6 1.87681685753293 1050.3914040406614 31.993 4.48905587386876  152.7  545.7760314341847    .79254 -.26195884  3.572346   .6295772 6.956918  6.302209
      2013 2.184 13866954019.3967 156532141 127.231228786487  .6962798522631387  34.1 1.86282889139239 1103.1569590712293 32.762 4.62684307316426  154.7  830.1098901098902   .781158  -.3620036  3.529297   .6220962 7.005931  6.721558
      2014 2.152 14987531478.7427 157964929 136.126776861358  .6969889132763543  32.7 1.84884092525187  1159.407534759807 33.535 4.75656753138087  156.8  811.2627551020407  .7663977  -.3609858  3.487375   .6145589 7.055665  6.698592
      2015 2.114 15295536094.8351 159383179 144.558850888013  .6638615736026592  31.4 1.83485295911133 1224.3864766816523 34.308 4.90260142194805  158.9   709.100062932662  .7485819  -.4096816  3.446808   .6069643 7.110195  6.563997
      2016 2.097 13574285886.2473 160811932 152.529140322278  .5534085944447571  30.1  1.8208649929708 1299.8309136079226 35.083 5.07223834276104  160.8  698.3333333333333  .7405078 -.59165865  3.404525   .5993116  7.16999  6.548697
      2017 2.043 13501933032.7097 162186147 161.226458913586  .5163520167305026  28.9 1.66246855487557  1373.753637401862 35.858 5.25888905067487 162.75  822.1382488479263  .7144193  -.6609665 3.3638415   .5083036 7.225302  6.711908
      2018  2.04 15566241930.3078 163523103 170.164243384102  .5594178879621191  27.8 1.50407211678041 1462.2505025571325 36.632 5.45814402305543 163.65 1033.8466238924534  .7129498  -.5808585  3.325036   .4081762 7.287732  6.941042
      2019 2.031 18363859531.2812 164913055 179.679821747048  .6197401272841386  26.7 1.34567567868521 1564.2080244563588 37.405 5.66255837053046  166.5 1137.3873873873874  .7085283   -.478455  3.284664  .29689625 7.355135  7.036489
      2020 2.003 21751646479.1935 166298024 189.905534708779   .688759219388572  25.7 1.34567567868521 1604.6659234742021 38.177   5.835135780086 168.22  995.6604446558079  .6946461  -.3728635  3.246491  .29689625 7.380671  6.903406
      2021 1.981   22205503377.12 167658854 200.437039174984  .6607788755558205  24.9 1.34567567868521  1702.080284141892 38.946 5.97381333056883  169.1 1573.6717918391487  .6836018   -.414336  3.214868  .29689625 7.439607  7.361167
      2022  1.95 21505473781.2512 169384897 215.864586624407  .5881565391172973  24.1             1.33 1804.3493461776036 39.711 6.13515765182076  171.3 1782.9830706363105  .6678294 -.53076214  3.182212  .28517893 7.497955  7.486043
      2023  1.91 22168181204.6103 171466990 237.199569600947  .5450491189277106 23.62             1.32 1885.3773361354513 40.473 6.32054805417649  170.8 2157.0784543325526  .6471033  -.6068794  3.162094  .27763173 7.541883   7.67651
         .     .                .         .                .                  .     .                .                  .      .                .      .                  .         .          .         .          .        .         .
      end
      format %ty year

      Comment


      • #4
        These are my simple codes for the log transformation:
        gen lnfr = log(fer)
        gen lnrem = log(rem)
        gen lnmr = log(mr)
        gen lnedu = log(edu)
        gen lngdppc = log(gdppc)
        gen lnhepc = log(hepc)

        Comment


        • #5
          The extra observation is not added as a side-effect of log transformation. It was in your data example at the outset. See the last observation in #3, which is all missing values. Log transformation can only yield missings from missings.

          Sometimes such observations appear as a result of importing too much from an MS Excel spreadsheet file. although you might be able to work out that something else happened. Either way, you might as well drop all observations with entirely missing values.

          Comment


          • #6
            Much appreciated, Nick!

            Comment


            • #7
              Log transformation can only yield missings from missings.
              Or from any number zero or below.

              Comment


              • #8
                Zero or negative input is not the issue here, but indeed a more careful rephrase would be.

                Log transformation can only yield missings if given missings.

                Indeed, Mata can also yield complex results for the logarithms of negative values.

                Comment


                • #9
                  Hi Nick,
                  I was able to identify the problem. When I import the data from Excel, it creates an additional row with all missing values. Can you tell me why it's so?

                  If I directly copy from Excel and paste it into Stata, then there are no missing values!

                  Comment


                  • #10
                    How can I delete a specific row? In my case, the last row has all missing values and needs to be deleted.
                    Thanks.

                    Comment


                    • #11
                      As advised in #5 you need to use the drop command. For your data example


                      Code:
                      drop in L
                      would work, but i would recommend the more explicit

                      Code:
                      drop if missing(year)
                      See also

                      Code:
                      help drop
                      I recommend talking about observation not row as the terminology most often by Stata for data.
                      Last edited by Nick Cox; 21 Apr 2025, 04:44.

                      Comment


                      • #12
                        This works perfectly.

                        However, can you tell me why Stata was creating a new/additional row with all missing values?

                        Comment


                        • #13
                          It is unlikely that this is Stata's doing. Sometimes Excel files have whitespaces or other invisible characters, and that can make it look like it has only some cells populated when in fact there are more.

                          One way to check that this is the case, is to try exporting the Excel sheet as a CSV file. Then open the CSV file in a plain text editor -- I would be willing to bet you'll find the extra "empty" row there too.
                          Last edited by Hemanshu Kumar; 21 Apr 2025, 06:14.

                          Comment


                          • #14
                            However, can you tell me why Stata was creating a new/additional row with all missing values?
                            I believe that this has already been answered. Stata is not doing that -- it is just responding to what was in the Excel file. I've found that (or it seems that) you only need to let a mouse rest in a cell for Excel to think that the correspinding row [Excel sense] exists (and therefore all rows above).

                            Your Stata data example included an extra observation with missing values. Nothing in this thread. or any other material. suggests that the log transformation in Stata creates a new observation.

                            This is consistent with your experience that direct copy and paste doesn't create the problem -- because you wouldn't deliberately include rows in the spreadsheet that to you contain no information in what is copied and pasted.
                            Last edited by Nick Cox; 21 Apr 2025, 06:25.

                            Comment

                            Working...
                            X