  • Converting Percentage Growth Rates to Level Values

    Hey everyone. I have a real econ question this time! Okay, so I want to reproduce Figure 4, Panel B of this paper (see page 21 if interested). It involves transforming monthly percentage growth rates to their real values (in this case, the product we're concerned with is luxury watch imports). The dataset exists in R (don't worry, I've tidied up the basics), and I present the relevant variable below.
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input long(id month) float(product treated)
    1 601    -.3059372 0
    1 602     .5280703 0
    1 603    -.2787752 0
    1 604    .04519186 0
    1 605   -.02752761 0
    1 606    .14188136 0
    1 607   -.09538406 0
    1 608    .22210488 0
    1 609     .1210361 0
    1 610     .1762661 0
    1 611   -.04707132 0
    1 612     -.206153 0
    1 613    .09640702 0
    1 614    .25957444 0
    1 615  -.019006895 0
    1 616    .05884018 0
    1 617    .05663636 0
    1 618    .16076256 0
    1 619    -.3387509 0
    1 620     .1599616 0
    1 621     .2590707 0
    1 622    -.1175653 0
    1 623     .0823155 0
    1 624   -.57728165 0
    1 625     .2887118 0
    1 626    .19027576 0
    1 627   -.04159575 0
    1 628    .07965107 0
    1 629     -.157954 0
    1 630    -.0904241 0
    1 631    -.0689114 0
    1 632    .14127608 0
    1 633    .07703104 0
    1 634   -.00528429 0
    1 635   -.13722466 0
    1 636    -.4204597 1
    1 637    -.1742234 1
    1 638    .16987443 1
    1 639 -.0021465882 1
    1 640      .179711 1
    1 641    -.0536652 1
    1 642   -.01583692 1
    1 643   -.15609664 1
    1 644   -.13580869 1
    1 645     .2965978 1
    1 646   -.06018283 1
    1 647    .24309935 1
    1 648    -.5410025 1
    1 649     .2207956 1
    1 650    -.1882118 1
    1 651    .08216941 1
    1 652    .06764874 1
    1 653   -.18822004 1
    1 654     .5548315 1
    1 655    -.4751124 1
    1 656   .034530647 1
    1 657    .12505059 1
    1 658   -.14233308 1
    1 659     .1529133 1
    1 660    -.1231552 1
    1 661   -.12265544 1
    1 662    .21872626 1
    1 663    .27011153 1
    1 664     -.261104 1
    1 665   -.08026836 1
    1 666     .0892446 1
    1 667   -.29590556 1
    1 668   -.03112456 1
    1 669     .4658253 1
    1 670    .05001443 1
    1 671    -.3329245 1
    end
    format %tm month
    label values id id
    label def id 1 "Product1", modify
    
    xtset id month, m
    Here's where the problems begin. In R, the way the authors transform the growth rate to the total import value is
    Code:
    exp(cumsum(c(result$in_sample$observation, result$out_of_sample$observation)))
    which is just fancy schmancy R code for
    Code:
    qbys id (month): replace product = exp(sum(product))
    Well, when we do this, something interesting happens. R and Stata produce identical values; however, in the paper the lower bound of the Y axis is 20 (20 million dollars), while the lower bound in both R and Stata is 0.736! This suggests one of two things: either the authors made an error in their calculations, or there's some specific way of converting growth rates to levels that they didn't put in their code. Is there another way to convert monthly growth rates to level values? Because 0.736 is quite far from 20.
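    Just so the check is easy to reproduce from the dataex data above, here's the same calculation sketched into a new variable (cum_growth is just my name for it) rather than overwriting product:
    Code:
    qbys id (month): gen double cum_growth = exp(sum(product))
    summarize cum_growth   // its minimum is the 0.736 lower bound I'm referring to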


    P.S.: Yes, I've emailed Zhentao about this; I was just curious whether anyone else might have a better explanation.

  • #2
    What you have calculated is the cumulative growth. To have it represent real values, you need to multiply this series by the real value in the month that corresponds to the denominator in the calculation of the growth rate for 2010m2.

    Put another way, if you know something grew 10% in 2020 and 10% in 2021, the cumulative growth is 1.10*1.10 = 1.21. If the real value at the start of 2020 was $100 then the real value at the end of 2021 will be $100 * 1.21.
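    In Stata terms, that toy example is just this (the $100 starting value is of course made up):
    Code:
    display 1.10 * 1.10        // cumulative growth factor: 1.21
    display 100 * 1.10 * 1.10  // level at the end of 2021 given a $100 start: 121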

    Added in edit. The second paragraph was an afterthought. I'm not sure I agree that what you show calculates cumulative growth. I would think it would be
    Code:
    exp(sum(log(1+product)))
    but perhaps they are approximating log(1+product) by product for smallish values of product.

    Whatever the case, to translate growth as a percentage into a level value, you need to start with an actual value.
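    Here is a minimal sketch of what that might look like with your data. The 20 below is a hypothetical base level (it should be the actual import value, in millions of dollars, for the month that serves as the denominator of the first growth rate), not a number taken from the paper:
    Code:
    * 20 is a hypothetical base level in millions of dollars, not from the paper
    qbys id (month): gen double level = 20 * exp(sum(log(1 + product)))
    Whether that reproduces the $20M scale in Panel B depends entirely on what base value the authors actually used.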

    Added in edit #2:
    because 0.736 is quite far from 20
    And looking carefully at the Y axis in panel B, we see that 0.736 is tremendously far from $20M.
    Last edited by William Lisowski; 30 Aug 2022, 12:23.



    • #3
      but perhaps they are approximating log(1+product) by product for smallish values of product.
      I was thinking the same thing. But the values of product range from about -.577 to +.555. log(1 - .577) = -.86, and log(1+.555) = 0.44. Those are terrible approximations. The values of product in this data are simply not small enough for that approximation formula to be useful.

      I continue to be amazed that in the 21st century (or even the late 20th) anybody uses these approximation formulas when programming a digital computer. Unless you are buried deep inside multiple loops that will iterate trillions of times, the time savings from using them are not noticeable compared to just directly calculating logarithms and exponentials using the well-honed library functions that all reputable software packages rely on. And the approximations are only sufficiently accurate for practical purposes over a very narrow range of argument values.
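      Just to put numbers on how bad the approximation gets at the extremes of product:
      Code:
      display log(1 - .577)   // about -.86, versus the approximation -.577
      display log(1 + .555)   // about  .44, versus the approximation  .555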



      • #4
        When Zhentao gets back to me, I'll post what he actually did. Personally, putting it very politely, I'm not a fan of researchers doing stuff like this. You already do the first part right by posting the data and code publicly for everyone to see. But then, when I work with the replication code, the results don't come back as they appear in the paper, and the discrepancy isn't an "oh, we multiplied this by 100, just to scale it" kind of thing; there's no real reason (that I can think of) for why it exists. To me, any transformation of the final variables, any slight change, any other variables needed to produce the same numbers should be present in the data so that we can work with them in the do file or R script. That way, it's super clear what you did, when, where, and how.

        I believe it was you, Clyde Schechter, who once said that there's no such thing as a small change in code, and that's pretty true!
