
  • Lagged values, filling entries and moving values

    Hi everybody,
    this is the dataset I have:


    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long permno float date byte(dunemp dinflation dfomc duninfl dunfomc dinflfomc dall) float(MEt Exret) double firstbeta int fyear float(BE month day year)
    10000 9502 0 0 0 0 0 0 0         .        .                  . . . 1  6 1986
    10000 9503 0 0 0 0 0 0 0      9430        .                  . . . 1  7 1986
    10000 9504 1 0 0 1 1 0 1      9200  -.02464                  . . . 1  8 1986
    10000 9505 0 0 0 0 0 0 0      9200  -.00025                  . . . 1  9 1986
    10000 9506 0 1 0 1 0 1 1      9200  -.00025 1.1092484682756043 . . 1 10 1986
    10000 9509 0 0 0 0 0 0 0      9660   .04975  2.298013031125736 . . 1 13 1986
    10000 9510 0 0 0 0 0 0 0     10120  .047369 2.6404900943526526 . . 1 14 1986
    10000 9511 0 0 0 0 0 0 0     10580  .045205 2.5170627711997664 . . 1 15 1986
    10000 9512 0 0 0 0 0 0 0     11040  .043228 2.5458446861374435 . . 1 16 1986
    10000 9513 0 0 0 0 0 0 0     11040  -.00025  2.515478872206193 . . 1 17 1986
    10000 9516 0 0 0 0 0 0 0     11040  -.00025 2.5479645918335634 . . 1 20 1986
    10000 9517 0 0 0 0 0 0 0     11040  -.00025 2.5900722513319483 . . 1 21 1986
    10000 9518 0 0 0 0 0 0 0     11040  -.00025  2.590091307954286 . . 1 22 1986
    10000 9519 0 0 0 0 0 0 0     13800   .24975  4.238323536310995 . . 1 23 1986
    10000 9520 0 0 0 0 0 0 0     15410  .116417   4.61989046462943 . . 1 24 1986
    10000 9523 0 0 0 0 0 0 0     16330  .059451  4.542009632838356 . . 1 27 1986
    10000 9524 0 0 0 0 0 0 0     16330  -.00025  3.577983687689484 . . 1 28 1986
    10000 9525 0 0 0 0 0 0 0     15870 -.028419  3.345643072886398 . . 1 29 1986
    10000 9526 0 0 0 0 0 0 0     16330  .028736 3.3384033787335996 . . 1 30 1986
    10000 9527 0 0 0 0 0 0 0     16100 -.014335 2.6629858217408513 . . 1 31 1986
    10000 9530 0 0 0 0 0 0 0     16100  -.00028  2.242670046026241 . . 2  3 1986
    10000 9531 0 0 0 0 0 0 0     16100  -.00028 2.2925257893783257 . . 2  4 1986
    10000 9532 0 0 0 0 0 0 0     16100  -.00028 2.2781528858670246 . . 2  5 1986
    10000 9533 0 0 0 0 0 0 0     15410 -.043137  2.134287139225645 . . 2  6 1986
    10000 9534 1 0 0 1 1 0 1     16100  .044496  2.171270512479075 . . 2  7 1986
    10000 9537 0 0 0 0 0 0 0     15870 -.014566 1.9334764408724603 . . 2 10 1986
    10000 9538 0 0 0 0 0 0 0     15870  -.00028  1.939770609657658 . . 2 11 1986
    10000 9539 0 0 0 0 0 0 0     15525 -.022019  1.915189020778588 . . 2 12 1986
    10000 9540 0 0 1 0 1 1 1     15525  -.00028 1.8056945274917433 . . 2 13 1986
    10000 9541 0 0 0 0 0 0 0     15755  .014535 1.6968127687120975 . . 2 14 1986
    10000 9545 0 0 0 0 0 0 0     14720 -.065973 1.0337260855333557 . . 2 18 1986
    10000 9546 0 0 0 0 0 0 0     14490 -.015905  1.171833119512883 . . 2 19 1986
    10000 9547 0 0 0 0 0 0 0     13570 -.063772  .8063214646307302 . . 2 20 1986
    10000 9548 0 0 0 0 0 0 0     13570  -.00028  .7088089286924604 . . 2 21 1986
    10000 9551 0 0 0 0 0 0 0     13340 -.017229  .7530157271945829 . . 2 24 1986
    10000 9552 0 0 0 0 0 0 0     13110 -.017521    .80927400318237 . . 2 25 1986
    10000 9553 0 0 0 0 0 0 0     11960 -.087999  .8339709423299075 . . 2 26 1986
    10000 9554 0 0 0 0 0 0 0     11960  -.00028  .7471801568728254 . . 2 27 1986
    10000 9555 0 0 0 0 0 0 0     11960  -.00028   .746391023098367 . . 2 28 1986
    10000 9558 0 0 0 0 0 0 0     11960   -.0003  .7570288849195854 . . 3  3 1986
    10000 9559 0 0 0 0 0 0 0     12650  .057392  .6768527315804056 . . 3  4 1986
    10000 9560 0 0 0 0 0 0 0     12765  .008791  .6723175515926605 . . 3  5 1986
    10000 9561 0 0 0 0 0 0 0     14260  .116817  .8185619060034663 . . 3  6 1986
    10000 9562 1 0 0 1 1 0 1     16330  .144861  .8133942512517864 . . 3  7 1986
    10000 9565 0 0 0 0 0 0 0     15410 -.056638  .7077004461695315 . . 3 10 1986
    10000 9566 0 0 0 0 0 0 0     15870  .029551  .7431362334910612 . . 3 11 1986
    10000 9567 0 0 0 0 0 0 0     15870   -.0003  .7255980446150595 . . 3 12 1986
    10000 9568 0 0 0 0 0 0 0     16330  .028686  .7257069173776578 . . 3 13 1986
    10000 9569 0 1 0 1 0 1 1     16330   -.0003  .6557522487622843 . . 3 14 1986
    10000 9572 0 0 0 0 0 0 0     16330   -.0003  .6833345003259159 . . 3 17 1986
    10000 9573 0 0 0 0 0 0 0     16330   -.0003  .6665680318182341 . . 3 18 1986
    10000 9574 0 0 0 0 0 0 0     16330   -.0003  .6821551042998169 . . 3 19 1986
    10000 9575 0 0 0 0 0 0 0     16330   -.0003  .6710956362213673 . . 3 20 1986
    10000 9576 0 0 0 0 0 0 0     16330   -.0003  .6925084577181038 . . 3 21 1986
    10000 9579 0 0 0 0 0 0 0     16330   -.0003  .6819270071988135 . . 3 24 1986
    10000 9580 0 0 0 0 0 0 0     16330   -.0003  .6967366137751768 . . 3 25 1986
    10000 9581 0 0 0 0 0 0 0     16330   -.0003   .644206038839203 . . 3 26 1986
    10000 9582 0 0 0 0 0 0 0     16215 -.007342  .6015833880957242 . . 3 27 1986
    10000 9586 0 0 0 0 0 0 0     16330  .006792  .6031738708775016 . . 3 31 1986
    10000 9587 0 0 0 0 0 0 0     15870 -.028409   .753663032811027 . . 4  1 1986
    10000 9588 0 0 1 0 1 1 1     15870  -.00024  .7563925380424335 . . 4  2 1986
    10000 9589 0 0 0 0 0 0 0     15870  -.00024  .7606684775922606 . . 4  3 1986
    10000 9590 1 0 0 1 1 0 1     15870  -.00024  .7491797652544406 . . 4  4 1986
    10000 9593 0 0 0 0 0 0 0     15640 -.014733  .7775320796436982 . . 4  7 1986
    10000 9594 0 0 0 0 0 0 0     15180 -.029652  .5158959110292793 . . 4  8 1986
    10000 9595 0 0 0 0 0 0 0     15525  .022487  .5183757291944111 . . 4  9 1986
    10000 9596 0 0 0 0 0 0 0     15295 -.015055  .4552078201849054 . . 4 10 1986
    10000 9597 0 1 0 1 0 1 1     15180 -.007759 .46499216533997756 . . 4 11 1986
    10000 9600 0 0 0 0 0 0 0     15065 -.007816 .44700151566497554 . . 4 14 1986
    10000 9601 0 0 0 0 0 0 0     15065  -.00024  .4474550935339181 . . 4 15 1986
    10000 9602 0 0 0 0 0 0 0     15065  -.00024  .3864226015052452 . . 4 16 1986
    10000 9603 0 0 0 0 0 0 0     14950 -.007874  .3804404167195379 . . 4 17 1986
    10000 9604 0 0 0 0 0 0 0     14950  -.00024  .3853140140302527 . . 4 18 1986
    10000 9607 0 0 0 0 0 0 0     14950  -.00024 .37111661654700334 . . 4 21 1986
    10000 9608 0 0 0 0 0 0 0     14950  -.00024 .38017850151959354 . . 4 22 1986
    10000 9609 0 0 0 0 0 0 0     14950  -.00024 .38749844929549093 . . 4 23 1986
    10000 9610 0 0 0 0 0 0 0     14950  -.00024 .38729041569503636 . . 4 24 1986
    10000 9611 0 0 0 0 0 0 0     14950  -.00024  .3899418931550269 . . 4 25 1986
    10000 9614 0 0 0 0 0 0 0     15065  .007452  .3899482330939451 . . 4 28 1986
    10000 9615 0 0 0 0 0 0 0     14950 -.007874 .41745736301004216 . . 4 29 1986
    10000 9616 0 0 0 0 0 0 0     15172 -.015625 .47779987837270305 . . 4 30 1986
    10000 9617 0 0 0 0 0 0 0     15172  -.00023 .48128432425689033 . . 5  1 1986
    10000 9618 1 0 0 1 1 0 1  15053.47 -.008043 .48651269267029884 . . 5  2 1986
    10000 9621 0 0 0 0 0 0 0  14757.16 -.019915 .42229431037894466 . . 5  5 1986
    10000 9622 0 0 0 0 0 0 0 14460.813  -.02031  .4355063788245779 . . 5  6 1986
    10000 9623 0 0 0 0 0 0 0  14223.75 -.016623   .458035118274006 . . 5  7 1986
    10000 9624 0 0 0 0 0 0 0  14223.75  -.00023 .45243000538632144 . . 5  8 1986
    10000 9625 0 0 0 0 0 0 0  14342.28  .008103 .45323803161974063 . . 5  9 1986
    10000 9628 0 0 0 0 0 0 0  16001.72  .115472  .3815510980908785 . . 5 12 1986
    10000 9629 0 0 0 0 0 0 0 15527.594  -.02986  .4187316146539895 . . 5 13 1986
    10000 9630 0 0 0 0 0 0 0 15764.656  .015037  .4230301085376582 . . 5 14 1986
    10000 9631 0 0 0 0 0 0 0  15290.53 -.030305  .4989900217899599 . . 5 15 1986
    10000 9632 0 1 0 1 0 1 1  15053.47 -.015734  .5230010360575719 . . 5 16 1986
    10000 9635 0 0 0 0 0 0 0 14697.875 -.023852  .5253704058640784 . . 5 19 1986
    10000 9636 0 0 0 0 0 0 0 14697.875  -.00023  .5073523267899698 . . 5 20 1986
    10000 9637 0 0 1 0 1 1 1 13868.156 -.056682  .5392347670915105 . . 5 21 1986
    10000 9638 0 0 0 0 0 0 0 13038.438 -.060059   .335202959975203 . . 5 22 1986
    10000 9639 0 0 0 0 0 0 0  13156.97  .008861  .3381504815493877 . . 5 23 1986
    10000 9643 0 0 0 0 0 0 0  13156.97  -.00023 .32177334053140066 . . 5 27 1986
    10000 9644 0 0 0 0 0 0 0 12682.844 -.036266 .28011880935371014 . . 5 28 1986
    end
    format %td date



    I need to do the following: on the 30th of June of every year (using the variable date; I have daily data), I need to create a new variable where (for every permno) I take the value of BE for which fyear (fiscal year) equals year-1, provided fyear is the same as year. For example, for the day 30jun1997, I need the value of BE from the observation where fyear is 1996, if year==1996, for the same permno.
    Next, I need to make the value of this new variable the same for one year for every permno; continuing the previous example, the new variable should keep the same value it has on 30jun1997 from 01jul1997 until 29jun1998.
    I come from other programming languages where I can move values from one entry to another with a different logic, but for the first step I am completely lost. As for the second step, my guess is something like

    Code:
    by permno year (date), sort: gen newvar = BE[181]

    to identify the 30th of June of every year, but I am not sure about its validity, especially because there are leap years (I think in that case

    Code:
    by permno year (date), sort: gen newvar = BE[181]
    by permno year (date), sort: replace newvar = BE[182] if _N==366

    would do the job). Correction: this does not work, since I have gaps in the dates.
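    A more robust sketch would flag 30 June from the calendar date itself rather than by observation position, so gaps in the data and leap years become irrelevant (untested, and the variable name june30 is only illustrative):

    Code:
    * flag 30 June directly from the date; gaps and leap years do not matter
    gen byte june30 = (month(date) == 6 & day(date) == 30)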

    Thank you in advance for your help
    Last edited by Matteo Bagnara; 20 Jul 2018, 09:08.

  • #2
    I'm completely confused by your explanation. Moreover, in your example data, variable BE has all missing values, as does fyear.

    I suggest posting back with a different set of example data that you can actually do this calculation for, and that you try to explain what you want more clearly, and also include in your example a variable that contains your hand-worked calculation of what it should look like. (And if variables like dunemp, dinflation, etc. are irrelevant to this calculation, you can leave those out of your -dataex-.)



    • #3
      Hi Clyde,
      here is a cleaner example of the data:
      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input long permno float(date Exret) double firstbeta int fyear float BE
      10000 9502        .                  . . .
      10000 9503        .                  . . .
      10000 9504  -.02464                  . . .
      10000 9505  -.00025                  . . .
      10000 9506  -.00025 1.1092484682756043 . .
      10000 9509   .04975  2.298013031125736 . .
      10000 9510  .047369 2.6404900943526526 . .
      10000 9511  .045205 2.5170627711997664 . .
      10000 9512  .043228 2.5458446861374435 . .
      10000 9513  -.00025  2.515478872206193 . .
      10000 9516  -.00025 2.5479645918335634 . .
      10000 9517  -.00025 2.5900722513319483 . .
      10000 9518  -.00025  2.590091307954286 . .
      10000 9519   .24975  4.238323536310995 . .
      10000 9520  .116417   4.61989046462943 . .
      10000 9523  .059451  4.542009632838356 . .
      10000 9524  -.00025  3.577983687689484 . .
      10000 9525 -.028419  3.345643072886398 . .
      10000 9526  .028736 3.3384033787335996 . .
      10000 9527 -.014335 2.6629858217408513 . .
      10000 9530  -.00028  2.242670046026241 . .
      10000 9531  -.00028 2.2925257893783257 . .
      10000 9532  -.00028 2.2781528858670246 . .
      10000 9533 -.043137  2.134287139225645 . .
      10000 9534  .044496  2.171270512479075 . .
      10000 9537 -.014566 1.9334764408724603 . .
      10000 9538  -.00028  1.939770609657658 . .
      10000 9539 -.022019  1.915189020778588 . .
      10000 9540  -.00028 1.8056945274917433 . .
      10000 9541  .014535 1.6968127687120975 . .
      10000 9545 -.065973 1.0337260855333557 . .
      10000 9546 -.015905  1.171833119512883 . .
      10000 9547 -.063772  .8063214646307302 . .
      10000 9548  -.00028  .7088089286924604 . .
      10000 9551 -.017229  .7530157271945829 . .
      10000 9552 -.017521    .80927400318237 . .
      10000 9553 -.087999  .8339709423299075 . .
      10000 9554  -.00028  .7471801568728254 . .
      10000 9555  -.00028   .746391023098367 . .
      10000 9558   -.0003  .7570288849195854 . .
      10000 9559  .057392  .6768527315804056 . .
      10000 9560  .008791  .6723175515926605 . .
      10000 9561  .116817  .8185619060034663 . .
      10000 9562  .144861  .8133942512517864 . .
      10000 9565 -.056638  .7077004461695315 . .
      10000 9566  .029551  .7431362334910612 . .
      10000 9567   -.0003  .7255980446150595 . .
      10000 9568  .028686  .7257069173776578 . .
      10000 9569   -.0003  .6557522487622843 . .
      10000 9572   -.0003  .6833345003259159 . .
      10000 9573   -.0003  .6665680318182341 . .
      10000 9574   -.0003  .6821551042998169 . .
      10000 9575   -.0003  .6710956362213673 . .
      10000 9576   -.0003  .6925084577181038 . .
      10000 9579   -.0003  .6819270071988135 . .
      10000 9580   -.0003  .6967366137751768 . .
      10000 9581   -.0003   .644206038839203 . .
      10000 9582 -.007342  .6015833880957242 . .
      10000 9586  .006792  .6031738708775016 . .
      10000 9587 -.028409   .753663032811027 . .
      10000 9588  -.00024  .7563925380424335 . .
      10000 9589  -.00024  .7606684775922606 . .
      10000 9590  -.00024  .7491797652544406 . .
      10000 9593 -.014733  .7775320796436982 . .
      10000 9594 -.029652  .5158959110292793 . .
      10000 9595  .022487  .5183757291944111 . .
      10000 9596 -.015055  .4552078201849054 . .
      10000 9597 -.007759 .46499216533997756 . .
      10000 9600 -.007816 .44700151566497554 . .
      10000 9601  -.00024  .4474550935339181 . .
      10000 9602  -.00024  .3864226015052452 . .
      10000 9603 -.007874  .3804404167195379 . .
      10000 9604  -.00024  .3853140140302527 . .
      10000 9607  -.00024 .37111661654700334 . .
      10000 9608  -.00024 .38017850151959354 . .
      10000 9609  -.00024 .38749844929549093 . .
      10000 9610  -.00024 .38729041569503636 . .
      10000 9611  -.00024  .3899418931550269 . .
      10000 9614  .007452  .3899482330939451 . .
      10000 9615 -.007874 .41745736301004216 . .
      10000 9616 -.015625 .47779987837270305 . .
      10000 9617  -.00023 .48128432425689033 . .
      10000 9618 -.008043 .48651269267029884 . .
      10000 9621 -.019915 .42229431037894466 . .
      10000 9622  -.02031  .4355063788245779 . .
      10000 9623 -.016623   .458035118274006 . .
      10000 9624  -.00023 .45243000538632144 . .
      10000 9625  .008103 .45323803161974063 . .
      10000 9628  .115472  .3815510980908785 . .
      10000 9629  -.02986  .4187316146539895 . .
      10000 9630  .015037  .4230301085376582 . .
      10000 9631 -.030305  .4989900217899599 . .
      10000 9632 -.015734  .5230010360575719 . .
      10000 9635 -.023852  .5253704058640784 . .
      10000 9636  -.00023  .5073523267899698 . .
      10000 9637 -.056682  .5392347670915105 . .
      10000 9638 -.060059   .335202959975203 . .
      10000 9639  .008861  .3381504815493877 . .
      10000 9643  -.00023 .32177334053140066 . .
      10000 9644 -.036266 .28011880935371014 . .
      end
      format %td date
      I will try to explain more thoroughly what I need to do. I have a measure (BE) of Book Equity for stocks only once per year. This measure is available for different permnos (the identifier of the dataset) on different dates, which may also change within the same permno: for example, for permno 10003 it may be available on 31oct1986 for the year 1986 but on 30mar1999 for the year 1999. So basically BE is always missing except for one day per year, and that day may change across years even for the same permno. The same applies to fyear (which denotes the fiscal year).
      Now: for every day within a certain period (within the same permno), I want exactly the value of BE that belongs to that year: for example, for 1986 I want all the daily values of BE to equal its value for that year (from before, the one of 31oct1986).
      The difficulty is the period over which I need to do this: every BE value must be the same from July of the following year to the end of June two years ahead, if fyear is equal to the year of date. For example, for the BE of 31oct1986: if fyear equals the year of date (1986), I want all the daily values of BE from 01jul1987 until 30jun1988 to equal the value of BE on 31oct1986. If fyear~=year of date, then I need BE from 01jul1987 to 30jun1988 to equal the first non-missing value of BE prior to 31oct1986 (for example, it could be the one of 31oct1985).
      I hope it is clearer now.
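      To state the rule in code: each non-missing BE should govern the window from 1 July of fyear+1 through 30 June of fyear+2. A hedged sketch of computing those windows (untested; low and high are names made up for illustration):

      Code:
      * on the one observation per year that carries BE:
      gen low  = mdy(7, 1,  fyear + 1)    // window start: 1 July of fyear+1
      gen high = mdy(6, 30, fyear + 2)    // window end: 30 June of fyear+2
      format low high %td

      Each daily observation whose date falls in [low, high] would then receive that BE, for example with -rangejoin- from SSC.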



      • #4
        I am a bit clearer on this and perhaps I understand it. Your explanation about year = fyear seems, to me, excessively complicated for what you want. It appears that if you boil it down, it doesn't really matter whether year and fyear agree. Either way, for each July 1 to following June 30 period, you want the most recent available values of fyear and BE, regardless of how far back one might have to go to find them. If this is correct, then I believe the code below works. I had to make up a data set to demonstrate and test it because your new example still contains only missing values for fyear and BE, and it also has no dates between July 1 and June 30 of any year.

        Code:
        //    CREATE A TOY DATA SET TO DEMONSTRATE THE CODE
        clear*
        set obs `=730*2'
        gen permno = 10000 if _n <= 730
        replace permno = 10001 if missing(permno)
        by permno, sort: gen date = td(31dec1985) + _n
        format date %td
        gen fyear = 1986 in 90
        replace fyear = 1987 in `=90+365'
        replace fyear = 1987 in `=90+730'
        replace fyear = 1988 in `=90+730+366'
        gen BE = 50 in 90
        replace BE = 100 in `=90+365'
        replace BE = 45 in `=90+730'
        replace BE = 75 in `=90+730+366'
        
        
        //    CODE FOR POSSIBLE SOLUTION BEGINS HERE
        
        //    VERIFY ASSUMPTIONS
        isid permno date    // PERMNO & DATE UNIQUELY IDENTIFY OBS
        gen year = year(date)
        by permno year (date), sort: egen fyear_count = count(fyear)
        by permno year: egen be_count = count(BE)
        assert fyear_count == 1 & be_count == 1
        drop fyear_count be_count
        
        //    IDENTIFY PERIODS FROM 1 JUL TO SUBSEQUENT 30 JUN
        by permno (date), sort: gen int new_period = ///
            (month(date) >= 7 & month(date[_n-1]) < 7) | (_n == 1)
        by permno (date): gen period = sum(new_period)
        
        
        capture program drop one_period
        program define one_period
            sort date
            collapse (lastnm) wanted_fyear = fyear ///
                wanted_BE = BE
            exit
        end
        
        //    SET UP INTERVALS FOR -rangerun-
        by permno period (date), sort: gen high = date[1]-1 if _n == 1
        by permno (date), sort: gen low = date[1] if !missing(high)
        replace high = -1 if missing(high)
        replace low = 1 if missing(low)
        format high low %td
        
        
        
        //    GET THE MOST RECENT FYEAR AND BE FOR THE
        //    FIRST DAY OF EACH PERIOD
        rangerun one_period, by(permno) interval(date low high)
        
        //    AND SPREAD THE RESULTS TO THE FULL PERIOD
        by permno period (date), sort: replace wanted_fyear = wanted_fyear[1]
        by permno period (date): replace wanted_BE = wanted_BE[1]

        To run this code you will need the -rangerun- command, by Robert Picard, available from SSC. And to use -rangerun- you will also need the -rangestat- command, by Robert Picard, Nick Cox, and Roberto Ferrer, also available from SSC.

        I have made the code a bit more complicated than it needs to be. The high and low variables were designed so that -rangerun- would find the appropriate values of fyear and BE only for the observations corresponding to July 1. This was done because the results could then simply be copied into the rest of the July 1 to June 30 period without recalculating them. In a large data set this saves a noticeable amount of time compared to doing the calculation for every date.

        Note: This code does not require that there be an actual observation for July 1 in each year. If there is, that's fine. But if July 1 is missing (perhaps it falls on a weekend), the code will simply start the next period on the first date after July 1 that is present in the data for that permno and year.

        I hope this is what you want.



        • #5
          Hi Clyde,

          first let me thank you a lot for your solution, which is really much appreciated.
          Unfortunately the equality between fyear and year matters a lot in this case. Year represents the calendar year of the observation, whereas fyear represents the fiscal year of the accounting variable. These two values are typically not the same for firms that report the BE variable early in the year (say, in the months from January up to May). In that case, from July 1st of every year until the next 30th of June, I want to put in a variable the value of BE of the previous year, if year==fyear. If not, I have to take the value of BE of two years (of fyear) before, as explained above. But you are absolutely right that I can boil it down to: on every 30th of June of year t, I take the BE value of year t-1 in any case. The rule is in fact: "on every 30th of June (date), take the BE value whose fiscal year (fyear) ends in the calendar year (the year contained in date) minus one". So even though fyear<year, I will still take that value.
          Now the problem arises because sometimes two different dates in the same year contain a value of BE, and here is where the rule comes into play. Say I have a non-missing value of BE for 31mar1982 with fyear=1981 and one for 31dec1982 with fyear=1982. From July 1st 1983 until June 30th 1984 the right value is the BE of 31 December (because I take the value of BE whose fiscal year ends in the calendar year minus one, i.e. fyear=1982). For the period July 1982 to June 1983 the right value will instead be the BE of the hypothetical March (but it could be another month) 1981. So if your code does this, it is correct.

          However, I was thinking of structuring the solution in a different way and seeing whether the results are the same. Basically I have on average only one value of BE per year (and sometimes two if the company's closing date shifts into the following year). So I thought I could: collapse the dataset to the observations with non-missing values of BE. Then, according to the rule given above, I would create a new date variable that assigns every BE to the 1st of July of the right year. I would then drop the initial date variable and keep the new one, and merge this collapsed dataset (of course keeping only permno and date, the unique identifiers, plus BE) back into the initial one. Finally, I would use your portion of code to generate a period that runs from the 1st of July to the 30th of June and simply spread the value of BE on the 1st of July over the whole new period. Is it the same process? Of course this solution is not as elegant as yours, but I think it might work.
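          The collapse-and-merge alternative might be sketched as follows (untested; wanted_BE and the tempfile name are made up for illustration):

          Code:
          * 1) keep the yearly BE observations only
          preserve
          keep if !missing(BE)
          keep permno fyear BE
          * 2) reassign each BE to the 1 July from which it applies
          gen date = mdy(7, 1, fyear + 1)
          format date %td
          rename BE wanted_BE
          keep permno date wanted_BE
          tempfile yearly
          save `yearly'
          restore
          * 3) merge back and carry the value forward within permno
          merge 1:1 permno date using `yearly', nogenerate
          by permno (date), sort: replace wanted_BE = wanted_BE[_n-1] if missing(wanted_BE)

          One caveat: if two fiscal-year ends ever map to the same 1 July for a permno, the 1:1 merge will fail on duplicates, so the rule for choosing between them must be applied before saving the tempfile.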



          • #6
            I believe my code handles the case you mention correctly. Try running it with a suitable data set and verify that for yourself, but I'm pretty sure it does.

            Your alternative approach sounds like it would produce equivalent results, though you would have to actually code it and test it to be sure: the devil is always in the details.



            • #7
              You're absolutely right about that; details make a huge difference. At the moment I cannot access the database and thus cannot test the code, but I tried to come up with an alternative solution because my dataset is very big and contains a lot of observations (we are talking about around 4 GB in size, with a total of 72 million observations). I believe your solution of introducing the high and low variables to save time is actually very helpful in this case, but I cannot say how long it will take. So I will try both solutions, compare the results and the time required, and write back about it so it can be useful for other users.

