  • Indicator variable for whether an observation is in the top tercile of the distribution of a certain variable for that year

    Hello!

    I am trying to replicate the measure used in van Binsbergen, Graham, Yang 2010. Specifically, it is this conglomerate indicator UNFC which is equal to 1 if any of the 4 variables are in the top tercile of their distribution for that year.

    [Attached image: Screenshot.jpg]

    In my data, these 4 variables are named scaleddltis scaleddltr scaledsstk scaledprstkc.

    Would the correct code be something like:

    Code:
    gen ltdii = 0
    gen ltdri = 0
    gen eii = 0
    gen eri = 0
    
    forvalues i = 1995/2022 {
    summarize scaleddltis if fyear == `i', detail
    replace ltdii = 1 if scaleddltis > `r(p66)' 
    sum scaleddltis if fyear == `i', detail
    replace ltdri= 1 if scaleddltis > `r(p66)' 
    sum scaleddltis if fyear == `i' , detail
    replace eii= 1 if scaleddltis > `r(p66)' 
    sum scaleddltis if fyear == `i' , detail
    replace eri= 1 if scaleddltis > `r(p66)' 
    }
    
    gen unfc = 0
    replace unfc = 1 if ( ltdii == 1 | ltdri == 1 | eii == 1 | eri== 1 )
    I'm currently getting the error "1995 invalid name"

    Dataex:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input long gvkey double(fyear zscore scaleddltis scaleddltr scaledsstk scaledprstkc)
    1004 1991  2.1450351712781806  .01808884139825981                    0                      0                      0
    1004 1992  2.0320177132200103                   0 .0025420449170483945                      0   .0029442191875067975
    1004 1993  2.0943815758597406  .14562194818034183   .07722832472045811                      0   .0001807471429627743
    1004 1994  2.2111736579821235 .014812296169299804 .0025980183226140137                      0   .0004238241871914105
    1004 1995    2.38412683911695                   0 .0038326593301300565                      0     .00364478387277074
    1004 1996    2.35436625728874                   0 .0033664804520310795                      0    .018453976969071317
    1004 1997   2.297589772115504  .11206343091936313   .01920375237922596   .0031005468443155385                      0
    1004 1998  2.4312467142837484 .003061624704164734                    0                      0    .011271193138858774
    1004 1999  2.5454136178505204                   0 .0006660886558496071                      0    .014491556913422236
    1004 2000   2.364065460907824                   0 .0006464254964250915                      0   .0002847511059409067
    1004 2003  1.6965654483625923  .13064121254665964   .13779799918732458                      0                      0
    1004 2004  1.9066510522649982                   0    .0338436074282524                      0                      0
    1004 2005  1.8679551582059604  .21987626838561652  .027827322016306354                      0                      0
    1004 2006  2.0191681036461033 .031011862254410674  .020881286529991758                      0                      0
    1004 2007  2.1228811829575407   .2621528184310526   .09422619945243356    .005778202809392365    .008923478386299411
    1004 2008  2.2331145812991693  .05530282450202274   .08242230233258199   .0004397911909604188                      0
    1004 2009  1.8353407832692226  .04613030313369549  .018570450617091262    .001667500295823409                      0
    1004 2010   1.987343570888998                   0   .04218935912519437   .0013463980354980075   .0016914916438047704
    1004 2011  1.7180983060620234  .24080383770404531    .0530090795062824   .0017708236119988706    .002147644546338703
    1004 2012   1.858467874023117  .08279996884753649   .12492866586842276   .0005009899105186476    .006649502448702049
    1004 2013  1.8657103887247102                   0   .04160232111937854   .0023866348448687348  .00046796761664092844
    1004 2015  2.3599542334096113                   0    .0466006600660066                      0     .01240924092409241
    1004 2016   2.420098397712918                   0 .0069343318771236395                      0    .013729977116704806
    1004 2017   2.437213878139962 .016488265407885117                    0     .00771225317465594    .008709527292068347
    1004 2018  2.6337727392565253                   0  .016396668197022363    .005574867186987604   .0067554272971732145
    1004 2019   2.107888407888408                   0                    0   .0007250197732665437   .0027023464276298443
    1004 2020   2.336331752938884                   0                    0  .00030066150182537555                      0
    1004 2021   2.589649914225808                   0  .016829821461581445    .011447004071035152    .028890057893564906
    1009 1992   1.774940442157423  .07171180291909221   .10717399251947468                      0                      0
    1009 1993  1.6920808787911934   1.406756241661902   1.0208452449018486                      0                      0
    1009 1994  1.5149065674601057   1.317171117396128    .9865774958201166                      0                      0
    1013 1992  3.0657213347621295                   0   .12385048286799719    .008140179391428536                      0
    1013 1993   3.200715219207724                   0   .05534095912145604     .02390742725180884                      0
    1013 1994  3.3914937672550813                   0 .0010712219786184094    .014861419583366066                      0
    1013 1995  2.8169931606783094                   0 .0011951572229326767       .569704557134491                      0
    1013 1996   2.902296670634069                   0  .008146961401337255    .019458211262005413                      0
    1013 1997   3.032799318169439                   0  .006866857882447822    .012401709234941759                      0
    1013 1998  2.7476002758754317                   0 .0002819600065363456    .021244191250054736                      0
    1013 1999   2.603528130154993                   0   .13139912977755427     .03475815151158669                      0
    1013 2000   3.178456113839567                   0  .019431651110384335      .1657370365476473                      0
    1013 2004   .3418808206708213                   0   .00825044336494718   .0028529570514303337                      0
    1013 2005   .8123061889250816                   0                    0    .009523142637070233                      0
    1013 2006   .9265917835422615                   0                    0    .006254071661237785                      0
    1013 2007   .9748526745240255                   0                    0   .0029787762194365144                      0
    1013 2010   .7705120379789758                   0 .0005209883894016076                      0                      0
    1017 1991   2.910232967267299 .010988824923806298   .07301896376566204 .000050795800880460546                      0
    1017 1992  1.9008896952357695                   0  .006468351281231119  .00023989764367203325    .007925507339090877
    1017 1993  2.1921057935431856                   0    .0608656211293896     .00263419569806186    .001953840517767737
    1017 1994  2.4209055591709556                   0  .050771373080728495   .0032247425702664075   .0021070761112536186
    1021 1993  1.6067546754675464                   0  .041030236907730666                      0                      0
    1021 1995  2.8421289228159456  .01658374792703151    .3044776119402985                      0                      0
    1021 1996   2.557503208608945   .4735368956743003   .04681933842239187     .21713316369804922                      0
    1021 1997  2.5536849288360304                   0    .2303781222233192   .0005923585743903643                      0
    1021 1998   1.457574620867049 .060489374147007224   .04357574575940729    .012331838565022422                      0
    1034 1991   1.312599002313073  .06431452533524101  .013854092147521271    .010408073182240216    .011443096546370568
    1034 1992  1.4489597120444202  .07803137976200429   .04019452108240671    .003505838419307193      .0018142295794666
    1034 1993  1.3071327640228543  .03497218366314793    .0176851642616565    .003510669206185234    .002926902755808073
    1034 1994  1.1464456592573582  .38852132078770896    .2379927316033478   .0019328831149191166  .00005671050703919169
    1034 1995  1.3825792742571903 .015194540770329452  .022151952160832527     .00236866007786358   .0005064846923443151
    1034 1996  1.0879719338057765  .03813953781426567   .02699365049861937    .002701412767995111    .000445772485914062
    1034 1997   1.334662096077333  .04484135329397936   .07296134540362272      .0943451900614111 .000021193106697510787
    1034 1998  1.0996073430912625   .3007979539965752   .29080691159201477      .3169611911386275                      0
    1034 1999  1.0858141150391374   .5357912988373219    .4062189197039175     .08691591047114429                      0
    1034 2000  1.0550092366348223  .11031953017670085   .20729039720202092     .42199719719443646                      0
    1034 2001   .5108266164799449   .6110876874881631    .2223461362923682   .0034431690816456423                      0
    1034 2002  .43509023372127237 .012970667880609606   .04910736700462928   .0028117060696031143                      0
    1034 2003   .8002416639047117  .10341090954685483   .14129331227328376    .003941793459426607                      0
    1034 2004 -.09958190316402182 .013557048823922365   .06622853188211919                      0                      0
    1034 2005   .5536791995481042  .00979917578332024   .15561905579382007    .005668111557697663                      0
    1034 2006  1.1550824544696674                   0   .25816951391014936    .011533938694688807     .18935580821038533
    1034 2007   .8373982370270888  .31574599429057665                    0    .006336014770733327                      0
    1038 1992  1.1991569144244083   .5339013341311467    .5011530345592654   .0022372312343956428    .013238054641394338
    1038 1993   1.585173237896887  .08019203318880948   .18823743257186545     .25907640162308676                      0
    1038 1994  1.5572022813193043                   0  .004233196881558263   .0004767832491481738                      0
    1038 1995  1.6415318393738438  .22981725697782646    .4282127494953596   .0016814962635544302                      0
    1038 1996  1.2003629842400514   .4114897260982339   .02654832477691961  .00028958048062086055                      0
    1038 2003  1.1040559987361718  .19709083205849975    .1975540625515352   .0026104479593054356                      0
    1043 1998   .9024759525844125  .17300668151447662     .111358574610245                      0                      0
    1045 1994   .8530432105101097 .022042843837317604  .028407326917106488                      0                      0
    1045 1995   .9246267130292494 .009442676793595402    .0718977727599302                      0                      0
    1045 1996  1.2298726642923354                   0    .1089179791368378                      0                      0
    1045 1997  1.3505331102079845                   0   .03161438259257452    .009757525491535347    .036102844318680785
    1045 1998  1.4429448953055644 .024671288548888358  .026153478364809944    .005307195792493426     .04752569925890509
    1045 1999  1.1210388118486911  .09012240505761557   .01255436488364794   .0011209254360399945    .039053042191633414
    1045 2000   1.148441612940144  .03429884302945762   .03142693033560351   .0027488307212603594                      0
    1045 2006   .5813244124206554                   0   .04631293439566028     .02135955246651975                      0
    1045 2007   .7716355745336181                   0   .07963630125235889     .02014067593069137                      0
    1045 2015   .8991428276360632  .11443649905188366   .04918781841858765                      0     .08786639555870325
    1045 2016   .8832722237391271   .1590622740886089   .07904575028400289                      0     .09294640090880925
    1045 2017   .8345474355981011 .059640363537075324  .045481140539064636                      0     .03149744509888053
    1045 2018   .5317629580719709  .04580122966767842  .057222351934002644                      0    .016285314032220406
    1045 2019   .5899324943745312  .06536810828656323   .06916474083856058                      0    .018108286563222185
    1045 2022  .36742073057667357  .01865940307727987   .06549118834981672                      0   .0003665551586743474
    1050 1994    1.23934448900567 .014912389710451099   .06362619609792469                      0                      0
    1050 1996  1.1618084990761872                   0  .023134759976865243   .0031231925968768074                      0
    1050 1997   .7940333786978011  .10868383871318334  .016954678839256603                      0                      0
    1050 1998  1.7253505654281098 .016474464579901153   .03302055726667144                      0                      0
    1050 1999    .749769937148787  1.7994166479694862    .7587075928917609                      0                      0
    1050 2001  1.7601904582311898                   0   .15657649921282382     .07830613997423787                      0
    1050 2005   1.760717948717949                   0   .04498054832991873                      0                      0
    end

  • #2
    I cannot reproduce the specific error message you are getting: there is nothing wrong with 1995 in the code you show. However, your code still will not run, because `r(p66)' does not exist following -sum, detail-. Only the specific percentiles that are shown in the output of that command are saved in r(), and the 66th is not among them. You will need, instead, to use the -centile- command.
    Code:
    gen ltdii = 0
    gen ltdri = 0
    gen eii = 0
    gen eri = 0
    
    forvalues i = 1995/2022 {
        centile scaleddltis if fyear == `i', centile(66)
        replace ltdii = 1 if scaleddltis > `r(c_1)'  & fyear == `i'
        centile scaleddltis if fyear == `i', centile(66)
        replace ltdri= 1 if scaleddltis > `r(c_1)' & fyear == `i'
        centile scaleddltis if fyear == `i' , centile(66)
        replace eii= 1 if scaleddltis > `r(c_1)' & fyear == `i'
        centile scaleddltis if fyear == `i' , centile(66)
        replace eri= 1 if scaleddltis > `r(c_1)' & fyear == `i'
    }
    
    egen unfc = rowmax(ltdii ltdri eii eri)
    Note also that all of the -replace- commands need to be conditioned on fyear == `i' to get correct results. Your original code would have set UNFC = 1 for all observations if the gvkey met the stated criteria in any year.

    The final line in my code replaces your calculation of unfc. The results will be the same as with your original code, but I think this one-liner is more straightforward.
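
    As #4 and #6 note, the loop above uses scaleddltis for all four indicators. A minimal, untested sketch of the same -centile- approach with each indicator tied to its own input might look like the following. It assumes the variable names given in #1, that none of the new variables exists yet, and that each input has nonmissing values in every year processed. The macro names are only illustrative.

    ```stata
    * Sketch only: variable names are those given in #1; macro names are illustrative.
    local inputs  scaleddltis scaleddltr scaledsstk scaledprstkc
    local outputs ltdii ltdri eii eri

    * Years actually present in the data, within the range used above
    quietly levelsof fyear if inrange(fyear, 1995, 2022), local(years)

    forvalues k = 1/4 {
        local x : word `k' of `inputs'
        local y : word `k' of `outputs'
        gen byte `y' = 0
        foreach i of local years {
            quietly centile `x' if fyear == `i', centile(66)
            * r(c_1) holds the requested centile; missing inputs are not counted as high
            quietly replace `y' = 1 if `x' > r(c_1) & `x' < . & fyear == `i'
        }
    }

    egen byte unfc = rowmax(ltdii ltdri eii eri)
    ```

    Looping over the years found by -levelsof- rather than over every integer from 1995 to 2022 avoids calling -centile- for a year with no observations.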



    • #3
      Your code won't work for various reasons.

      summarize does not calculate the 66th percentile and so that result is not saved in r(p66). This can be ascertained by consulting the help for summarize and/or by issuing return list after summarize to see what results are available.

      But it is not itself an error to refer to r(p66), which is evaluated as missing if not otherwise defined. That doesn't help here, as no value is greater than system missing (.) unless you have extended missing values .a through .z, and even then that has nothing to do with the top tercile (tertile, third).
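
      A short sketch of that check, assuming the example data from #1 are in memory (the choice of 1995 is arbitrary):

      ```stata
      * Assumes the example data from #1 are in memory.
      quietly summarize scaleddltis if fyear == 1995, detail
      return list            // lists the stored results, e.g. r(p25), r(p50), r(p75)
      display r(p50)         // the median is stored
      display r(p66)         // not stored, so this displays . (missing)
      ```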

      That is not what is biting first, which is incorrect punctuation around local macro references.
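
      The thread does not show which characters were mistyped. For reference only, a minimal sketch of a correctly punctuated local macro reference, with a left single quote (backtick) before the name and an apostrophe after it:

      ```stata
      local i = 1995
      display `i'                  // the macro expands to 1995
      count if fyear == `i'        // observations for that fiscal year
      ```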

      That said, your code repeats the same calculation with the effect of creating five variables that are identical. As each of

      Code:
      ltdii ltdri eii eri
      is 0, 1 and produced by the same calculation, so also is unfc. Your code appears equivalent to

      Code:
      egen p66 = pctile(scaleddltis), p(66) by(fyear)
      
      foreach v in ltdii ltdri eii eri unfc {
          gen `v' = scaleddltis > p66
      }
      although I didn't test anything. From your text I think you want something more like

      Code:
      local j = 1
      foreach v in scaleddltis scaleddltr scaledsstk scaledprstkc {
          local V : word `j' of ltdii ltdri eii eri  
          egen p66 = pctile(`v'), p(66) by(fyear)
          gen `V' = `v' > p66 if `v' < .
          drop p66
          local ++j
      }
      
      gen unfc = inlist(`v', ltdii, ltdri, eii, eri)
      as you

      1. can work on different years in the same command by using the by() option allowed here with egen (undocumented, but here equivalent to the by: prefix)

      2. can loop over the variables to get the 66th percentiles and at the same time calculate new indicator variables (noting that the code would be simpler with a simpler correspondence of variable names)

      3. should not include missing values as above the 66th percentiles (presumably)

      4. can avoid the common but unnecessarily awkward two-step creation of indicators by initializing them as 0 and then replacing some values by 1.
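
      Points 3 and 4 can be sketched for a single input as follows. This is only an illustration: p66_dltis, high_a and high_b are made-up names, and the data from #1 are assumed to be in memory.

      ```stata
      * Sketch only: p66_dltis, high_a and high_b are illustrative names.
      egen p66_dltis = pctile(scaleddltis), p(66) by(fyear)

      * Two steps: start at 0, then switch qualifying observations to 1
      gen byte high_a = 0 if scaleddltis < .
      replace high_a = 1 if scaleddltis > p66_dltis & scaleddltis < .

      * One step: a true-or-false expression evaluates directly to 1 or 0
      gen byte high_b = scaleddltis > p66_dltis if scaleddltis < .

      * Both routes give the same indicator, missing where the input is missing
      assert high_a == high_b
      ```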

      For more explanations see

      http://www.stata-journal.com/article...article=dm0026 (inlist)

      https://journals.sagepub.com/doi/pdf...6867X211063415 (loops in parallel)

      https://www.stata.com/support/faqs/d...rue-and-false/ or
      https://journals.sagepub.com/doi/pdf...867X1601600117 or
      https://journals.sagepub.com/doi/pdf...36867X19830921 (true or false expressions can be evaluated as (0, 1) indicator variables)
      Last edited by Nick Cox; 14 Jun 2023, 16:38.



      • #4
        Any ultimate Statalist groupies can examine #2 and #3 for similarities and differences.

        Clyde missed what I think is a very minor but fatal punctuation error and the fact that you don't want what you coded, the same calculation using just one of your input variables repeatedly. centile is fine for individual percentiles, but egen, pctile() will do the looping over years for you.

        Clyde's rowmax() calculation and my inlist() calculation should be different routes to the same result.
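
        A small check of that equivalence on toy data (invented for illustration, not taken from the thread), using the corrected inlist(1, ...) form from #7. Note that -clear- drops whatever is in memory, so this is best run in a fresh session.

        ```stata
        * Toy data only (not from the thread). -clear- drops the data in memory.
        clear
        input byte(ltdii ltdri eii eri)
        0 0 0 0
        1 0 0 0
        0 . 1 0
        . . . .
        end

        egen byte unfc_rowmax = rowmax(ltdii ltdri eii eri)
        gen  byte unfc_inlist = inlist(1, ltdii, ltdri, eii, eri)

        list
        * The two agree wherever at least one indicator is nonmissing
        assert unfc_rowmax == unfc_inlist if !missing(unfc_rowmax)
        ```

        In the last row, where all four indicators are missing, rowmax() returns missing and inlist() returns 0. That difference only matters if the indicators can be missing, as with the "if `v' < ." version in #3.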



        • #5
          Clyde and Nick -

          Thank you very much for your two excellent solutions! I am so glad that I asked because I would have been very confused when my measure ended up looking nothing like the original paper. I appreciate the help.



          • #6
            "Clyde missed what I think is a very minor but fatal punctuation error and the fact that you don't want what you coded, the same calculation using just one of your input variables repeatedly."
            Ouch, I did indeed miss that! Thanks for pointing that out, Nick.



            • #7
              Often I miss something that Clyde spotted easily. We're not keeping scores.

              Meanwhile here is my own silly embarrassing error corrected.

              Code:
               
               gen unfc = inlist(1, ltdii, ltdri, eii, eri)
              Last edited by Nick Cox; 14 Jun 2023, 17:15.

