Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Creating peer group based on a score with an if conditions, and calculate the mean.

    Hi,

    I am currently working on a panel dataset (15 EU countries for 15 years) that comprises of approximately 7000 data. Based on the dataex provided below, I would like to form a peer group for each ISINyear (company id) variable based on the variable WM which contains a score that varies for each company identifier for every year. I want to form a peer group in a size of 4 for each ISINyear (company) that should satisfy these conditions:
    1. peer group in size of four, is formed based on four other company closest/nearest WM score with the target company.
    2. peer could not belong to the same country as the target firm, yet still within the same year of observation.
    Each variable that has been selected should be able to get picked again for another company. Should there are five companies with the same score, the peer could be chosen randomly. Lastly, after four companies are defined as the peers for the target firm, I would like to obtain the mean of the defined peer group PB variable, presumably in a newly generated variable.

    ----------------------- copy starting from the next line -----------------------
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str16 ISINyear str3 CurrentISOCountryCodeIncor float(WM PB) int year
    "FR00000525162006" "FRA"   90.26172 3.0738685 2006
    "FR00000525162007" "FRA"   90.26172  3.502112 2007
    "DE00070740072008" "DEU"   88.95918  8.301935 2008
    "GRS2841830012008" "GRC"   88.95918  .4692065 2008
    "FR00000525162008" "FRA"   88.95918  2.175335 2008
    "FR00000525162009" "FRA"   95.29973  2.449753 2009
    "DE00070740072009" "DEU"   95.29973 10.180838 2009
    "GRS2841830012009" "GRC"   95.29973  .4450149 2009
    "GRS2841830012010" "GRC"  104.36723 .55562204 2010
    "FR00000525162010" "FRA"  104.36723 2.1385944 2010
    "DE000A0M6M792010" "DEU"  104.36723 1.9217272 2010
    "DE00070740072010" "DEU"  104.36723  10.33614 2010
    "DE00070740072011" "DEU"  100.76558   11.0161 2011
    "FR00000525162011" "FRA"  100.76558 1.9883522 2011
    "GRS2841830012011" "GRC"  100.76558 .29267552 2011
    "DE00070740072012" "DEU"   52.92004 15.959933 2012
    "FR00000525162012" "FRA"   52.92004 2.3490067 2012
    "GRS2841830012012" "GRC"   52.92004  .4214217 2012
    "FR00000525162013" "FRA"   61.94163 2.2705362 2013
    "DE00070740072013" "DEU"   61.94163  13.59248 2013
    "GRS2841830012013" "GRC"   61.94163  .4789082 2013
    "FR00000525162014" "FRA"   56.32935  1.820148 2014
    "DE00070740072014" "DEU"   56.32935  14.16732 2014
    "GRS2841830012014" "GRC"   56.32935  .6914167 2014
    "DE00070740072015" "DEU"   47.15211 12.891316 2015
    "GRS2841830012015" "GRC"   47.15211 2.0456204 2015
    "FR00000525162015" "FRA"   47.15211    1.2696 2015
    "GRS2841830012016" "GRC"          .  3.483192 2016
    "FR00000525162016" "FRA"          . 1.2957324 2016
    "DE00070740072016" "DEU"          . 12.624042 2016
    "GRS2010030192002" "GRC"          .  .1711396 2002
    "GRS2010030192003" "GRC"          . .21721806 2003
    "GRS2010030192004" "GRC"          .  .3559328 2004
    "FR00001317572003" "FRA"  30.296736  1.428482 2003
    "FR00001317572005" "FRA"  33.633835  1.895018 2005
    "FR00001317572008" "FRA"  36.465347 1.5608405 2008
    "AT00000ATEC92008" "AUT"  36.465347  .5215496 2008
    "AT00000ATEC92009" "AUT"  36.465347  .6974818 2009
    "GRS3935030082009" "GRC"  36.465347  .8028448 2009
    "FR00001317572009" "FRA"  36.465347 2.7713785 2009
    "FR00001317572010" "FRA"   31.74601 2.3749483 2010
    "GRS3935030082010" "GRC"   31.74601  .8398629 2010
    "FR00001317572011" "FRA"  20.858995  .9285181 2011
    "GRS3935030082011" "GRC"  20.858995   .399087 2011
    "FR00001317572012" "FRA"  11.381454  .7312967 2012
    "GRS3935030082012" "GRC"  11.381454  .5881699 2012
    "FR00001317572013" "FRA" -2.7084286  .9373043 2013
    "GRS3935030082013" "GRC" -2.7084286  .9195927 2013
    "GRS3935030082014" "GRC"  -6.021147  .6953707 2014
    "FR00001317572014" "FRA"  -6.021147  .9602507 2014
    "FR00001317572015" "FRA"          . .43921465 2015
    "GRS3935030082015" "GRC"          .  .4231185 2015
    "FR00001317572016" "FRA"          .  .7215712 2016
    "FR00103977602016" "FRA"          . 1.5449473 2016
    "GRS3935030082016" "GRC"          .  .8319205 2016
    "FR00131818642002" "FRA"   .6516934 .31987685 2002
    "FR00131818642003" "FRA"   .6516934  .9698271 2003
    "FR00131818642005" "FRA" -1.4012986  2.428733 2005
    "IT00052521402005" "ITA" -1.4012986 4.1117234 2005
    "AT00007430592006" "AUT" -1.4012986 2.0502448 2006
    "IT00052521402006" "ITA" -1.4012986 4.7146993 2006
    "FR00131818642006" "FRA" -1.4012986  2.605061 2006
    "AT00007430592007" "AUT"   20.08488  1.558389 2007
    "IT00052521402007" "ITA"   20.08488   3.83814 2007
    "FR00131818642007" "FRA"   20.08488 1.5033473 2007
    "AT00007430592013" "AUT"   3.691945  .9262624 2013
    "IT00052521402013" "ITA"   3.691945 1.3275653 2013
    "AT00007430592014" "AUT"   3.691945  .7292546 2014
    "IT00052521402014" "ITA"   3.691945   .797046 2014
    "FR00103771272014" "FRA"   3.691945  .9565661 2014
    "IT00052521402015" "ITA"   70.40891 .27470592 2015
    "FR00103771272015" "FRA"   70.40891  .9916122 2015
    "AT00007430592015" "AUT"   70.40891    .69217 2015
    "FR00103771272016" "FRA"          . 1.5045315 2016
    "AT00007430592016" "AUT"          . 1.1086997 2016
    "IT00052521402016" "ITA"          . .09065073 2016
    "GRS3720030042013" "GRC"          .   1.63674 2013
    "GRS1450030002004" "GRC" -158.87476  .8783258 2004
    "DE00060700062006" "DEU"  -167.8827  2.807499 2006
    "GRS1450030002006" "GRC"  -167.8827  2.483208 2006
    "DE00060700062007" "DEU"  -167.8827 1.6575663 2007
    "FI00098006432007" "FIN"  -167.8827 3.0247865 2007
    "GRS1450030002007" "GRC"  -167.8827 1.3069142 2007
    "FI00098006432008" "FIN"  -172.6998  .8926127 2008
    "GRS1450030002008" "GRC"  -172.6998  .6546887 2008
    "DE00060700062008" "DEU"  -172.6998 1.0139452 2008
    "NL00092691092008" "NLD"  -172.6998 .11335886 2008
    "GRS1450030002009" "GRC" -177.02313  .8130502 2009
    "NL00092691092009" "NLD" -177.02313   .631114 2009
    "FI00098006432009" "FIN" -177.02313  2.818069 2009
    "DE00060700062009" "DEU" -177.02313 1.9107054 2009
    "FI00098006432010" "FIN"  -177.3542  3.105407 2010
    "GRS1450030002010" "GRC"  -177.3542  .4651518 2010
    "NL00092691092010" "NLD"  -177.3542  .8515041 2010
    "DE00060700062010" "DEU"  -177.3542  1.895007 2010
    "DE00060700062011" "DEU"  -182.0614 1.3275746 2011
    "NL00092691092011" "NLD"  -182.0614  .3881037 2011
    "GRS1450030002011" "GRC"  -182.0614   .179876 2011
    "FI00098006432011" "FIN"  -182.0614 2.2997575 2011
    "FI00098006432012" "FIN" -196.83437 2.0626721 2012
    end
    ------------------ copy up to and including the previous line ------------------

    Listed 100 out of 7318 observations
    Use the count() option to list more

    To be honest, I am entirely new to Stata, and I have no clue on how to solve this type of situation. Therefore, your suggestions and inputs are highly appreciated
    Thank you in advance.

    Best,

    Kevin

  • #2
    There's one thing I don't understand. The same ISINyear values can appear in different observations with different years. Do you want a separate peer group for each year that ISINyear appears? Or do you want a single peer group that is the same for all the years for which that ISINyear appears--and if so, which year's values of WM should it be based on? (Or does it not matter which year they are based on: within your example data, WM is the same for all years for any given ISINyear. Is that true in the whole data set?)

    Comment


    • #3
      No, ISINyear is already a unique identifier that would never appear twice in the same year, it is a variable that combines company id and the year variable. I want to create a separate peer group for each ISINyear that appears. For example, The peer group of ISINyear "FR00001317572008" should be selected based on the closest value of WM of that particular ISINyear (company) and year. Therefore, ISINyear "AT00000ATEC92008" (dataex 1 row below "FR00001317572008") would be classified as a peer because it has the same WM value and it also belongs to the same year that is 2008. In this case, I would need four peers (excluding the target company) for each ISINyear or which we can also call it a unique identifier.
      To put another example, ISINyear "AT00000ATEC92009" also has a similar WM value with the ISINyear mentioned above, however since this company does not belong to the same year, it can't be classified as a peer. I believe that the value of WM varies within the whole data set. Out of this defined peers, the mean of the selected peers PB variable is calculated.
      I hope that this has answered your question. I can't thank you more for your help.


      Best,

      Kevin.

      Comment


      • #4
        Thanks for the clarification. I think this does what you want:

        Code:
        //    VERIFY THAT ISINyear UNIQUELY IDENTIFIES OBSERVATIONS
        isid ISINyear
        
        //    VERIFY THAT year EQUALS THE FINAL FOUR DIGITS OF ISINyear
        assert year == real(substr(ISINyear, -4, 4))
        
        //    CREATE A TEMPORARY FILE OF POTENTIAL PEERS
        preserve
        ds year, not
        rename (`r(varlist)') peer_=
        tempfile peers
        save `peers'
        
        //    MATCH EACH OBSERVATION TO ITS POTENTIAL PEERS FROM SAME YEAR
        restore
        joinby year using `peers'
        
        //    EXCLUDE PEERS FROM SAME COUNTRY
        drop if CurrentISOCountryCodeIncor == peer_CurrentISOCountryCodeIncor
        
        //    FIND THOSE WITH CLOSEST VALUES OF WM
        //    BREAKING TIES RANDOMLY
        gen delta = abs(WM-peer_WM)
        set seed 1234
        gen double shuffle = runiform()
        by ISINyear (delta shuffle), sort: keep if _n <= 4

        Comment


        • #5
          Thank you for your help. I tried implementing the command, and I believe that the output is almost according to how I want it to be.
          I forgot to mention that the peer can't be the company itself. For example, "FR00001317572008" as the target company should find four other peers that does not include the target company itself.
          Also, after deriving sets of peers, how can I calculate the mean of variable PB of the peers for each target firm? Using the same example as above using "FR00001317572008" as the target firm, "AT00000ATEC92008", "xxxxx2008", "xxxx2008", "xxx2008" are the peers of ISINyear "FR00001317572008". How can I calculate the PB mean of these peers so that the value of the mean can be assigned to the target company "FR00001317572008"?.

          I do have some more questions regarding some of the command. Is it possible for me to add more condition for the peer selection just like the condition that I put on the country mentioned earlier?. For example, each observation is entitled to a variable called ACCSTD (Accounting Standard) which shows the current accounting standard that a company is using at a particular year. The option is only 2, either local accounting standard or IFRS. Holding everything the same, Is it possible for me to form a peer group that does not only based on a country but also based on this additional condition? Thank you.

          Comment


          • #6
            I forgot to mention that the peer can't be the company itself.
            Well, since the peer can't come from the same country, doesn't that automatically exclude the company itself? Or can a company be in more than one country? If so, you can just add -drop if ISINyear == peer_ISINyear- to the code, right after or right before -drop if CurrentISOCountryCodeIncor == peer_CurrentISOCountryCodeIncor-.

            As for calculating the mean PB of each peer group:
            Code:
            by ISINyear, sort: egen mean_PB = mean(peer_PB)
            As for additional conditions, we can distinguish two situations. One is where the condition is that the peers must be the same as the original ISINyear on that variable. In that case, the easiest way to do it is to add that variable to the -joinby year using `peers'- command immediately before or after year.

            If the additional condition is one where the value for the peer must be different from that of the original ISINyear, you handle that the way country is handed. Just add another -drop if vble_must_be_different == peer_vble_must_be_different- command after the one that we already have for country.

            And if the additional condition is more complicated, it's just a matter of putting appropriate -keep if- or -drop if- commands in the code after the -joinby- and before -gen delta = ...-

            Comment


            • #7
              Thank you for your response. I managed to implement all the command and got the result that I need. Another question regarding peers formation. Is there a STATA command that enables us to solely observe and calculate on target companies that have existed from the beginning period of the observation year (2002) to the very end of the observation year (2016) (or at least one observation before 2005 and one observation after 2005)?. To make this more clear, I want to present a paired comparison from the pooled sample data set. For example, company A that has existed since 2002 would have undergone an accounting change in 2005. Therefore, I would like to observe the change in company A's mean and peer selection. As some target companies that have just emerged in 2009 would still be included within the pooled dataset sample, this type of companies does not necessarily answer the question whether the accounting change in 2005 has some effect on the peer selection for companies that have existed before. I hope that I have explained the situation clearly. Again, your response is highly appreciated.

              Kevin

              Comment


              • #8
                Well, in order to do this, it must be possible to identify the same company occurring in different years. In your example data, I observe that the year is identical to the last four characters of ISINyear. Assuming that this is true in your full data, and assuming that ISINyear with the year removed is, in fact, a unique identifier for companies, then you can identify those firms that have data both preceding and following 2005 as follows:

                Code:
                assert year == real(substr(ISINyear, -4, 4))
                gen ISIN = substr(ISINyear, 1, length(ISINyear)-4)
                
                by ISIN, sort: egen in_before_2005 = min(year < 2005)
                by ISIN: egen in_after_2005 = min(year > 2005)
                gen byte to_include = in_before_2005 & in_after_2005
                Note: The first two lines just verify that in the entire data year is, in fact, the same as the last four characters of ISINyear, and then extract the company identifier (now called ISIN) from ISINyear.

                Comment


                • #9
                  The first two commands work perfectly, and it is indeed a unique company identifier. However, the fifth command failed to identify a company that exists before 2005 and after 2005 as seen in the to_include variable which contains only 0 value. Below I attach the result that I got using dataex.
                  From how I understand it, the to_include variable should appear as a number 1 when a company both occur in before and after 2005?
                  I realized that in the example that I provided there is no ISIN(company) that has already occurred before and after 2005, but as I have checked it manually, there are some companies that meet this requirement, and I would like to group and analyze them separately. Thank you.
                  ----------------------- copy starting from the next line -----------------------
                  Code:
                  * Example generated by -dataex-. To install: ssc install dataex
                  clear
                  input str12 ISIN float(in_before_2005 in_after_2005) int year byte to_include
                  "ANN4327C1220" 1 0 2003 0
                  "ANN4327C1220" 1 0 2003 0
                  "ANN4327C1220" 1 0 2004 0
                  "ANN4327C1220" 1 0 2004 0
                  "ANN4327C1220" 1 0 2004 0
                  "ANN4327C1220" 1 0 2004 0
                  "ANN4327C1220" 1 0 2005 0
                  "ANN4327C1220" 1 0 2005 0
                  "ANN4327C1220" 1 0 2005 0
                  "ANN4327C1220" 1 0 2005 0
                  "AT000000STR1" 0 1 2008 0
                  "AT000000STR1" 0 1 2008 0
                  "AT000000STR1" 0 1 2008 0
                  "AT000000STR1" 0 1 2008 0
                  "AT000000STR1" 0 1 2009 0
                  "AT000000STR1" 0 1 2009 0
                  "AT000000STR1" 0 1 2009 0
                  "AT000000STR1" 0 1 2009 0
                  "AT000000STR1" 0 1 2010 0
                  "AT000000STR1" 0 1 2010 0
                  "AT000000STR1" 0 1 2010 0
                  "AT000000STR1" 0 1 2010 0
                  "AT000000STR1" 0 1 2013 0
                  "AT000000STR1" 0 1 2013 0
                  "AT000000STR1" 0 1 2013 0
                  "AT000000STR1" 0 1 2013 0
                  "AT000000STR1" 0 1 2014 0
                  "AT000000STR1" 0 1 2014 0
                  "AT000000STR1" 0 1 2014 0
                  "AT000000STR1" 0 1 2014 0
                  "AT000000STR1" 0 1 2015 0
                  "AT000000STR1" 0 1 2015 0
                  "AT000000STR1" 0 1 2015 0
                  "AT000000STR1" 0 1 2015 0
                  "AT000000STR1" 0 1 2016 0
                  "AT000000STR1" 0 1 2016 0
                  "AT000000STR1" 0 1 2016 0
                  "AT000000STR1" 0 1 2016 0
                  "AT00000AMAG3" 0 1 2011 0
                  "AT00000AMAG3" 0 1 2011 0
                  "AT00000AMAG3" 0 1 2011 0
                  "AT00000AMAG3" 0 1 2011 0
                  "AT00000AMAG3" 0 1 2012 0
                  "AT00000AMAG3" 0 1 2012 0
                  "AT00000AMAG3" 0 1 2012 0
                  "AT00000AMAG3" 0 1 2012 0
                  "AT00000AMAG3" 0 1 2013 0
                  "AT00000AMAG3" 0 1 2013 0
                  "AT00000AMAG3" 0 1 2013 0
                  "AT00000AMAG3" 0 1 2013 0
                  "AT00000AMAG3" 0 1 2014 0
                  "AT00000AMAG3" 0 1 2014 0
                  "AT00000AMAG3" 0 1 2014 0
                  "AT00000AMAG3" 0 1 2014 0
                  "AT00000AMAG3" 0 1 2015 0
                  "AT00000AMAG3" 0 1 2015 0
                  "AT00000AMAG3" 0 1 2015 0
                  "AT00000AMAG3" 0 1 2015 0
                  "AT00000AMAG3" 0 1 2016 0
                  "AT00000AMAG3" 0 1 2016 0
                  "AT00000AMAG3" 0 1 2016 0
                  "AT00000AMAG3" 0 1 2016 0
                  "AT00000ATEC9" 0 1 2008 0
                  "AT00000ATEC9" 0 1 2008 0
                  "AT00000ATEC9" 0 1 2008 0
                  "AT00000ATEC9" 0 1 2008 0
                  "AT00000ATEC9" 0 1 2009 0
                  "AT00000ATEC9" 0 1 2009 0
                  "AT00000ATEC9" 0 1 2009 0
                  "AT00000ATEC9" 0 1 2009 0
                  "AT00000FACC2" 0 1 2014 0
                  "AT00000FACC2" 0 1 2014 0
                  "AT00000FACC2" 0 1 2014 0
                  "AT00000FACC2" 0 1 2014 0
                  "AT00000FACC2" 0 1 2015 0
                  "AT00000FACC2" 0 1 2015 0
                  "AT00000FACC2" 0 1 2015 0
                  "AT00000FACC2" 0 1 2015 0
                  "AT00000OESD0" 0 1 2013 0
                  "AT00000OESD0" 0 1 2013 0
                  "AT00000OESD0" 0 1 2013 0
                  "AT00000OESD0" 0 1 2013 0
                  "AT00000OESD0" 0 1 2014 0
                  "AT00000OESD0" 0 1 2014 0
                  "AT00000OESD0" 0 1 2014 0
                  "AT00000OESD0" 0 1 2014 0
                  "AT00000OESD0" 0 1 2015 0
                  "AT00000OESD0" 0 1 2015 0
                  "AT00000OESD0" 0 1 2015 0
                  "AT00000OESD0" 0 1 2015 0
                  "AT0000644505" 0 1 2006 0
                  "AT0000644505" 0 1 2006 0
                  "AT0000644505" 0 1 2006 0
                  "AT0000644505" 0 1 2006 0
                  "AT0000644505" 0 1 2007 0
                  "AT0000644505" 0 1 2007 0
                  "AT0000644505" 0 1 2007 0
                  "AT0000644505" 0 1 2007 0
                  "AT0000644505" 0 1 2008 0
                  "AT0000644505" 0 1 2008 0
                  end
                  ------------------ copy up to and including the previous line ------------------

                  Comment


                  • #10
                    Sorry, my error. Both instances of -min- should be -max-:

                    Code:
                    by ISIN, sort: egen in_before_2005 = max(year < 2005)
                    by ISIN: egen in_after_2005 = max(year > 2005)
                    gen byte to_include = in_before_2005 & in_after_2005
                    Note, by the way, that year 2005 itself counts as neither before nor after in this code. If you want to include it in one of the two groupings (I don't know which would be appropriate for your purposes) change the strict inequality to non-strict inequality in the corresponding command.

                    Comment


                    • #11
                      Perfect! The command now works accordingly. There is one last thing that I want to ask related to this scenario. How can I compare the mean of a variable (the name of the variable: "AVEmeanyearly") based on this distinction? I want to analyze whether the yearly mean of AVE has increased or decreased after the accounting change in 2005, but only for the company/ observations that has a value of 1 in the to_include variable (occurred in both before 2005 and after 2005). Once again, thank you in advance.

                      Comment


                      • #12
                        Something like this (not tested):
                        Code:
                        xtset ISIN
                        gen byte pre_2005 = (year < 2005) if !missing(year)
                        xtreg AVEmeanyearly i.pre_2005 if to_include, fe
                        Note that this is a "bare bones" analysis. You may need to add other variables to the model to adjust for their effects. You may need cluster robust standard errors. These considerations depend on substantive, not statistical, aspects of the problem that I have no familiarity with or expertise in. Note also that this contrasts years up through 2004 with 2005 and beyond. If you want to consider 2005 to be one of the pre-policy change years, change < to <=.

                        Comment


                        • #13
                          2005 is included in the after_2005; therefore I put >= on the previous equation you provided me.
                          Regarding the command that you just provided, it seems that ISIN is a string variable that is not allowed in the xtset command; thus i can't run the command. Further, although I haven't looked at the result of the outcome, I am not sure if the bare bones analysis is what I asked. For clarity, I am going to elaborate more on the question,

                          In my data, there is already a variable called AVE for each company from 2002 to 2016. I want to observe the yearly AVE mean of these companies/observations that existed in both before 2005 and after 2005 in another word, fits with the var to_include requirement. I thought that this kind of analysis could be done without having to run another regression (xtreg), instead, egen command with a slight extension I thought would work "egen AVEmeanyearly=mean(AVE), by(year)" extended to --> egen AVEbalance=mean(AVE), by(to_include year). After I tried it, I am not sure if the result has provided me with the outcome that I want, because it seems to also include the value 0 in the to_include variable that is not supposed to be included for this analysis.
                          Again please correct me if I am wrong.

                          Comment


                          • #14
                            Yes, you are right about ISIN. It is a string variable and cannot be used in -xtset-. My apologies for the error. The simple correction is:
                            Code:
                            encode ISIN, gen(n_ISIN)
                            xtset n_ISIN
                            If you run
                            Code:
                            egen AVEbalance=mean(AVE), by(to_include year)
                            you will get a new variable calculated separately for each year, and within each year there will be one value averaging the companies that were present both before and after 2005, and there will be another value averaging the companies that were not. It's not what I understood by "compare" the values based on the pre- and post-2005 distinction, and certainly not the ordinary meaning of "but only for the company/ observations that has a value of 1 in the to_include variable (occurred in both before 2005 and after 2005)." But if that's what you want, go ahead and do it!

                            Comment

                            Working...
                            X