Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • How to find similar entries to build a peer-group?

    Dear fellow Stata users,

    currently, I am working with some datasets containing company information. I have to find similar companies with respect to 4 variables.
    The situation is as follows:

    13 companies from Australia have been targeted from Sovereign Wealth Funds.
    No I have to find comparable companies from Australia (approx 4000 entries) with respect to the same industry, ROA, Size and MTBV. The later three vars measured in the same year the targeted company got targetet.

    I do have two panels, one containing the 13 companies and the variables for the relevant years (1997-2014); and one containing the relevant vars for the same years for the other 4000 companies from Australia.

    Panel looks as follows: (I know I have to add the "targeted in" variable somehow to get the relevant year)
    Company Year Var1 ...
    1 1997 x ...
    1 1998 y ...
    1
    ...
    1999
    ...
    z
    ...
    ...
    ...
    2 1997 ab ...
    2
    ....
    1998
    .....
    ac
    ...
    ...
    ...


    Now I have to find a peer group of 13 companies. Each targetet company needs a peer company from the same industry and the closest possible match for the 3 variables in the relevant year.

    E.g. Company 1 got targeted in 2000 --> need to find the company from the same industry with the closest values for the vars in year 1999.


    Hope you can understand my troubles.
    Any help is appreciated!!

    Greetings
    Philipp



  • #2
    Your problem is ill-posed and it isn't clear that the information needed to solve it, once clarified, is available.

    First, you tell us that company 1 got targeted in 2000, but there is nothing in the example data you show that would enable anyone (or, more important, Stata) to know that. Your post indicates that you are aware of that problem, but until it is solved, nothing more can be said.

    Next, assuming that you have a resolution to the other problem, there is no such thing as the closest possible match for 3 variables. What do you do when if one observation gives a really close match on var1 but not very close on var2, and a different observation gives a really close match on var2 but not very close on var1? The point is you need a specific metric defined in terms of the three variables that defines what is meant by closest possible match.

    I suggest you post back with an improved example. Also do NOT post it as an HTML table. They can be difficult to import into Stata, and even those that import easily are lacking information about data storage types, display formats, and labels--information which is sometimes crucial for getting things right. The useful way to show a Stata data example here is with the -dataex- command, by Robert Picard. Run -ssc install dataex- to install it. The simple instructions can be seen by running -help dataex-. Using -dataex- assures you that those who want to help you will be able to create a complete and faithful replica of your Stata example with a simple copy/paste operation. Use -dataex- whenever you post example data here. In your post-back, also show the formula for closeness you want to use, or clearly describe in words an algorithm for figuring out what is closest.

    Comment


    • #3
      Hello and thanks for the fast response!

      Well it seems like I have to solve some probplems first.. If you don't mind I will come back to this post again later after I've done more research.
      I've heard something about Propensity Score Matching. Is this a way to get close observations?

      And thanks for the explanation of how to post propper tables here!

      Comment


      • #4
        I've heard something about Propensity Score Matching. Is this a way to get close observations?
        Yes, it is one way to get matched pairs. However, it is not aimed at getting "closeness" on the particular 3 variables you are interested in; it is designed to match up pairs that are close in their overall probability of being a case (as opposed to being a control). So propensity score matching will sometimes give you matched pairs that do not agree particularly closely on any variables at all, but when you combine all the variables in the way that predicts "caseness," they end up very close on that. This may or may not be suitable for your purposes.

        Comment


        • #5
          Hello Mr. Schechter,
          well that sounds like exactly what I need. I try to explain my current goal with the mentioned data.

          I have a set of approx. 500 listed public companies from different countries that got targeted by an institutional investor (Sovereign Wealth Fund).
          Now, I want to analyse the difference between targeted and non-targeted companies or in other words - "What are the determinants for SWFs to invest in a company".

          Therefore, I need a group of non-targeted companies as control group. Each target (case = 1) gets one matched “non-targeted company” (case = 0) - resulting in the peer group I am currently searching.

          If you say a propensity score is designed to match up pairs that are close in their overall probability of being a case, this sounds like a good way to go.
          Like – “the probability of getting targeted based on these 3 variables is similar for the targets and the resulting non-targeted control group”

          Now a question: Is it possible to perform a propensity score match based on the 3 vars to identify the peer group? Meaning I can see the company names of each matched pair?
          (E.g.: company abc got matched with firm xyz)

          Next step is to download several more financial fundamentals for the peer group and the targeted group. Researching this for the entire universe of companies per country would be crazy. A set of 500 control companies is much easier to handle. For descriptive statistics etc., as well

          Hope I was able to describe the situation.
          Thanks a lot for your time and excuse potentially obvious questions!!
          Philipp

          Comment


          • #6
            So, yes, it sounds like propensity score matching is suitable for you. Depending on the details of the model you want to ultimately fit, you are likely to find a "bundle" of propensity score matching and your analysis among the various treatment effects commands in Stata. See -help treatment effects- and then link from there to the appropriate specifics. If the analysis you need isn't available there, you may have to separately do the propensity score matching and then run the analysis. For that, the user-written command -psmatch2- is available from SSC.

            Comment


            • #7
              Hello all together, it's me again regarding the psmatch2.
              Thanks for reading and helping in advance!!

              The attached table at the bottom shows an excerpt of the data as follows.
              9 targeted companies - Targeted in the given Year
              5 not targeted companies with the values for 5 years each (Panel)

              Comany Identifier - targeted (=1) or not targeted (=0) - Year of the observation - two digit SIC - log(firmsize) - Market to Book Value - ROA - change in ROA [t-3; t-1]

              Question: I have to performe a 1:1 match (noreplacement). Each Target has to get one not targeted peer. the match musst be based on the same Year and TwoDigitSIC

              Do I have to performe this match year by year and SIC by SIC?
              There musst be a more productive way! - Please Help :D



              so far I tried --> psmatch2 Target Size MTBV ROA DeltaROA if Year == Year

              the "if Year == Year" is not working. Matches appeared to be from different years as well.
              max. pscore is 5.8% and most of the pscores are below 1% - Problem?


              Moreover, the summary statistics of the code above seems to be kind of bad.
              Probit regression Number of obs = 8,881
              LR chi2(4) = 22.49
              Prob > chi2 = 0.0002
              Log likelihood = -52.85066 Pseudo R2 = 0.1754
              Target Coef. Std. Err. z P>z
              Size .1905188 .0424675 4.49 0.000
              MTBV -.0005032 .0021 -0.24 0.811 Values are way to large?
              ROA .0000245 .0004463 0.05 0.956
              DeltaROA 1.08e-06 .0000691 0.02 0.988
              _cons -5.47112 .6109922 -8.95 0.000


              Code:
              * Example generated by -dataex-. To install: ssc install dataex
              clear
              input str6 dscd float Target int Year float(TwoDigitSIC Size MTBV ROA DeltaROA)
              "912887" 1 2013 13 14.575327  .96   7.35   13.5
              "890829" 1 2013 13 13.866814  .73   2.25  11.65
              "88987Q" 1 2013 80 12.754365 2.62    6.7    -.8
              "675302" 1 2013 51 14.892998 1.62    5.9  -2.52
              "357851" 1 2011 67  15.05567  .88   3.95   28.8
              "32531U" 1 2013 20 14.729865 1.29      .      .
              "268432" 1 2011 33  15.98333  .47   4.09  -1.66
              "256024" 1 2011 33 15.917235  .43 -11.86 -12.65
              "131158" 1 2011 10  14.68429 1.41  -1.49  25.98
              "999673" 0 2011 13 12.127484 1.25  -2.68  15.62
              "999673" 0 2012 13 12.778503 1.82   3.54  33.45
              "999673" 0 2014 13 13.182096 1.71    7.3   9.98
              "999673" 0 2010 13   11.0375  .87   4.97  53.59
              "999673" 0 2013 13 13.030254 1.84  14.77    9.8
              "999616" 0 2014 37         .  1.8      .      .
              "999616" 0 2010 37         .  1.8      .      .
              "999616" 0 2013 37         .  1.8      .      .
              "999616" 0 2012 37         .  1.8      .      .
              "999616" 0 2011 37         .  1.8      .      .
              "999519" 0 2010 73  9.769098 1.52    4.6  71.37
              "999519" 0 2013 73  9.054855 2.48  -7.71 -12.31
              "999519" 0 2014 73  9.468542 3.61  25.62  57.82
              "999519" 0 2012 73   9.16115 1.76 -68.14  -74.2
              "999519" 0 2011 73  9.625888 1.95  -32.2  -33.7
              "998356" 0 2012 13 12.734632 1.89   3.05   8.27
              "998356" 0 2013 13 13.051398 1.84   1.45 -32.37
              "998356" 0 2014 13  13.14725 1.27   4.27 -11.67
              "998356" 0 2011 13  12.54139 1.75  15.94  37.37
              "998356" 0 2010 13 12.027555 2.26  33.82  38.27
              "998066" 0 2010 62 14.287406 3.57   4.46   -6.4
              "998066" 0 2012 62 14.039056 3.59   2.01    .29
              "998066" 0 2013 62 13.699353 3.04   5.67   1.21
              "998066" 0 2014 62 13.831655 3.94   8.14   4.45
              "998066" 0 2011 62 14.342036 4.07   3.69  -2.17
              end
              ------------------ copy up to and including the previous line ------------------

              Listed 34 out of 15544 observations





              Comment

              Working...
              X