Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Matching firms

    Dear all,

    I am currently writing a code to capture the differences in earnings management between US firms and cross-listed firms (foreign firms on an American stock exchange). Because the cross-listed firms are self-selected, the data might be biased. Therefore, I have to match the cross-listed firms with US firms based on:

    - fyear (fiscal year)
    - sic_2 (industry)
    - at (total assets)

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long gvkey double fyear byte sic_2 double at
    1004 1996 50  529.584
    1004 1997 50  670.559
    1004 1998 50   726.63
    1004 1999 50  740.998
    1004 2000 50  701.854
    1004 2001 50  710.199
    1004 2002 50  686.621
    1004 2003 50  709.292
    1004 2004 50   732.23
    1004 2005 50  978.819
    1004 2006 50 1067.633
    1004 2007 50  1362.01
    1004 2008 50 1377.511
    1004 2009 50 1501.042
    1004 2010 50 1703.727
    1004 2011 50 2195.653
    1004 2012 50   2136.9
    1004 2013 50   2199.5
    1004 2014 50     1515
    1004 2015 50   1442.1
    1010 1997 37   3181.3
    1010 1998 37   3257.3
    1010 1999 37   3563.4
    1010 2000 37   3794.5
    1010 2001 37   3723.1
    1010 2002 37   3702.5
    1010 2003 37   4832.1
    1013 1997 36  936.303
    1013 1998 36 1300.587
    1013 1999 36 1672.529
    1013 2000 36   3970.5
    1013 2001 36   2499.7
    1013 2002 36   1144.2
    1013 2003 36   1296.9
    1013 2004 36   1428.1
    1013 2005 36     1535
    1013 2006 36   1611.4
    1013 2007 36   1764.8
    1013 2008 36     1921
    1013 2009 36   1343.6
    1013 2010 36   1474.5
    1019 1997 73    26.71
    1019 1998 73   29.283
    1019 1999 73   29.341
    1019 2000 73   28.638
    1019 2001 73   30.836
    1021 1997 38   20.516
    1021 1998 38   18.661
    1021 1999 38   13.986
    1021 2000 38   11.608
    1021 2001 38    8.635
    1021 2002 38     7.85
    1021 2003 38    6.044
    1021 2004 38    6.245
    1021 2005 38    8.153
    1021 2006 38   14.341
    1021 2007 38   27.171
    1021 2008 38   21.401
    1034 1997 28  631.866
    1034 1998 28  908.936
    1034 1999 28 1160.266
    1034 2000 28 1610.435
    1034 2001 28 2390.008
    1034 2002 28 2296.924
    1034 2003 28 2329.268
    1034 2004 28 2003.842
    1034 2005 28 1623.383
    1034 2006 28  927.239
    1034 2007 28 1288.165
    1036 1997 34 1778.547
    1036 1998 34  2113.32
    1036 1999 34 2241.575
    1036 2000 34 2325.377
    1037 1996 36    4.969
    1037 1997 36     5.45
    1037 1998 36    3.228
    1037 1999 36    4.575
    1037 2000 36    6.373
    1037 2001 36   17.867
    1038 1996 78  718.213
    1038 1997 78   795.78
    1038 1998 78   975.73
    1038 1999 78 1188.805
    1038 2000 78 1047.264
    1038 2001 78  1279.17
    1038 2002 78 1491.698
    1038 2003 78 1506.534
    1043 1997 50     44.9
    1043 1998 50   45.639
    1043 1999 50    42.21
    1045 1997 45    20915
    1045 1998 45    22303
    1045 1999 45    24374
    1045 2000 45    26213
    1045 2001 45    32841
    1045 2002 45    30267
    1045 2003 45    29330
    1045 2004 45    28773
    1045 2005 45    29495
    1045 2006 45    29145
    end
    I currently have 5,208 cross-listed firms, and I want to reduce the amount of American firms (16,663) to the same amount (total amount of observations is 186,587).

    Perhaps what I mean is not exactly called 'matching'. Anyway, I am looking to keep one American firm for each cross-listed firm, that is in the same year and industry and with the closest asset size.

    If I use -joinby- I get exactly the American firms that I need, however in that case they are in the same observation line as the cross-listed firms. If you can help me in 'splitting up' these observations, that would be equally helpful. The code I used for that is:

    Code:
    preserve
    keep if dummy_foreign == 0
    ds sic_2 fyear, not
    rename (`r(varlist)') for0_=
    tempfile for0
    save `for0'
    
    restore
    keep if dummy_foreign == 1
    ds sic_2 fyear, not
    rename (`r(varlist)') for1_=
    tempfile for1
    save `for1'
    
    //    FIND ALL ADMISSIBLE PAIRINGS
    joinby sic_2 fyear using `for0'
    
    //    KEEP ONE WITH CLOSEST SIZE
    gen size_diff = abs(for0_at - for1_at)
    by for1_gvkey fyear sic_2 (size_diff), sort: keep if _n == 1
    Thank you in advance.

  • #2
    Well, thank you for showing some example data and code, but the code you show is not usable with the example data because the variable dummy_foreign is missing.

    Basically what you will end up with after your code is wide data and what you will want to do is -reshape long-. But the details of the -reshape long- command will depend on the names of the variables, and those may require some cleaning up before you can actually do the -reshape-. Also, before you do the -reshape-, you will need to create a new variable that identifies the pairs, so that once the data is reshaped you will be able to tell which observation was matched with which other observation. So we really need to either have a data example that runs with the matching code, or we need to have an example of the data as it looks at the end of the matching code.

    All of that said, let me point out two peculiarities of your matching, which may or may not be problematic. The first is that you will likely end up matching the same US firm with different foreign firms in different years. The second is that if you have observations that agree on sic2 and f_year, and are also tied in terms of closeness of assets, this tie will be broken in a random and irreproducible way. Otherwise put, if you run the same code on the same data repeatedly, you will not get the same results each time.

    Comment

    Working...
    X