Matching sample firms with firm size

Jennifer Xie

Join Date: Mar 2017

Posts: 12
#1

24 Sep 2018, 23:16

Hi all,

I have unbalanced panel data, and I want to match firms that have state ownership (treatment group) with firms that do not (control group), subject to two conditions:

1. matched observations need to be in the same year and industry
2. matched observations need to have a similar firm size.

I looked at -nnmatch-, but it doesn't fit my purpose because it doesn't reduce the sample size. Is there any code I can use instead?

Can anyone help me out?

Thanks in advance,

Jennifer
Clyde Schechter

Join Date: Apr 2014

Posts: 30111
#2

24 Sep 2018, 23:43

You don't give example data, so I can't respond with specific code. But the -rangejoin- command, written by Robert Picard, and available from SSC, will do this for you. The year and industry variables will go in the -by()- option. You'll have to make a specific quantitative definition of what you consider "similar" firm size, and then that will guide you in specifying the high and low range around the firm size variable.

So run -ssc install rangejoin-, read -help rangejoin-, and you'll be able to do this.

If you have the case and control firms in the same data set, you will need to first create separate data sets for them.

I don't understand the reference to reducing the sample size. Matching is not about sample size reduction (though it often has that consequence due to inability to find matches for some observations). If you want, after matching each treated case with all compatible controls (which is what -rangejoin- does), you can follow that by keeping just one control for each case. You might pick one at random, or you might pick one that matches most closely on size, or something like that.
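As a rough illustration of the -rangejoin- approach described above, here is a minimal sketch. The toy data, the half-unit width used to define "similar" size, and the variable names size_low and size_high are all assumptions made for the example, not part of the original question. As far as I know, -rangejoin- also needs -rangestat- from SSC, and by default it adds a _U suffix to using variables whose names clash with the master data; check -help rangejoin- to confirm both.

```stata
* Requires: ssc install rangestat and ssc install rangejoin
* Toy data, invented for illustration only
clear
input byte stkcd int year byte(state industry) float size
1 2005 1 11 21.0
2 2005 0 11 21.3
3 2005 0 11 23.0
4 2005 0  8 21.1
end

* Controls (state == 0) go into their own file
preserve
keep if state == 0
tempfile controls
save `controls'
restore

* Treated firms stay in memory, each with its own acceptable size window
keep if state == 1
local width = 0.5    // assumed definition of "similar"; choose your own
gen size_low  = size - `width'
gen size_high = size + `width'

* Pair each treated firm with every control in the same year and industry
* whose size lies within [size_low, size_high]
rangejoin size size_low size_high using `controls', by(year industry)

* Control-side variables are expected to carry the default _U suffix
list stkcd year industry size stkcd_U size_U
```

In this toy example only control firm 2 falls inside the window for treated firm 1; firm 3 is too large and firm 4 is in a different industry.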

Clyde Schechter

Join Date: Apr 2014
Posts: 30111

25 Sep 2018, 08:45

Well, you are now changing the problem specification. You originally said you wanted the matched firm to have a "similar" size. Now you want "nearest firm size." It is entirely possible that the nearest firm size will be very different. If you really meant "similar," then you have to decide how close the firm sizes need to be for you to consider them similar. That is a judgment call that you have to make: it is not a statistical issue or a coding issue, so you won't find anything about it in the help files or documentation or textbooks. It requires thinking about the real-world implications of size differences and making your own decision (perhaps with input from knowledgeable colleagues) about the biggest difference in size that would be acceptable for your purposes.

That said, I will assume that you really mean "nearest" and not "similar." In this case, you do not need -rangejoin-. You do this, instead, with -joinby-. The code looks like this:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte stkcd int year byte(state industry) float(roa stockreturn size)
1 2003 1 10 .001911  -.17534 25.98518
1 2004 1 10  .00244  -.22562 26.04279
1 2005 1 10 .002944  -.06829 26.15793
1 2006 1 10 .007311 1.356678 26.28616
1 2007 1 10 .010558 1.935114 26.58843
1 2008 0 10 .001693  -.68042  26.8854
1 2009 0 10 .010478  1.57611 27.09967
1 2010 0 10 .010794  -.35207 27.31248
1 2011 0 10 .010438  -.01267 27.86068
1 2012 0 10 .010919  .035233  28.1051
1 2013 0 10 .010548  .235097 28.26852
1 2014 0 10 .012004   .57298  28.4133
2 2003 1 11 .077315  .375402 23.08044
2 2004 1 11 .081065  .214562 23.46632
2 2005 1 11 .089048  .267021 23.81396
2 2006 1 11 .069863 2.671332 24.60499
2 2007 1 11 .076457 1.832039 25.32938
2 2008 1 11  .05338  -.64005 25.50438
2 2009 0 11 .063114   .68349 25.64768
2 2010 0 11 .055161  -.23234 26.09686
2 2011 0 11 .053217  -.07974 26.41433
2 2012 0 11 .055472  .373709 26.66028
2 2013 0 11 .050628  -.19432 26.89539
2 2014 0 11 .049132  .827567 26.95455
4 2003 1  3 .019811  -.43972 19.28726
4 2004 1  3  -.0351  -.18735 19.18275
4 2005 1  3 -.14043  -.39207 18.86452
4 2006 1  3  .02656  .237922 19.13512
4 2007 1  3 -.05664 1.526439 18.95026
4 2008 1  3 -.06166  -.64257 18.93835
4 2009 1  3 .036718 1.808989 19.25581
4 2010 0  3 .134181     .204 19.03164
4 2011 0  3 .063131  -.32807 19.09519
4 2012 0  3  .07259  .011125 19.07775
4 2013 0  3  .04744  .424205 19.30508
4 2014 0  3 .011249  .336481 19.63939
5 2003 0 11 -.02676  -.41447 21.30033
5 2004 0 11 -.04513  -.17603 21.31693
5 2005 0 11 -.13437  -.25455 21.22032
5 2006 1 11 .003561 1.505181 21.14742
5 2007 0 11 .041095 1.855422 21.09966
5 2008 0 11 -.02478   -.6512 21.05187
5 2009 0 11 -.06154 1.427419 21.00029
5 2010 0 11  -.0082  -.39037  20.9807
5 2011 0 11 -.05254  .051771 20.95484
5 2012 0 11 -.00352  -.23057 21.00067
5 2013 0 11 -.03762  -.15825 20.89609
5 2014 0 11 .039158      .64 21.03564
6 2003 0 11 -.06584  -.34915 22.10159
6 2004 0 11 .005077  -.21333 22.11468
6 2005 0 11 .051442   .05373 21.66056
6 2006 1 11 .085447 2.864295 21.79448
6 2007 1 11 .051185  .974543 22.44038
6 2008 1 11 .026885  -.62143 22.50185
6 2009 0 11 .048961 1.312869  22.7073
6 2010 0 11  .07419  -.06441 22.87134
6 2011 0 11 .067721  -.22895 22.84217
6 2012 0 11 .090236  .580172 22.95794
6 2013 0 11 .092677  .069082 23.02266
6 2014 0 11 .057115  .477997 23.18739
7 2003 1  8  .04041  -.39567 20.83452
7 2004 1  8 -.08235  -.28842 20.77055
7 2005 1  8  .00169  -.04734 20.56578
7 2006 1  8 -.08505   .14236 20.50744
7 2007 0  8 -.00129 2.107438 20.41835
7 2008 0  8 -.10249  -.56915 20.23004
7 2009 0  8 -.17538 1.169753 19.71066
7 2010 0  8 -.03078  .216216 19.62191
7 2011 0  8 .026548  -.23041  19.7083
7 2012 0  8 .011897 1.199088 20.48714
7 2013 0  8 .023996  -.04077 20.30397
7 2014 0  8 -.04221  .051873 20.33896
end

//    FIRST SEPARATE THE STATE = 0 AND STATE = 1 OBSERVATIONS
preserve
keep if state == 0
ds year industry, not
rename (`r(varlist)') st0_=
tempfile state0
save `state0'

restore
keep if state == 1
ds year industry, not
rename (`r(varlist)') st1_=
tempfile state1
save `state1'

//    FIND ALL ADMISSIBLE PAIRINGS
joinby year industry using `state0'

//    KEEP ONE WITH CLOSEST SIZE
gen size_diff = abs(st0_size - st1_size)
by st1_stkcd year (size_diff), sort: keep if _n == 1

At this point you will have each firm with state = 1 in any year paired with a state = 0 firm in the same industry and in the same year. The chosen firm will be one whose size is nearest to that of the state = 1 firm. Note, however, two issues:

1. In your example data, at least, there are several state = 1 firm/year combinations for which no suitable state = 0 match can be found.

2. Although I don't think this happens in your example data, in your real data it is possible that there will be two or more state = 0 firms that both have the same size and are both nearest to the size of a state = 1 firm. The code above will break that tie in a random and irreproducible way. If that is not acceptable to you, you need to decide how you want to break ties of this nature if they occur.
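One possible way to make the tie-breaking in point 2 reproducible is to add a further variable to the sort key, so that ties on size_diff are resolved by a fixed rule. The sketch below uses "lowest control firm id wins", which is only an assumed rule, and the toy paired data are invented to show a tie.

```stata
* Toy paired data (invented): one treated firm with two equally close controls
clear
input byte st1_stkcd int year byte st0_stkcd float(st1_size st0_size)
1 2005 3 21 21.5
1 2005 2 21 20.5
1 2005 4 21 23
end

gen size_diff = abs(st0_size - st1_size)

* Ties on size_diff are broken by the control firm's id (assumed rule),
* so the same control is kept on every run
by st1_stkcd year (size_diff st0_stkcd), sort: keep if _n == 1

list st1_stkcd year st0_stkcd size_diff
```

If a random choice is preferred, a reproducible variant would set a seed, sort the data into a unique order, generate a random number, and put that variable in the sort key in place of the control id.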

In the future, when showing data examples, please use the -dataex- command to do so. If you are running version 15.1 or a fully updated version 14.2, it is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

When asking for help with code, always show example data. When showing example data, always use -dataex-.


Clyde Schechter

Join Date: Apr 2014

Posts: 30111
#4

26 Sep 2018, 08:25

"Can I simply replace -joinby- with -merge 1:m- in the above codes?"

No. If you try that, Stata will stop with an error, because industry and year do not uniquely identify observations in the master data, as -merge 1:m- requires (nor do they in the using data).

"I can see that -joinby- is a one to one match,"

No, that's not true. -joinby- is a many-to-many match. You ended up with a one-to-one match because of the later command

Code:

by st1_stkcd year (size_diff), sort: keep if _n == 1

That command was put there because you had said you wanted to match with the closest size, and the use of the word "the" in this context implies choosing just one. Perhaps you really meant what you said in #1, where you want "similar" size matches, not the closest. That can be done, but, again, you have to decide what "similar" means. To accomplish that, you would just change that line of the code to:

Code:

by st1_stkcd year, sort: keep if logical_expression_that_operationalizes_"similar_size"

That will leave you with each firm matched with however many allowable matches there are.
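To make the placeholder concrete, here is a minimal sketch in which "similar" is taken to mean an absolute size difference of at most 0.5. That threshold, the local macro name caliper, and the toy paired data are assumptions for illustration only; the right cutoff is the judgment call discussed earlier in the thread.

```stata
* Toy paired data (invented): one treated firm and three candidate controls
clear
input byte st1_stkcd int year byte st0_stkcd float(st1_size st0_size)
1 2005 2 21 21.25
1 2005 3 21 20.5
1 2005 4 21 23
end

gen size_diff = abs(st0_size - st1_size)

* Keep every pairing whose size difference is within the chosen caliper
local caliper = 0.5    // assumed definition of "similar"; choose your own
keep if size_diff <= `caliper'

list st1_stkcd year st0_stkcd size_diff
```

Here controls 2 and 3 are retained for treated firm 1 and control 4 is dropped, so a treated firm can end up with several matches or with none.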