Sample Matching

Jeroen Kiep

Join Date: Feb 2017

Posts: 3
#1

11 Feb 2017, 08:13

Hello!

I am trying to match my sample of restating companies (99 firm-year observations) with non-restating companies (1731 obs) based on SIC, Size and Exchg (exchange code).
I have not yet been able to write working code for this; I have tried joinby.

I would like to merge the databases of non-restating and restating firms, so that the new database pairs each restating observation with a non-restating observation based on size, SIC and Exchg.

I hope anyone can help!

Thanks in advance,

Jeroen
Clyde Schechter

Join Date: Apr 2014

Posts: 30060
#2

11 Feb 2017, 09:37

Without seeing an example of your data it is hard to give specific advice. It would also be helpful to see the exact command you tried and in what way it did not produce the results you wanted. Sample matching usually does rely on -joinby-, but there are many ways to use it, and without knowing what you tried, there is no way to say what went wrong.

Finally, what are the matching criteria? Do size, SIC and Exchg all have to match exactly? Or is matching within a certain range permitted for some (or even all) of these variables?

In short, your question is too broad and vague to get an answer, but if you provide specifics, it is highly likely someone will be able to help you.
Jeroen Kiep

Join Date: Feb 2017

Posts: 3
#3

11 Feb 2017, 10:00

Thanks for your reaction!
We have created two datasets with the same variables in it. However, in one data set our restatement dummy is 0 (non-restatement; n=1781) and in the other it is 1 (restatement; n=99). We are interested in how to pair those observations in the best way we can, so we do not necessarily want to match them exactly.

We have tried joinby:

Code:

use sample_of_99_companies.dta, clear
rename sic case_sic
rename exchg case_exchg
joinby case_sic case_exchg using sample_of_1781_companies.dta

This yields 1100 companies, all of them restatements and none without restatements.
Doing this the other way around (using the 99 companies in joinby) also gives us only ones for the restatement dummy, which indicates that our matching doesn't work.

What we actually want is a database that has 99 (or fewer, because of the matching) paired companies with the same SIC and EXCHANGE codes, one of each pair with a restatement and the other without.
Clyde Schechter

Join Date: Apr 2014

Posts: 30060
#4

11 Feb 2017, 10:25

Your question remains unclear, though less so. I'll fill in the gaps using my imagination--the result may or may not be useful to you.

sic and exchg are the attributes on which you want to pair the cases and controls. Each observation in the data set, in addition to having those two variables, must also have some variable identifying the individual firm, and perhaps other such variables. Let's call it firm_id.

So you need to do something like this:

Code:

use sample_of_1781_companies, clear
rename firm_id control_id
// HERE INSERT STATEMENTS RENAMING OTHER VARIABLES TO DISTINGUISH THEM AS CONTROL
// BUT DO NOT RENAME sic OR exchg
tempfile holding
save `holding'

use sample_of_99_companies, clear
rename firm_id case_id
// HERE INSERT STATEMENTS RENAMING OTHER VARIABLES TO DISTINGUISH THEM AS CASE
// BUT DO NOT RENAME sic OR exchg
joinby sic exchg using `holding', unmatched(master)

You will now have a data set in which each of the 99 case companies is paired with every control company that has the same sic and exchg values. Any cases that found no match will still be in the data set (but with missing values for the paired control variables). Controls that found no match will not appear in this data set.
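
To see how the pairing turned out, something like the following could be run immediately after the -joinby-. It is only a sketch: it relies on the _merge variable that -joinby- creates when unmatched() is specified, and it assumes case_id identifies each case observation uniquely. n_controls is a hypothetical variable name introduced here for illustration.

```stata
* Cases that found no control at all (master-only observations)
count if _merge == 1

* Number of candidate controls attached to each case
* (n_controls is a hypothetical name, not from the original posts)
bysort case_id: egen n_controls = total(_merge == 3)

* Distribution of candidate counts; rows here are case-control pairs,
* so cases with many candidates appear many times
tabulate n_controls
```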

The next step will be to whittle this down, as it is possible that some of your cases will have attracted more matches than you want or need. Your original post mentioned also matching on size. So if an exact match on size is wanted, the -joinby- command should include size along with sic and exchg. If you want to match on size within some range, then at this point you could use -keep if abs(case_size - control_size) < your_threshold_for_size_matching-. If further narrowing is needed, there are ways to do that, but without knowing what your data look like and not knowing what additional criteria you might want to apply for selection, I'll just stop here.
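
As an illustration of the range-based whittling described above, here is a minimal sketch. It assumes that size was renamed to case_size and control_size in the two files before the -joinby-, that case_id identifies each case observation uniquely, and that a 25% size band is acceptable; the threshold and the variable size_gap are hypothetical choices, not something taken from the original posts.

```stata
* Drop cases that found no control with the same sic and exchg
drop if _merge == 1

* Absolute size difference within each case-control pair
gen size_gap = abs(case_size - control_size)

* Keep pairs within a hypothetical 25% band around the case's size
keep if size_gap < 0.25 * case_size

* Keep the single closest control per case; ties are broken by control_id
bysort case_id (size_gap control_id): keep if _n == 1
```

Note that this picks controls with replacement, so the same control firm can end up paired with more than one case. If each control may be used only once, a further step would be needed.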
Jeroen Kiep

Join Date: Feb 2017

Posts: 3
#5

16 Feb 2017, 05:53

Thank you very much Clyde! I managed to work it out with your help!