Difference-in-Difference regression - how to select the control group

Haavard Solli

Join Date: May 2018

Posts: 4
#1

25 Jun 2018, 10:09

Hi,

I'm writing a master's thesis on the effect of a policy change on tax avoidance. The rule applied to only about 500 observations (the treatment group) in my dataset, which contains around 80,000 in total. I would like to identify a control group of equal size.
Relevant matching variables are the continuous variables X₁ - X₄ and the binary variables Z₁ - Z₂. The dependent variable in the subsequent difference-in-difference regression is the continuous variable Y.
Is it possible to perform an exact match on the binary independent variables and nearest-neighbor matching (or an equivalent) on the continuous independent variables? I tried teffects nnmatch but couldn't figure out how to match without replacement.

Any help is greatly appreciated.
Tags: control group
Clyde Schechter

Join Date: Apr 2014

Posts: 30192
#2

25 Jun 2018, 12:12

This is going to prove difficult, and I'm not sure it's worth the trouble. Even if these data were just a single cross-section in time, exact matching on both dichotomous variables could prove difficult, depending on their distributions (although, admittedly, with nearly 80,000 potential controls this probably won't be a barrier). But then you also want to match on four continuous variables. That's a lot of matching, and you may find that many of your 500 observations have only poor-quality matches, or none at all. In addition, your data are actually longitudinal, so it is unclear how to handle situations where potential control B is a good match for case A in the pre-policy-change era but not in the post-policy-change era. And it is highly likely that the "nearest neighbor" for a given case will differ at different times. Moreover, "nearest neighbor" is not uniquely defined for more than one variable until you commit to a distance metric: the match that is nearest on X1 may not be the nearest on X2, for example.
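To make the last point concrete, here is a minimal Python sketch (not Stata, and not anything from the poster's data). All values, the unit names A/B/C, the helper functions, and the scale factors are hypothetical. It restricts the pool by exact match on the two binary variables and then shows that which control counts as "nearest" depends on the variable or distance metric chosen.

```python
import math

# Hypothetical toy data; none of these numbers come from the thread.
treated = {"x1": 10.0, "x2": 200.0, "z1": 1, "z2": 0}
controls = [
    {"id": "A", "x1": 10.5, "x2": 260.0, "z1": 1, "z2": 0},
    {"id": "B", "x1": 13.0, "x2": 205.0, "z1": 1, "z2": 0},
    {"id": "C", "x1": 10.0, "x2": 200.0, "z1": 0, "z2": 0},  # differs on z1
]

# Assumed sample standard deviations used to put x1 and x2 on a common scale.
scale = {"x1": 1.0, "x2": 50.0}


def exact_pool(unit, pool, keys=("z1", "z2")):
    """Keep only controls that agree with the unit on every binary key."""
    return [c for c in pool if all(c[k] == unit[k] for k in keys)]


def nearest(unit, pool, distance):
    """Return the id of the control that minimizes the given distance."""
    return min(pool, key=lambda c: distance(unit, c))["id"]


def raw_euclid(a, b):
    return math.hypot(a["x1"] - b["x1"], a["x2"] - b["x2"])


def scaled_euclid(a, b):
    return math.hypot((a["x1"] - b["x1"]) / scale["x1"],
                      (a["x2"] - b["x2"]) / scale["x2"])


pool = exact_pool(treated, controls)  # C is dropped despite identical x1, x2
pool_ids = [c["id"] for c in pool]

nearest_on_x1 = nearest(treated, pool, lambda a, b: abs(a["x1"] - b["x1"]))
nearest_on_x2 = nearest(treated, pool, lambda a, b: abs(a["x2"] - b["x2"]))
nearest_raw = nearest(treated, pool, raw_euclid)
nearest_scaled = nearest(treated, pool, scaled_euclid)

print(pool_ids, nearest_on_x1, nearest_on_x2, nearest_raw, nearest_scaled)
```

In this toy case the single-variable answers disagree, and so do the raw and scaled Euclidean distances, so the choice of metric is itself a modeling decision. Note also that the exact-match step discards the control that is identical on both continuous variables.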

Finally, I would add that there is no statistical reason to insist on matching without replacement.

Others may have differing opinions about these issues, and it would be nice to see a discussion here.
Haavard Solli

Join Date: May 2018

Posts: 4
#3

25 Jul 2018, 06:58

I am sorry for the very late reply, but thanks for your help, Clyde. We did try to match with replacement, but as you point out, matching can be very complex, and it did not give the results we needed.
The policy change in question is an interest barrier rule hindering the tax deductibility of intra-company interest expenses. The rule only applied to firms with a net interest expense of more than 5 million. So instead of matching on variables, we selected a control group of observations with more than 5 million in gross interest expense not in the control group. The control group was of approximately the same size as the treatment group. Also, the parallel trend seems quite promising.
The rule was proposed in 2013, so we will end the pre-treatment period in 2012, omitting 2013. Our post-treatment period is 2014-2015 and our pre-treatment period is 2011-2012.
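As a rough illustration of this design, here is a minimal Python sketch with made-up numbers. The firm names, the outcome values, and the assumption that the interest figures are measured once per firm are all hypothetical. It assigns firms to groups by the 5 million threshold, drops 2013, and computes the difference-in-difference estimate from the four group-by-period means.

```python
from statistics import mean

PRE_YEARS = {2011, 2012}
POST_YEARS = {2014, 2015}   # 2013 (the proposal year) is omitted
THRESHOLD = 5_000_000

# Hypothetical firms; interest expense assumed measured once per firm.
firms = {
    "F1": {"net_interest": 7_000_000, "gross_interest": 9_000_000},
    "F2": {"net_interest": 6_000_000, "gross_interest": 6_500_000},
    "F3": {"net_interest": 2_000_000, "gross_interest": 8_000_000},
    "F4": {"net_interest": 1_000_000, "gross_interest": 6_000_000},
    "F5": {"net_interest": 500_000, "gross_interest": 1_000_000},
}

# Hypothetical outcome Y by (firm, year).
panel = [
    ("F1", 2011, 10.0), ("F1", 2012, 12.0), ("F1", 2013, 99.0),
    ("F1", 2014, 13.0), ("F1", 2015, 15.0),
    ("F2", 2011, 8.0), ("F2", 2012, 10.0), ("F2", 2013, 99.0),
    ("F2", 2014, 11.0), ("F2", 2015, 13.0),
    ("F3", 2011, 7.0), ("F3", 2012, 9.0), ("F3", 2014, 8.0), ("F3", 2015, 10.0),
    ("F4", 2011, 7.0), ("F4", 2012, 9.0), ("F4", 2014, 8.0), ("F4", 2015, 10.0),
    ("F5", 2011, 1.0), ("F5", 2012, 1.0), ("F5", 2014, 1.0), ("F5", 2015, 1.0),
]


def group_of(firm):
    """Treated: net interest above the threshold. Control: gross interest
    above the threshold but not treated. Everything else is excluded."""
    info = firms[firm]
    if info["net_interest"] > THRESHOLD:
        return "treated"
    if info["gross_interest"] > THRESHOLD:
        return "control"
    return None


def period_of(year):
    if year in PRE_YEARS:
        return "pre"
    if year in POST_YEARS:
        return "post"
    return None  # e.g. 2013


# Collect outcomes into the four group-by-period cells.
cells = {}
for firm, year, y in panel:
    g, p = group_of(firm), period_of(year)
    if g is None or p is None:
        continue
    cells.setdefault((g, p), []).append(y)

means = {key: mean(values) for key, values in cells.items()}
did = ((means[("treated", "post")] - means[("treated", "pre")])
       - (means[("control", "post")] - means[("control", "pre")]))
print(did)
```

Without covariates, this difference of mean changes is the same number as the coefficient on the treated-by-post interaction in a pooled OLS regression of Y on the two indicators and their interaction. Standard errors, and any clustering by firm, are outside this sketch.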

If you or anyone else disagrees with our opinion on the parallel trend assumption, please let me know. Clearly, the trends are not textbook-parallel, but we think they are close enough.

Attached Files
Clyde Schechter

Join Date: Apr 2014

Posts: 30192
#4

25 Jul 2018, 12:31

I would agree that the parallel trend assumption looks good enough for practical purposes here.

I don't understand:

"we selected a control group of observations with more than 5 million in gross interest expense not in the control group."

That statement appears to contradict itself.
Haavard Solli

Join Date: May 2018

Posts: 4
#5

26 Jul 2018, 02:27

My bad, I meant to say "not in the treatment group".

Thanks Clyde!