Fixed-effects and coarsened exact matching

Rens Eggink

Join Date: Jun 2021

Posts: 23
#1

Fixed-effects and coarsened exact matching

22 May 2022, 09:32

Dear all,

I successfully matched my sample with the CEM method of Iacus et al. (2012). However, my thesis supervisor requires me to now use fixed effects in my regression. Unfortunately, I get errors using the following code:

Code:

xtreg YieldtoMaturity green Liquidity, fe if time_period==0 [iweight = cem_weights], robust

Could someone please explain what I am doing wrong?

Thank you a lot in advance!!

P.S. Please forgive me if I did not adhere to the forum's format; I am relatively new to Stata.

Best regards,

Rens
Tags: fixed effects
Øyvind Snilsberg

Join Date: Oct 2021

Posts: 591
#2

22 May 2022, 10:05

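If you look at the help file for xtreg, you can see that the syntax is -xtreg depvar [indepvars] [if] [in] [weight], fe [FE_options]-. In other words, the if qualifier and the weight go before the comma, and fe and robust go after it.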
Rens Eggink

Join Date: Jun 2021

Posts: 23
#3

23 May 2022, 01:17

Hi Øyvind, thank you a lot for your reply.

I noticed that iweights are not allowed with xtreg, fe. I am trying to fix that now.
I will come back to you if I cannot figure it out myself.

Best, Rens

Øyvind Snilsberg

Join Date: Oct 2021
Posts: 591
#4

23 May 2022, 01:46

You can use aweight instead:

Code:

xtreg YieldtoMaturity green Liquidity if time_period==0 [aweight=cem_weights], fe robust

reference: page 537 in Blackwell, M., Iacus, S., King, G. and Porro, G. (2009). cem: Coarsened exact matching in Stata, The Stata Journal, 9(4), pp. 524-546.

Code:

cem age education black nodegree re74, tr(treated)

Matching Summary:
-----------------
Number of strata: 205
Number of matched strata: 67

             0    1
      All  425  297
  Matched  324  228
Unmatched  101   69


Multivariate L1 distance: .46113967

Univariate imbalance:

                 L1      mean       min       25%       50%       75%       max
      age    .13641   -.17634         0         0         0         0        -1
education    .00687    .00687         0         0         0         0         0
    black   3.2e-16  -2.2e-16         0         0         0         0         0
 nodegree   5.8e-16   4.4e-16         0         0         0         0         0
     re74    .06787    34.438         0         0    492.23    39.425    96.881

reg re78 treated [iweight=cem_weights]

      Source |       SS           df       MS      Number of obs   =       552
-------------+----------------------------------   F(1, 550)       =      3.15
       Model |   128314324         1   128314324   Prob > F        =    0.0766
    Residual |  2.2420e+10       550  40764521.6   R-squared       =    0.0057
-------------+----------------------------------   Adj R-squared   =    0.0039
       Total |  2.2549e+10       551  40923414.2   Root MSE        =    6384.7

------------------------------------------------------------------------------
        re78 | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
     treated |   979.1905   551.9132     1.77   0.077    -104.9252    2063.306
       _cons |    4919.49   354.7061    13.87   0.000     4222.745    5616.234
------------------------------------------------------------------------------

reg re78 treated [aweight=cem_weights]
(sum of wgt is 552)

      Source |       SS           df       MS      Number of obs   =       552
-------------+----------------------------------   F(1, 550)       =      3.15
       Model |   128314324         1   128314324   Prob > F        =    0.0766
    Residual |  2.2420e+10       550  40764521.6   R-squared       =    0.0057
-------------+----------------------------------   Adj R-squared   =    0.0039
       Total |  2.2549e+10       551  40923414.2   Root MSE        =    6384.7

------------------------------------------------------------------------------
        re78 | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
     treated |   979.1905   551.9132     1.77   0.077    -104.9252    2063.306
       _cons |    4919.49   354.7061    13.87   0.000     4222.745    5616.234
------------------------------------------------------------------------------

Last edited by Øyvind Snilsberg; 23 May 2022, 01:54.


Rens Eggink

Join Date: Jun 2021

Posts: 23
#5

23 May 2022, 07:14

Hi Øyvind, thank you so much for your help, I appreciate it a lot.

I will run your code later today and will keep you posted on whether it finally worked.

Best, Rens
Nabila Biju

Join Date: Nov 2016

Posts: 4
#6

01 Jun 2022, 11:57

Hello Everyone!

I'm facing the same problem. I'm new to matching techniques and want to use matching in my analysis.

I have a large panel dataset. I'm supposed to match the treatment and control groups on their demographics and then run a separate regression that interacts the treatment with some of the explanatory variables, rather than just getting the ATT or ATE that the commands cem, psmatch2 or teffects provide.

I have not been able to find resources that show how to run xtreg with the matching weights (xtreg is not allowing me to use the weights). Also, when I do the matching, many observations are matched across years: for example, a household in 2010 is matched with a control household in 2014.

If anyone can help me with how to match in panel data while respecting the household-per-year structure, and with how to use the matching in a later regression specification rather than just getting the simple ATT/ATE that those commands provide, I'll be extremely grateful!
Thank you so much in advance!

Best wishes,
Nabila
Maxence Morlet

Join Date: Mar 2021

Posts: 653
#7

02 Jun 2022, 05:35

Nabila Biju cem and psmatch2 are quite different commands, which will give you different results. The first one is coarsened exact matching, the second one is propensity score matching. What is your objective and what do your data look like?
Nabila Biju

Join Date: Nov 2016

Posts: 4
#8

04 Jun 2022, 18:20

Hello Maxence Morlet! Thank you so much for reaching out!

I have panel data on consumption at the household (HH) level, and I want to match households on their demographic characteristics. My treatment is at the state level: some states practice a certain law and the other states do not. So I want to match the HHs in states that practice the law with the HHs in states that do not, based on demographics such as holding a certain educational degree, income, etc.

I'm aware that those two commands do very different things. I would like to try both matching techniques, but with the panel data I'm getting different weights for the same HH in different years. On top of that, psmatch2 is matching HH #100, for example, to HH #456 in 2008 but to HH #367 in 2009.

My limited understanding is that the weight or the match should remain the same for each HH over the years. Please correct me if I'm wrong; any kind of guidance will be extremely helpful!
Thank you so much!!
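One way to get a single weight per HH, sketched here under assumptions rather than as a definitive recipe: xtreg, fe requires weights to be constant within panel, so the matching can be run once on a baseline year and the resulting weight merged back onto every year of the same household. All variable names (hhid, year, state, treated, educ, income, hhsize, consumption) and the baseline year 2008 are hypothetical placeholders.

```stata
* Sketch only: every variable name and the baseline year are assumptions.
preserve
keep if year == 2008                        // match on one baseline year only
cem educ income, treatment(treated)         // cem creates cem_weights
keep hhid cem_weights                       // one weight per household
tempfile w
save `w'
restore

* Attach the baseline weight to every year of the same household
merge m:1 hhid using `w', keep(match) nogenerate
drop if cem_weights == 0                    // unmatched households carry weight 0

xtset hhid year
* Weights are now constant within hhid, so xtreg, fe accepts them.
* If treated is time-invariant, its main effect is absorbed by the HH fixed
* effects and only the interaction is estimated. Clustering on state assumes
* households do not change state.
xtreg consumption c.hhsize##i.treated i.year [aweight = cem_weights], fe vce(cluster state)
```

Matching on a single pre-treatment year also avoids pairing a household in one year with a control household in a different year.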
Maxence Morlet

Join Date: Mar 2021

Posts: 653
#9

05 Jun 2022, 02:03

Jared Greathouse please correct me if I'm wrong, but I see two options here:

- The community-contributed ebalance command, which can match moments of the distributions of variables between treatment and control, and performs significantly better than PSM in Monte Carlo simulations: https://web.stanford.edu/~jhain/Paper/JSS2013.pdf.

- Synthetic DID as in Arkhangelsky et al. (2021): https://econpapers.repec.org/softwar...de/s459058.htm. This also has several advantages.

But Jared Greathouse will surely give his expert opinion on this topic.
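As a minimal sketch of the first option, assuming hypothetical variables treated, educ, income and consumption: ebalance reweights the control group so that its covariate means match the treated group and stores the weights in _webal by default.

```stata
* Sketch only: variable names are assumptions, not from the poster's data.
ssc install ebalance, replace

* targets(1) balances first moments (means) of the listed covariates
ebalance treated educ income, targets(1)

* Use the entropy-balancing weights in a follow-up regression
regress consumption treated [pweight = _webal]
```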
Jared Greathouse

Join Date: Sep 2021

Posts: 2172
#10

05 Jun 2022, 04:30

Hey Nabila Biju. So in my opinion, the technique to use can be heavily dependent on how many time periods and units you have. As Athey and her people discuss here, there's this idea of a fat vs thin matrix, the former being where we have many more time periods than units, the latter being when we have many more units than time periods. It's also possible to have about the same, and we have techniques for either (synthetic controls in the former, matching in the latter).

I will admit that I only know the basics of matching. I get why we'd want to do it and the situations it's useful, but I've never had to really read the CEM papers, and I know nothing on entropy balancing.

I say all this to say, what case is yours? Do you have a relatively long time series (longer than say, 12 periods for every unit), or do you have not many time periods but many units to choose from? If you've got many time periods, then as Maxence Morlet suggests, the synth DD by Damian Clarke and Daniel PV could potentially be a good fit for you (ssc inst sdid, replace). But either way, you must make the decision on the estimator to use based off theoretical stats and practical considerations unique to you.
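For illustration only, a rough sketch of what a synthetic DID call could look like. It assumes the household data are first aggregated to a balanced state-year panel, and that a hypothetical indicator treated switches from 0 to 1 in the year a state adopts the law; consumption, state and year are placeholder names too.

```stata
* Sketch only: names and the aggregation step are assumptions.
ssc install sdid, replace

* sdid needs a balanced panel at the level where treatment is assigned
collapse (mean) consumption (max) treated, by(state year)

* Syntax: sdid depvar groupvar timevar treatment, vce(method)
sdid consumption state year treated, vce(bootstrap) seed(1213)
```

Whether aggregating to the state level is acceptable depends on the research question, since it discards the household-level variation.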