
  • Problems specifying difference-in-differences model and interpreting coefficients

    Dear Statalist,

    I am writing with a few questions about difference-in-differences. (Note: I am using Stata 17)

    I am using this design for my bachelor's thesis to measure the impact that TBTF reforms have had on the SRISK of systemically important banks in the EU. I have an unbalanced panel dataset with quarterly data on EU-based banks, ranging from 2002q1 to 2023q2. My dependent variable of interest is the banks' SRISK, provided by NYU V-Lab. The other variables of importance for this thread are gsib (dummy variable indicating whether a bank is identified as a G-SIB), olb (standing for "other large bank", a dummy variable identifying whether a bank has an overall exposure measure above €200 Bn and is therefore required to disclose the indicators used for G-SIB score calculation), above100 (dummy variable indicating whether a bank has quarterly total assets above €100 Bn), and above10 (dummy variable indicating whether a bank has quarterly total assets above €10 Bn). These groups range in size from largest to smallest and are mutually exclusive, so a bank with above100==1 will have above10==0. This gives three groups of decreasing comparability to G-SIBs, each of which can be used as a control group for the DID. Here is an example of my data:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input int(quarter id) double srisk str15 country str70 name int year str11 tickerlower double qta int quarter_num byte(gsib olb active above100 above10)
    227 1 13816.010233704294 "Netherlands" "ABN AMRO Group NV" 2016 "abn:na" 416079.876 227 0 1 1 0 0
    228 1 10323.323091997687 "Netherlands" "ABN AMRO Group NV" 2017 "abn:na" 446855.867 228 0 1 1 0 0
    229 1 12242.359673297406 "Netherlands" "ABN AMRO Group NV" 2017 "abn:na" 460575.767 229 0 1 1 0 0
    230 1 7307.9098938904535 "Netherlands" "ABN AMRO Group NV" 2017 "abn:na" 481805.592 230 0 1 1 0 0
    231 1  7774.892195887378 "Netherlands" "ABN AMRO Group NV" 2017 "abn:na" 472119.714 231 0 1 1 0 0
    end
    format %tq quarter
    label values id id
    label def id 1 "ABN AMRO Group NV", modify

    I am using the following model specification to implement the DID analysis:

    Y_it = α_i + γ_t + β_1⋅g_i⋅τ_1t + β_2⋅g_i⋅τ_2t + β_3⋅g_i⋅τ_3t + ε_it

    where:
    Y_it is the SRISK measure of bank i in quarter t, α_i is the bank-fixed effect of bank i, γ_t is the time-fixed effect of quarter t, and g_i is a dummy variable indicating whether a bank is a G-SIB. τ_1t, τ_2t, and τ_3t are dummy variables with a value of 1 for the pre-crisis period (2002-2007), the crisis period (2008-2013), and the post-Covid-19 period (2020-2023), respectively, and 0 otherwise. The post-crisis-post-reform period (2014-2019) is omitted and serves as a reference period. The β_k are the DID estimates. This model is based on the paper by Ichiue et al. (2021) (https://dx.doi.org/10.2139/ssrn.4019583).
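
    For concreteness, the period dummies and interaction terms defined above could be built roughly as follows. This is only a sketch: it assumes quarter is a %tq quarterly date and gsib a 0/1 indicator, as in the -dataex- example, and the names pre_crisis, crisis, and covid are hypothetical helper variables introduced here for illustration.

    ```stata
    * Sketch only: cutoffs follow the period definitions above (assumed inclusive).
    * pre_crisis, crisis, and covid are hypothetical helper variables (tau_1t..tau_3t).
    generate byte pre_crisis = inrange(quarter, tq(2002q1), tq(2007q4))
    generate byte crisis     = inrange(quarter, tq(2008q1), tq(2013q4))
    generate byte covid      = inrange(quarter, tq(2020q1), tq(2023q2))

    * Interaction terms g_i * tau_kt; 2014q1-2019q4 is the omitted reference period.
    generate byte pre_crisis_gsib = gsib * pre_crisis
    generate byte crisis_gsib     = gsib * crisis
    generate byte covid_gsib      = gsib * covid

    * Declare the panel structure before using -xtreg, fe-.
    xtset id quarter
    ```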

    I implement this in Stata using the following command (this is for the case where I use OLBs as a benchmark; the first three independent variables are the interaction terms g_i⋅τ_kt):

    Code:
    xtreg srisk pre_crisis_gsib crisis_gsib covid_gsib i.quarter if gsib==1 | olb==1, fe vce(cluster id)
    At this point, I have a couple of questions about the model. The model is not as simple as the introductory DID models, so I was unable to find a satisfactory answer in any of the introductory econometrics textbooks I have looked into. I was also not able to find answers applicable to my case on Statalist or other forums.

    1. Am I correct in excluding the main effects of g_i and τ_kt, and only including the interaction terms?
    2. Am I correct in including quarter dummies? Should there not be a problem of multicollinearity, given that the τ_kt mark time periods? I remember reading at a few points on Statalist that the time dummies should be excluded in such cases, as otherwise the beta estimates are not informative. Further, the paper by Ichiue et al. (2021) referenced previously does not include them in its model.
    3. Is this still really a difference-in-differences model? There are two versions of the paper by Ichiue et al. (2021) available: the first is a working paper at the Bank of Japan, and the later version is the one I linked above, posted on SSRN. In the first version, they refer to this as a difference-in-differences model, whereas in the later version they simply refer to it as a linear panel regression. So am I really getting DID estimates by specifying the model in this way, and am I correct to interpret the coefficients as relative to the post-crisis period, given that I omit the interaction term for that period from the model (and every data point falls into one of the four specified periods)?

    On top of that, I have some concerns about understanding the estimates. From what I understand, estimating a positive β_2, let's say 10000, would mean that on average, G-SIBs' SRISK declined by 10000 more between the crisis and post-crisis periods than OLBs' SRISK did. However, when benchmarked against above100 or above10 banks, the estimated coefficients on the interaction terms do not change when the model is estimated without quarter dummies (though they do change slightly when those are included). This is counterintuitive to me: if the estimated β_2 is still 10000, it now means that G-SIBs' SRISK declined by 10000 more between the crisis and post-crisis periods than that of above100/above10 banks. But these groups of banks have very different measures of SRISK, so I would expect to see different estimates. The esttab output can be seen in the following tables. The first table is benchmarked against OLBs, the second against above100, and the third against above10. In each table, the first column presents the betas estimated without quarter dummies, and the second column presents those estimated with them.

    Code:
    . esttab a1 a2 using results.doc, replace se drop(*.quarter* _cons) nomtitles type compress star(* 0.1 ** 0.05 *** 0.01) scalars(vce r2) coeflabels(pre_crisis_gsib β1 crisis_gsib β2 covid_gsib β3)
    
    ------------------------------------
                     (1)          (2)  
    ------------------------------------
    β1          -36756.2***  -29447.0***
                (5461.0)     (6176.5)  
    
    β2           19261.2***   11527.8  
                (6807.6)     (7170.2)  
    
    β3           12455.2**     7062.0  
                (4670.5)     (4944.9)  
    ------------------------------------
    N               2247         2247  
    vce          cluster      cluster  
    r2             0.498        0.574  
    ------------------------------------
    Standard errors in parentheses
    * p<0.1, ** p<0.05, *** p<0.01
    (output written to results.doc)
    
    . esttab b1 b2 using results.doc, append se drop(*.quarter* _cons) nomtitles type compress star(* 0.1 ** 0.05 *** 0.01) scalars(vce r2) coeflabels(pre_crisis_gsib β1 crisis_gsib β2 covid_gsib β3)
    
    ------------------------------------
                     (1)          (2)  
    ------------------------------------
    β1          -36756.2***  -32493.9***
                (5473.8)     (5849.2)  
    
    β2           19261.2***   14386.4*  
                (6823.6)     (7263.9)  
    
    β3           12455.2**     8736.8*  
                (4681.4)     (5061.5)  
    ------------------------------------
    N               1906         1906  
    vce          cluster      cluster  
    r2             0.533        0.581  
    ------------------------------------
    Standard errors in parentheses
    * p<0.1, ** p<0.05, *** p<0.01
    (output written to results.doc)
    
    . esttab c1 c2 using results.doc, append se drop(*.quarter* _cons) nomtitles type compress star(* 0.1 ** 0.05 *** 0.01) scalars(vce r2) coeflabels(pre_crisis_gsib β1 crisis_gsib β2 covid_gsib β3)
    
    ------------------------------------
                     (1)          (2)  
    ------------------------------------
    β1          -36756.2***  -35165.6***
                (5386.8)     (5404.1)  
    
    β2           19261.2***   18826.9***
                (6715.0)     (6732.4)  
    
    β3           12455.2***   11810.8**
                (4607.0)     (4641.9)  
    ------------------------------------
    N               6299         6299  
    vce          cluster      cluster  
    r2             0.537        0.552  
    ------------------------------------
    Standard errors in parentheses
    * p<0.1, ** p<0.05, *** p<0.01
    (output written to results.doc)

    It is puzzling to me how the estimated betas can stay the same, with only the standard errors changing, when benchmarked against a completely different set of banks with very different data. So, building upon my previous questions:

    4. Am I interpreting the coefficients incorrectly? What would the correct interpretation be?
    5. Why would it make sense to expect the same estimated betas but just different standard errors using different control groups?
    6. Does it make sense to use different control groups in this way with the DID design? It makes a lot of sense for my thesis, which is interdisciplinary between economics and law: these groups are affected by regulation similarly to G-SIBs, but to differing degrees, so one would expect to find an increasing impact of the reforms when comparing against a less and less comparable control group (which I do find). But does it make sense from an econometric perspective?
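
    A check that may help with question 5, offered as a sketch and not as a definitive answer: in the specification without quarter dummies, the three interaction terms are zero in every observation of a non-G-SIB bank, so after the within (fixed-effects) transformation those banks carry no variation in the regressors and should not move the point estimates. If that reasoning holds, re-estimating on G-SIBs alone should reproduce the same betas. The code reuses the variable names from the xtreg command above and assumes that gsib does not switch within a bank over time; the stored-estimate names with_olb and gsib_only are made up for illustration.

    ```stata
    * Without quarter dummies: G-SIBs plus OLBs versus G-SIBs alone.
    * Assumes the data are already -xtset- and gsib is constant within bank.
    xtreg srisk pre_crisis_gsib crisis_gsib covid_gsib if gsib==1 | olb==1, fe vce(cluster id)
    estimates store with_olb    // hypothetical name

    xtreg srisk pre_crisis_gsib crisis_gsib covid_gsib if gsib==1, fe vce(cluster id)
    estimates store gsib_only   // hypothetical name

    * Side-by-side comparison of coefficients and standard errors.
    estimates table with_olb gsib_only, b(%10.1f) se(%10.1f)
    ```

    Once i.quarter is added, the control banks help to estimate the common quarter effects, which would be consistent with column (2) differing across the three benchmarks.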


    Any help would be much appreciated!
    Last edited by Rok Milavec; 15 Jun 2023, 10:18.

  • #2
    Rok:
    Why not take a look at -xtdidregress-, Stata's built-in command?
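
    A minimal sketch of what that could look like with the variables from the first post. The 2014q1 cutoff and the variable name treated_post are assumptions made purely for illustration. Note also that -xtdidregress- reports a single ATET, not separate coefficients per period, so it does not map one-to-one onto the three-interaction model above.

    ```stata
    * Sketch only: treated_post and the 2014q1 cutoff are illustrative assumptions.
    xtset id quarter

    * Treatment indicator: 1 for G-SIBs from the assumed reform date onward.
    generate byte treated_post = gsib == 1 & quarter >= tq(2014q1)

    * Bank and quarter fixed effects, with OLBs as the comparison group.
    xtdidregress (srisk) (treated_post) if gsib==1 | olb==1, group(id) time(quarter)
    ```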
    Last edited by Carlo Lazzaro; 16 Jun 2023, 01:07.
    Kind regards,
    Carlo
    (Stata 19.0)
