  • Difference in Difference with continuous treatment variable

    I want to estimate the effect of conflict, measured by the fatalities it causes in a region, on the performance of financial institutions. I have data from 2015 to 2022 for 8 institutions operating in 8 counties. Fatalities here refer to the total number of deaths caused by conflict. The conflict period was from 2010 to 2018, and I have the total number of deaths in each county for that period. Total conflict fatalities up to 2018 in each county is the treatment variable. I want to apply DiD with fixed effects, comparing before and after 2018. The model I specified is as follows:



    Y_it = β0 + β1·Fatalities_i + β2·Post_t + β3·(Fatalities_i × Post_t) + X_it + α_i + γ_t + ε_it

    where α_i is the institution fixed effect and γ_t is the time fixed effect.

    Is the above model correct? What is the best way to conduct the analysis using DID with a fixed effect?

  • #2
    Perhaps I am misunderstanding, but I think this is, while not strictly speaking incorrect, rather off-kilter. The conflict period, if I understand you correctly, ends in 2018. So after 2018 the conflict-caused fatalities variable will just be 0 in every county for every subsequent year. OK, there might be a small number of delayed deaths attributable to injuries incurred in conflict, but these numbers will be pretty small and probably only seen in 2019, so the conflict fatalities variable, if not exactly 0, will be almost 0 in all observations after 2018.

    So what you have here is the reverse of the usual DID design. Normally with DID, the treatment period comes after the no-treatment period. But in your situation the treatment period actually precedes the no-treatment period. So it is going to be a little strange to interpret the results.

    If I were you I would reconceptualize and reparameterize the model. You can do this in either of two ways.

    i) Instead of having a Post_t variable, use a variable that indicates the conflict period (2015-2018). That way this time period variable will be 1 in the same observations where the "treatment" (conflict fatalities) is also present.

    ii) Alternatively, think of the "treatment" as being the discontinuation of conflict fatalities and continue to use the Post_t variable. But to do that you would need to transform the fatalities variable and replace it by something that is a decreasing function of fatalities. To choose an appropriate transformation you will need to graphically explore the relationship between Y and Fatalities. This second approach strikes me as less natural and probably not what I would choose. But I put it out there for you to consider.

    Added: Without seeing your data, I can only speculate about this. But unless there are some counties that experience no fatalities at least in some years during the conflict period, then this is really not a classic DID model because you have no untreated group, just more treated and less treated. And if there are counties that experience no fatalities, at least in some years during the conflict period, while others in the same years are experiencing fatalities, then what you have is a staggered entry. So you would, in this case, eliminate the beta1 and beta2 terms from the model.

    Also, I would think you would want to include a county effect in the model, as different counties may have overall different levels of bank performance based on their different economic situations and possibly different applicable laws and regulations. (Perhaps this is one of the things you had in mind for an Xit variable.)
    Last edited by Clyde Schechter; 22 Aug 2024, 21:26.


    • #3
      Thank you for your insightful feedback on reparameterizing the DID model. I greatly appreciate your suggestions, especially regarding the conflict period indicator.

      I would like to clarify and seek further guidance on one aspect of my dataset. I do not have year-wise data on fatalities during the conflict period. Instead, I have total fatalities recorded from 2010 to 2018 for each county (e.g., County A has 15 total fatalities, County B has 5, and County C has 0).

      Given this limitation, how would you recommend I proceed with the DID analysis? Specifically:
      1. How should I incorporate these total fatalities into the model, given that they span the entire conflict period rather than being broken down annually during the conflict years?
      2. Would it be appropriate to treat the total fatalities as a continuous treatment variable in this context? If so, how might this affect the interpretation of the interaction term with the conflict period?
      Thank you very much in advance.


      • #4
        So, let's clarify a few things. I realize that I need a much clearer understanding of your data structure.

        You have 8 financial institutions operating in 8 counties. Is this 1 financial institution in each county, or are there 8 institutions and each of them operates in all 8 counties, or are there 8 institutions and each of them operates in a subset of the counties (subset may be empty, a proper subset, or the set of all 8)?

        And now although the data reflects events occurring over the period from 2015 to 2022, in fact it is aggregated to the level of two periods: 2015-2018 (conflict) and 2019-2022 (no conflict). And so for each county, there are two observations: one for the conflict period and one for the no-conflict period--is that correct?

        Are there any non-zero fatality counts in any county during the non-conflict period? If so, are the counts more than just a handful?

        You have some kind of performance outcome measure for the institutions. Assuming that an institution operates in more than one county (or at least that some of them do), is this outcome measure defined separately for each county's operations of the institution, or do you have one overall measure for the institution across all 8 counties?


        • #5
          Thank you for getting back to me!
          I want to clarify your queries.

          1. There are 8 financial institutions, each operating within a specific county. This means each county has one financial institution operating within it, and no institution operates in more than one county.

          2. Regarding the time period and aggregation, your understanding is correct. There are two observations per county: one for the conflict period (2015-2018) and another for the post-conflict period (2019-2022).

          3. There are no non-zero fatality counts in any county during the post-conflict period (2019-2022). All fatality counts during this period are zero, as the conflict had ended.

          4. So, for the performance measures, one of them is for example ROI (Return on Investment). ROI is defined separately for each institution within its respective county. Therefore, for each financial institution operating in a particular county, the ROI is calculated based on that institution's operations within that specific county. It is not an overall measure.


          • #6
            Thank you. There is one question I neglected to ask: do all 8 counties experience fatalities during the conflict period, or were some of them spared? Since that was my mistake, I'm going to give you two commentaries here, one for a yes answer and the other for a no answer to this question.

            Assuming you can answer no (some counties were spared fatalities during the conflict period), you have the minimal elements of a DID design here. All four combinations of conflict era or post-conflict era and some fatalities vs no fatalities are instantiated. Since each institution operates in only one county, there is no distinction between county and institution in the analysis, so no additional term for county is needed. The equation for the model would be Y_it = β0 + β1·Fatalities_i + β2·Post_t + β3·(Fatalities_i × Post_t) + ε_it. (Two-way fixed effects are not needed in this analysis. If you do include them, the β1·Fatalities_i and β2·Post_t terms will be collinear with those fixed effects and will drop out of the analysis, which is not actually problematic if you understand how interaction models are interpreted. The "treatment effect" estimated by β3 will be the same either way. Since the analysis using simple regression without the fixed effects is easier to interpret, I recommend that approach for convenience.)
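
To make the "same either way" remark concrete, here is a minimal Python sketch built on made-up numbers, not the poster's data. It fits the four-coefficient interaction model by pooled OLS and compares β3 with the slope from regressing each institution's change in outcome on its fatalities, which is what the fixed-effects fit reduces to when there are exactly two periods per institution. The fatality counts, the ROI values, and the helper names `solve` and `ols` are all assumptions introduced for illustration.

```python
# Hypothetical data: 8 institutions, one per county (values are invented).
fatalities = [0, 0, 2, 5, 8, 10, 15, 20]                   # conflict-period totals
roi_conflict = [5.1, 4.8, 4.9, 4.2, 4.0, 3.9, 3.1, 2.8]    # outcome, 2015-2018
roi_post = [5.3, 5.0, 5.2, 4.9, 4.8, 4.9, 4.6, 4.5]        # outcome, 2019-2022


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def ols(X, y):
    """Ordinary least squares coefficients via the normal equations."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)


# Pooled model: Y = b0 + b1*Fatalities + b2*Post + b3*Fatalities*Post.
X, y = [], []
for f, y0, y1 in zip(fatalities, roi_conflict, roi_post):
    X.append([1.0, f, 0.0, 0.0])   # conflict-period row (Post = 0)
    y.append(y0)
    X.append([1.0, f, 1.0, f])     # post-conflict row (Post = 1)
    y.append(y1)
b3_pooled = ols(X, y)[3]

# Two-period fixed-effects equivalent: slope of the change in Y on Fatalities.
n = len(fatalities)
delta = [y1 - y0 for y0, y1 in zip(roi_conflict, roi_post)]
f_bar = sum(fatalities) / n
d_bar = sum(delta) / n
b3_first_diff = (
    sum((f - f_bar) * (d - d_bar) for f, d in zip(fatalities, delta))
    / sum((f - f_bar) ** 2 for f in fatalities)
)

print("beta3 from pooled OLS:      ", b3_pooled)
print("beta3 from first difference:", b3_first_diff)
```

In this two-period layout the two estimates agree up to floating-point error, so the choice between the specifications changes how the output is read rather than the estimate of β3 itself.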

            Assuming you must answer yes, all counties incurred fatalities during the conflict period, then you do not have the makings of a standard DID design because there is no instantiation of the conflict-era, no-fatalities combination. You can run a different kind of analysis such as Y_it = β0 + β1·Fatalities_it + α_i + γ_t + ε_it.

            In either case, as I suggested in #2 it is probably better to think of the "treatment" here as the remission of conflict fatalities so that the pre- and post-treatment periods would be in normal chronological order.

            Notice that in neither case do I include Xit variables. The reason for this is that you have a total of only 16 data points--you do not have nearly enough data to support additional variables in this model and analyze them with any reasonable degree of precision. In fact, 16 data points is really too small to support even the bare-bones analyses I am showing here. Unless the effect is so massive that it would already have been readily observed and already be common knowledge or folklore, you will be severely underpowered. Consequently, you are unlikely to find a statistically significant effect even if a moderately sized effect really exists, and, if you do obtain a statistically significant result from the analysis, there is a very high probability that your results will grossly overestimate its magnitude and an unacceptably high probability that the sign will be in the wrong direction. So I am really not optimistic you can really answer this research question with this scanty data.
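
The warning about low power, exaggerated magnitudes, and wrong signs can be illustrated with a small simulation. The sketch below is not based on the poster's data: the fatality counts, the true slope of -0.05, and the noise standard deviation of 1.0 are all assumptions chosen only to show the mechanism. Each replication draws 8 before/after changes in the outcome, estimates the slope on fatalities, and tests it against the two-sided 5% critical value of a t distribution with 6 degrees of freedom (2.447).

```python
import math
import random

random.seed(12345)

# All of the following values are hypothetical.
FATALITIES = [0, 0, 2, 5, 8, 10, 15, 20]   # one value per institution
TRUE_SLOPE = -0.05                          # assumed true effect per fatality
NOISE_SD = 1.0                              # assumed noise in the outcome change
T_CRIT = 2.447                              # two-sided 5% critical value, 6 df
N_SIMS = 5000

n = len(FATALITIES)
f_bar = sum(FATALITIES) / n
sxx = sum((f - f_bar) ** 2 for f in FATALITIES)


def simulate_once():
    """Draw one data set; return (estimated slope, significant at 5%?)."""
    dy = [TRUE_SLOPE * f + random.gauss(0.0, NOISE_SD) for f in FATALITIES]
    dy_bar = sum(dy) / n
    slope = sum((f - f_bar) * (d - dy_bar) for f, d in zip(FATALITIES, dy)) / sxx
    intercept = dy_bar - slope * f_bar
    rss = sum((d - intercept - slope * f) ** 2 for f, d in zip(FATALITIES, dy))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope, abs(slope / se) > T_CRIT


results = [simulate_once() for _ in range(N_SIMS)]
significant = [slope for slope, is_sig in results if is_sig]
n_sig = len(significant)

# Share of replications that reach significance.
power = n_sig / N_SIMS
# Average size of significant estimates relative to the true effect.
exaggeration = (
    sum(abs(s) for s in significant) / n_sig / abs(TRUE_SLOPE)
    if n_sig else float("nan")
)
# Share of significant estimates whose sign is opposite to the truth.
wrong_sign = (
    sum(1 for s in significant if s > 0) / n_sig if n_sig else float("nan")
)

print("power:", power)
print("exaggeration ratio among significant results:", exaggeration)
print("share of significant results with the wrong sign:", wrong_sign)
```

Changing `TRUE_SLOPE`, `NOISE_SD`, or the number of institutions shows how quickly these three quantities respond to sample size and effect size; the particular values printed here say nothing about the real study.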

            At the risk of beating a dead horse, even if we disregard these other limitations, and if you are in a position to do a standard DID analysis, you have zero data available to test parallel trends in the absence of treatment, which is a crucial piece of identifying a causal effect. Going beyond this, to justify some kind of causal interpretation of your results you would need extensive evidence from other sources that rule out the possibility that something else happened between 2018 and 2019 that explains whatever differences you found. (Or, if you find no difference, you would need evidence that there wasn't something else that happened which is obscuring and overwhelming the effect you were trying to find.)

            Look, if this is just a practice project to gain early experience with DID analysis, then don't be deterred by all of this. Be aware of the limitations and elaborate upon them in whatever presentation of results you ultimately make. But if you are pursuing a master's thesis, doctoral dissertation, or a publication, I don't think the ingredients for that are here.


            • #7
              Thank you so much for your comprehensive explanation. As someone new to econometrics with limited knowledge, your suggestions are incredibly helpful and provide much-needed clarity. I apologize for asking multiple questions.

              I wanted to provide an update and seek further advice based on your feedback. I am currently in the process of collecting additional data and am hopeful that I can gather information for 35 institutions. However, it seems only two of these institutions did not experience any fatalities during the conflict period.

              Additionally, I am looking to collect year-wise data for these institutions. My concern is that, since the conflict period is 2010-2018 and my study period is 2015-2022, I will only be using fatalities data for 2015-2018. From the preliminary information I have, it seems that the fatalities count in this period is quite low compared to the earlier years of the conflict.

              Given this context, I am wondering whether the relatively low fatalities count for 2015-2018 will still be sufficient to capture the effect I am trying to study. If I am able to collect this year-wise data, how would you suggest structuring the model in this case? Would the low fatalities count affect the robustness of the DID analysis, and is there a specific approach you would recommend?

              Thank you again for your time and guidance.



              • #8
                "However, it seems only two of these institutions did not experience any fatalities during the conflict period"
                Two is not great, but it is better than the current zero. It at least puts you in the position where a DID analysis is possible.

                "I am looking to collect year-wise data for these institutions."
                Excellent idea. Not only will it increase your sample size, it will give you a better exploration of variation in both the exposure and outcome variables, lending more strength to the regression.

                "I will only be using fatalities data for 2015-2018. From the preliminary information I have, it seems that the fatalities count in this period is quite low compared to the earlier years of the conflict.

                Given this context, I am wondering whether the relatively low fatalities count for 2015-2018 will still be sufficient to capture the effect I am trying to study.
                ... Would the low fatalities count affect the robustness of the DID analysis, and is there a specific approach you would recommend?"
                Well, I don't know. As a general statistical principle, restricting the range of a variable tends to reduce the strength of its associations with other variables. But whether it will so hamper your study as to make it infeasible, that is a substantive question. Has this question, or something very close to it, been studied perhaps in other time periods or locations? How does the range of fatalities in your (potential) data compare to what has been used in the past? Have other studies shown a strong "dose-response" relationship between fatalities and bank performance? If your study will be the first to explore the fatalities-bank performance relationship, then I think you have no choice but to gather the best data you can get your hands on and then see what the results are. It may or may not uncover the effect you are looking to find. If it doesn't, it might be because the effect is small and requires a massive amount of data to detect, or it might be because your particular data set happens to be too small to find it even though it is of moderate size.

                When you have finally nailed down what data is available to you, you probably should do a formal power analysis to see just what size the effect has to be in order to detect it with the data available to you. If you find that you are only powered to detect effects so large that they are implausible, then you should see if there is some way to modify the study so that it has better prospects of success. If that is not feasible, then, if you are just doing this as a learning experience, proceed anyway, fully expecting to obtain inconclusive results, and bearing in mind that if you do find a statistically significant result, it is probably erroneous. If you are doing this for a thesis in a degree program or seeking publication in a good journal, then it would be better to abandon this particular study and find a more tractable research question to explore.

                "If I am able to collect this year-wise data, how would you suggest structuring the model in this case?"
                I would do it as a two-way fixed effects model (institution and year) with a fatalities#conflict era interaction term. If the additional institutions are operating in the same 8 counties as the original ones, then you would ordinarily consider using standard errors clustered at the county level. BUT, your sample is too small to use clustered standard errors in any case--I'm just mentioning it because you now would no longer have independence of observations within counties and normally one tries to adjust for it in some ways. (Another possibility is using a random effects 3-level model, but I know those are frowned upon in econometrics.) If the additional institutions still leave you with only one institution in each county in the study (which entails that you now have more counties) then this is not an issue.
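
As a rough illustration of the two-way fixed effects structure described here, the following Python sketch uses an invented balanced panel of 4 institutions over 4 years. For a balanced panel with a single regressor, the institution and year effects can be removed by subtracting unit means and year means and adding back the grand mean; the coefficient is then the slope between the transformed outcome and the transformed interaction. The institution labels, fatality counts, fixed effects, the value of `TRUE_BETA`, and the helper `two_way_demean` are all assumptions, and the sketch computes no standard errors, clustered or otherwise.

```python
# Hypothetical balanced panel: 4 institutions observed in 4 years.
units = ["A", "B", "C", "D"]
years = [2017, 2018, 2019, 2020]
conflict_era = {2017: 1, 2018: 1, 2019: 0, 2020: 0}

# Invented yearly fatalities; zero in every post-conflict year.
fatalities = {
    "A": {2017: 6, 2018: 3, 2019: 0, 2020: 0},
    "B": {2017: 2, 2018: 1, 2019: 0, 2020: 0},
    "C": {2017: 0, 2018: 0, 2019: 0, 2020: 0},
    "D": {2017: 9, 2018: 4, 2019: 0, 2020: 0},
}

# Invented fixed effects and effect size, used only to generate an outcome.
unit_effect = {"A": 5.0, "B": 4.0, "C": 6.0, "D": 3.5}
year_effect = {2017: 0.0, 2018: 0.2, 2019: 0.5, 2020: 0.4}
TRUE_BETA = -0.1

# Regressor of interest: fatalities interacted with the conflict-era indicator.
x = {(u, t): fatalities[u][t] * conflict_era[t] for u in units for t in years}
# Noise-free outcome, so the estimator should recover TRUE_BETA exactly.
y = {(u, t): unit_effect[u] + year_effect[t] + TRUE_BETA * x[u, t]
     for u in units for t in years}


def two_way_demean(v, units, periods):
    """Remove unit and period means from v (balanced panel assumed)."""
    unit_mean = {u: sum(v[u, t] for t in periods) / len(periods) for u in units}
    period_mean = {t: sum(v[u, t] for u in units) / len(units) for t in periods}
    grand_mean = sum(v.values()) / len(v)
    return {
        (u, t): v[u, t] - unit_mean[u] - period_mean[t] + grand_mean
        for u in units for t in periods
    }


x_tilde = two_way_demean(x, units, years)
y_tilde = two_way_demean(y, units, years)

# Slope of the transformed outcome on the transformed interaction term.
beta_hat = (
    sum(x_tilde[k] * y_tilde[k] for k in x_tilde)
    / sum(x_tilde[k] ** 2 for k in x_tilde)
)

print("estimated interaction coefficient:", beta_hat)
```

Because the outcome is generated without noise, the estimate matches `TRUE_BETA` up to rounding; with real data the same transformation applies, but inference would still need standard errors appropriate to the dependence structure discussed above.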


                • #9
                  I extend my deepest thanks for your invaluable guidance throughout this process. Your insights have been incredibly helpful, especially considering my limited background in econometrics. You've clarified many complex aspects of my research, and I truly appreciate the time you've taken to address each of my questions. Thank you very much once again.
