
  • Isolating the effect of covid policy from the covid shock on digitalization

    For my public governance thesis, I'm running a panel data analysis to investigate how stringent COVID-19 measures influenced digitalisation progress across EU countries between 2017 and 2022. I've used fixed effects regressions (both entity and time effects), including economic controls and a lagged dependent variable. To explore the impact of the pandemic, I ran one model using an is_covid dummy (0 before 2020, 1 from 2020 onward), and another using avg_stringency (Oxford index of government restrictions 0-100). Both variables are naturally correlated, which makes it hard to determine whether digitalisation progress was driven by the general shock of the pandemic or by specific policy responses.

    What would be the best way to statistically isolate the unique contribution of policy stringency from the broader COVID-19 effect? Should I avoid including both variables in the same model due to multicollinearity, or is there a better way to decompose their effects? Right now, I am using one model with only entity effects for the dummy and another with entity and time effects for the stringency index. The dummy is positive and significant, but when I include time effects no coefficient is significant. Now I wonder whether I can conclude that (a) there is no effect of policy on digitalization, or (b) with the current statistical setup I am not able to measure the effect because the time effects "eat" the significance.

  • #2
    First, let's talk about the use of year fixed effects and the is_covid variable. You can't properly use both. You have years 2017-2022. The indicator variables for those 6 years and the constant term in your model (with or without an is_covid variable) are collinear: this is the standard "dummy variable trap." It is conventionally resolved by designating one of the years as the reference year and omitting the indicator variable ("dummy") for that year. Now when you throw the is_covid variable into the model, a variable that indicates a subset of the years (2020-2022), you have re-established collinearity among the constant term, the year indicators other than the reference year, and the is_covid variable. So yet another constraint must be imposed to identify the model. Depending on what command you use and exactly how you code it, Stata will either omit the is_covid variable or omit another year indicator in addition to the one for the reference year. Regardless of which way it is resolved, you have to bear in mind that it changes the meaning of those variables. Even in the absence of is_covid, the indicator variable for a year does not represent the level in that year, but rather the difference between the level in that year and the level in the reference year. When we introduce another collinear variable, is_covid, the meaning changes again: none of the collinear variables means what its name says anymore. More important, there is no way to estimate an actual effect for any of the collinear variables. It is a matter of linear algebra that those effects are unidentifiable, and the imposition of constraints (omission of variables) serves only to assign arbitrary values to the coefficients of those variables. In fact there is a theorem that says you could make the coefficient of is_covid turn out to be any number you wish, and there will be a solution to the regression equations that supports that, adjusting the coefficients of the other collinear variables to make it work out. So you are going to have to give up either is_covid or the time fixed effects. You can't have them both in the same model.
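
    The non-identifiability can be seen in a few lines. The following is only a minimal sketch in plain Python (not Stata output): the reference year 2017, the coefficient values, and the helper names `fitted` and `shift_covid_coef` are all made up for illustration. It checks that is_covid equals the sum of the 2020-2022 year indicators, and that changing the is_covid coefficient to any value can be offset exactly by shifting those three year coefficients.

```python
# Illustrative sketch only: years 2017-2022, with 2017 assumed as the reference year.
YEARS = [2017, 2018, 2019, 2020, 2021, 2022]
REFERENCE_YEAR = 2017
NON_REFERENCE_YEARS = [y for y in YEARS if y != REFERENCE_YEAR]


def year_dummies(year):
    """Indicator variables for every year except the reference year."""
    return [1 if year == y else 0 for y in NON_REFERENCE_YEARS]


def is_covid(year):
    """0 before 2020, 1 from 2020 onward (as defined in the question)."""
    return 1 if year >= 2020 else 0


# 1) is_covid is an exact linear combination of the year indicators:
#    it equals the sum of the 2020, 2021 and 2022 dummies in every year.
collinear = all(
    is_covid(year)
    == sum(d for d, y in zip(year_dummies(year), NON_REFERENCE_YEARS) if y >= 2020)
    for year in YEARS
)


def fitted(year, const, year_coefs, covid_coef):
    """Fitted value from constant + year dummies + is_covid."""
    dummies = year_dummies(year)
    return const + sum(b * d for b, d in zip(year_coefs, dummies)) + covid_coef * is_covid(year)


def shift_covid_coef(year_coefs, covid_coef, new_covid_coef):
    """Year coefficients that reproduce the same fitted values under new_covid_coef."""
    delta = covid_coef - new_covid_coef
    return [b + delta if y >= 2020 else b for b, y in zip(year_coefs, NON_REFERENCE_YEARS)]


# Hypothetical coefficients, chosen only for the demonstration.
const = 1.0
year_coefs = [0.5, 1.0, 2.0, 2.5, 3.0]  # 2018 ... 2022
covid_coef = 0.0

# Pick any is_covid coefficient you like; the fitted values do not change.
arbitrary_covid_coef = 100.0
adjusted_year_coefs = shift_covid_coef(year_coefs, covid_coef, arbitrary_covid_coef)

for year in YEARS:
    original = fitted(year, const, year_coefs, covid_coef)
    alternative = fitted(year, const, adjusted_year_coefs, arbitrary_covid_coef)
    print(year, original, alternative)
```

    Because both coefficient sets give identical fitted values in every year, the data cannot distinguish between them, which is why the software has to drop one of the variables.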

    Next there is the question of separating the effects of the pandemic itself from that of the policies implemented in response to it. This is going to be very difficult, and I'm not sure it is even realistically possible. If it is possible, it's going to require an amazingly detailed data set. The correlation between the is_covid variable and the policy variable is going to depend on the particular policy variable used. What you need is a policy variable that is as uncorrelated to the is_covid variable as possible. Such a variable would be very different from just gradually increasing after the onset of the pandemic and then reaching some level and oscillating mildly up and down from there for a while, and then tapering away towards the end of the is_covid period. It would, instead, vary greatly during the 2020-2022 era, with wild ups and downs in all, or nearly all, countries. I doubt that an intensity index will do that. Most policy responses were initiated in the early months of 2020 and reached their peak intensity by mid-year, and remained in effect for 1-2 years, with considerable variation from jurisdiction to jurisdiction in peak intensity, and times of onset and offset, but probably relatively little variation within jurisdictions between the start and end of policy initiatives. I could be wrong about that, but that's my impression of how things worked. If I'm right, you are going to have pretty strong correlation between the is_covid variable and the policy intensity variable. And if that is the case, the only way you will be able to get estimates that clearly separate the two effects is to have a gigantic data set: many different jurisdictions and fine-grained time series of observations within them. If you have only a modest number of jurisdictions and data measuring policy intensity at widely spaced points in time, it is a lost cause.
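
    A quick way to see how much independent variation a policy variable has is to correlate it with is_covid directly. The sketch below is in plain Python and uses invented annual stringency values for a single hypothetical country (assuming stringency is 0 before 2020); the series names `flat_policy` and `varying_policy` are illustrative, not from the Oxford data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


years = [2017, 2018, 2019, 2020, 2021, 2022]
is_covid = [1 if y >= 2020 else 0 for y in years]

# Invented annual averages for one hypothetical country (0 before the pandemic).
flat_policy = [0, 0, 0, 50, 50, 50]     # constant intensity throughout 2020-2022
varying_policy = [0, 0, 0, 55, 50, 20]  # intensity tapers off by 2022

r_flat = pearson(is_covid, flat_policy)
r_varying = pearson(is_covid, varying_policy)

print("flat policy vs is_covid:   ", round(r_flat, 3))
print("varying policy vs is_covid:", round(r_varying, 3))
```

    The flat series is perfectly correlated with is_covid, so it carries no separate information at all. The tapering series is less correlated, but still strongly so, which is the situation described above.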

    The only way you can really know if your data set is adequate to the task is to actually try it out. If the standard error of your policy intensity variable is small enough that the 95% confidence interval around its coefficient is narrow enough that you can say you have narrowed the effect down to a limited range of possibilities that's worth talking about, then you're good. Otherwise, you will have to judge your study as inconclusive.

    A model using only the policy stringency variable or only the is_covid variable is inherently incapable of separating the effects of policy responses from the effects of the pandemic itself. The two are confounded with each other, and your only hope of separating out their effects is to include both in the model. But, as explained in the previous 2 paragraphs, this is going to be difficult and probably will require a heroic data set to be successful.



    • #3
      Clyde Schechter, thank you very much for your detailed answer! I’ve currently averaged the stringency index to the annual level since my main outcome variable is only available yearly. However, I do have access to the daily-level stringency data for each country, which contains more fluctuation. Would there be any way to make use of the daily data, even though my outcome is yearly?



      • #4
        You could use the daily stringency data, pairing each daily observation with the outcome variable for that year. The extra daily variation in the stringency variable will reduce its correlation with is_covid (or, see next paragraph, the time effects). Of course, the use of a yearly outcome is less than ideal, as it reduces the correlation between the stringency variable and the outcome. It isn't possible to say in advance whether, on balance, you will be better off or worse off using the daily stringency data. But I would give it a try, and see which gives you the estimate of the stringency effect with the smaller standard error.
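
        The pairing step might look something like the sketch below. It is plain Python rather than Stata, and everything in it is hypothetical: the country code, the dates, the stringency and outcome values, and the function name `pair_daily_with_yearly` are invented purely to show the shape of the resulting data.

```python
from datetime import date

# Hypothetical yearly outcome, keyed by (country, year).
yearly_outcome = {
    ("AT", 2020): 61.0,
    ("AT", 2021): 63.5,
}

# Hypothetical daily stringency records: (country, day, stringency 0-100).
daily_stringency = [
    ("AT", date(2020, 3, 15), 40.0),
    ("AT", date(2020, 4, 1), 80.0),
    ("AT", date(2021, 1, 10), 75.0),
    ("AT", date(2023, 1, 1), 10.0),  # no outcome for this year, so it is dropped
]


def pair_daily_with_yearly(daily, yearly):
    """Attach the outcome of the matching country-year to each daily observation."""
    rows = []
    for country, day, stringency in daily:
        key = (country, day.year)
        if key not in yearly:
            continue  # skip days that fall outside the years with an outcome
        rows.append(
            {
                "country": country,
                "date": day,
                "year": day.year,
                "stringency": stringency,
                "outcome": yearly[key],  # same value repeated for every day of the year
            }
        )
    return rows


paired_rows = pair_daily_with_yearly(daily_stringency, yearly_outcome)
for row in paired_rows:
    print(row["country"], row["date"], row["stringency"], row["outcome"])
```

        Note that the outcome is simply repeated across all days of a country-year, so the added variation is entirely on the stringency side.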

        I also want to add to what I said in #2. I said there that you cannot use a model that does not contain the is_covid variable. That is not correct. What I should have said is that you cannot use a model that does not contain the is_covid variable's information. But, due to the collinearity among the constant term, is_covid, and the yearly indicators, the time fixed effects do contain the is_covid variable's information. My admonition that you cannot use both the is_covid variable and the time fixed effects is correct--but you are not forced to choose the is_covid variable to adhere to that. You can omit the is_covid variable and use the time fixed effects instead. That will not get you out of the is_covid:stringency correlation difficulties--they become stringency:time effects correlation difficulties, and they have exactly the same consequences for isolating the effects of the policy measures. But it's possible that the year indicators also carry additional relevant information that will help clarify the policy effects. Of course, on the other hand, it is also possible that the additional information in the time effects will add noise to the analysis, and there is no way to tell in advance which way it will turn out. So, again, I would try it both ways. And, again, the model with the smallest standard error for the stringency coefficient is the one that best isolates the policy effects.



        • #5
          Ratje:
          welcome to this forum.
          As an aside to Clyde's helpful guidance, why not consider -xtdidregress-?
          Kind regards,
          Carlo
          (Stata 19.0)
