  • Problem of Collinearity in DiD model

    Hello,

    I'm attempting a DiD model for the following research question: "Effect of the Pandemic on CO2 emissions from Road Transport in India". I'm using daily emissions data for the sample period: 1st January 2019 to 30th April 2021. The CO2 emission variable is CO2gt and I have created a log version "ln_CO2gt".

    The model is as follows:

    I have created two dummy variables: Lockdown and Treatment. The var Lockdown has a value of 1 for the national lockdown imposed on 24th March 2020 to 31st May 2020 (and 0 otherwise). The var Treatment accounts for restrictions that were put in place before and after the lockdown. It has a value of 1 for the following dates: 1st March 2020 to 30th April 2021 (and 0 otherwise). I created a dummy variable for the interaction term (Lockdown * Treatment) called Interaction_term.

    I do get results for
    Code:
    reg ln_CO2gt Lockdown Treatment
    But when I run
    Code:
    reg ln_CO2gt i.Lockdown##i.Treatment
    Stata says the following:

    note: 0b.Lockdown#0b.Treatment identifies no observations in the sample.
    note: 1.Lockdown#1.Treatment omitted because of collinearity.

    I've been trying to resolve this for a while without success. Could you please help me out? I don't know where the issue lies (I'm inexperienced with Stata). Any guidance will be greatly appreciated.

    Thanks & Regards,
    Shashwat Raut
    MA Public Policy

    P.S:
    Here is an example of the data:
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str9 country int date str22 sector double CO2gt float(Lockdown Treatment ln_CO2gt Interaction_term)
    "India" 21550 "Ground Transport"  .81621 1 0  -.2030836 0
    "India" 21551 "Ground Transport" .834113 1 0  -.1813864 0
    "India" 21552 "Ground Transport" .841574 1 0 -.17248133 0
    "India" 21553 "Ground Transport" .842256 1 0 -.17167127 0
    "India" 21554 "Ground Transport"   .8345 1 0 -.18092254 0
    "India" 21555 "Ground Transport" .783472 1 0 -.24401996 0
    "India" 21556 "Ground Transport" .851924 1 0 -.16025797 0
    "India" 21557 "Ground Transport" .849987 1 0  -.1625342 0
    "India" 21558 "Ground Transport"  .85244 1 0 -.15965246 0
    "India" 21559 "Ground Transport" .852093 1 0  -.1600596 0
    "India" 21560 "Ground Transport" .824344 1 0 -.19316736 0
    "India" 21561 "Ground Transport" .801485 1 0 -.22128902 0
    "India" 21562 "Ground Transport" .827079 1 0 -.18985507 0
    "India" 21563 "Ground Transport" .845832 1 0  -.1674345 0
    "India" 21564 "Ground Transport" .852258 1 0 -.15986598 0
    "India" 21565 "Ground Transport" .862473 1 0 -.14795144 0
    "India" 21566 "Ground Transport" .855752 1 0 -.15577467 0
    "India" 21567 "Ground Transport" .860798 1 0  -.1498954 0
    "India" 21568 "Ground Transport" .856241 1 0  -.1552034 0
    "India" 21569 "Ground Transport" .813786 1 0 -.20605785 0
    "India" 21570 "Ground Transport" .860982 1 0  -.1496817 0
    "India" 21571 "Ground Transport" .863223 1 0 -.14708222 0
    "India" 21572 "Ground Transport" .869637 1 0  -.1396794 0
    "India" 21573 "Ground Transport" .862504 1 0  -.1479155 0
    "India" 21574 "Ground Transport" .868498 1 0 -.14098999 0
    "India" 21575 "Ground Transport" .762766 1 0   -.270804 0
    "India" 21576 "Ground Transport" .812455 1 0 -.20769475 0
    "India" 21577 "Ground Transport" .857439 1 0 -.15380524 0
    "India" 21578 "Ground Transport" .861785 1 0 -.14874946 0
    "India" 21579 "Ground Transport" .860553 1 0 -.15018007 0
    "India" 21580 "Ground Transport" .857454 1 0 -.15378775 0
    "India" 21581 "Ground Transport" .860215 1 0 -.15057293 0
    "India" 21582 "Ground Transport" .854786 1 0 -.15690413 0
    "India" 21583 "Ground Transport" .809818 1 0 -.21094576 0
    "India" 21584 "Ground Transport"  .85989 1 0  -.1509508 0
    "India" 21585 "Ground Transport" .859365 1 0 -.15156153 0
    "India" 21586 "Ground Transport" .863554 1 0 -.14669885 0
    "India" 21587 "Ground Transport" .866625 1 0 -.14314893 0
    "India" 21588 "Ground Transport" .867437 1 0 -.14221239 0
    "India" 21589 "Ground Transport" .856717 1 0 -.15464763 0
    "India" 21590 "Ground Transport" .814076 1 0 -.20570154 0
    "India" 21591 "Ground Transport" .856122 1 0  -.1553424 0
    "India" 21592 "Ground Transport" .856513 1 0  -.1548858 0
    "India" 21593 "Ground Transport" .862477 1 0 -.14794679 0
    "India" 21594 "Ground Transport" .864447 1 0 -.14566529 0
    "India" 21595 "Ground Transport" .864512 1 0  -.1455901 0
    "India" 21596 "Ground Transport" .848336 1 0  -.1644785 0
    "India" 21597 "Ground Transport" .805526 1 0  -.2162598 0
    "India" 21598 "Ground Transport" .853544 1 0  -.1583582 0
    "India" 21599 "Ground Transport" .857997 1 0 -.15315467 0
    "India" 21600 "Ground Transport" .848999 1 0 -.16369727 0
    "India" 21601 "Ground Transport" .856793 1 0 -.15455893 0
    "India" 21602 "Ground Transport" .860357 1 0 -.15040787 0
    "India" 21603 "Ground Transport" .847557 1 0  -.1653972 0
    "India" 21604 "Ground Transport" .789936 1 0 -.23580335 0
    "India" 21605 "Ground Transport" .858492 1 0  -.1525779 0
    "India" 21606 "Ground Transport" .853158 1 0 -.15881053 0
    "India" 21607 "Ground Transport"  .85647 1 0   -.154936 0
    "India" 21608 "Ground Transport" .816053 1 0   -.203276 0
    "India" 21609 "Ground Transport" .856556 1 0  -.1548356 0
    "India" 21610 "Ground Transport" .853911 1 0  -.1579283 0
    "India" 21611 "Ground Transport" .804152 1 0 -.21796697 0
    "India" 21612 "Ground Transport"  .82081 1 0  -.1974636 0
    "India" 21613 "Ground Transport" .859263 1 0 -.15168023 0
    "India" 21614 "Ground Transport" .861347 1 0 -.14925784 0
    "India" 21615 "Ground Transport" .860789 1 0 -.14990588 0
    "India" 21616 "Ground Transport" .863031 1 0 -.14730467 0
    "India" 21617 "Ground Transport" .844829 1 0 -.16862103 0
    "India" 21618 "Ground Transport" .795976 1 0 -.22818625 0
    "India" 21619 "Ground Transport" .856549 1 0 -.15484375 0
    "India" 21620 "Ground Transport" .857009 1 0 -.15430686 0
    "India" 21621 "Ground Transport" .858947 1 0 -.15204805 0
    "India" 21622 "Ground Transport" .858469 1 0  -.1526047 0
    "India" 21623 "Ground Transport" .862287 1 0 -.14816712 0
    "India" 21624 "Ground Transport" .854299 1 0 -.15747403 0
    "India" 21625 "Ground Transport" .804889 1 0  -.2170509 0
    "India" 21626 "Ground Transport"  .85787 1 0  -.1533027 0
    "India" 21627 "Ground Transport" .861115 1 0  -.1495272 0
    "India" 21628 "Ground Transport" .837444 1 0  -.1774009 0
    "India" 21629 "Ground Transport" .238581 1 0 -1.4330465 0
    "India" 21630 "Ground Transport" .800987 1 0 -.22191057 0
    "India" 21631 "Ground Transport"  .82781 1 0 -.18897162 0
    "India" 21632 "Ground Transport" .775076 1 0 -.25479418 0
    "India" 21633 "Ground Transport" .854318 1 0  -.1574518 0
    "India" 21634 "Ground Transport" .855706 1 0  -.1558284 0
    "India" 21635 "Ground Transport" .856803 1 0 -.15454726 0
    "India" 21636 "Ground Transport"  .85508 1 0 -.15656024 0
    "India" 21637 "Ground Transport"  .85772 1 0  -.1534776 0
    "India" 21638 "Ground Transport"  .84991 1 0  -.1626248 0
    "India" 21639 "Ground Transport" .813753 1 0  -.2060984 0
    "India" 21640 "Ground Transport" .853824 1 0  -.1580302 0
    "India" 21641 "Ground Transport"  .84766 1 0 -.16527566 0
    "India" 21642 "Ground Transport" .851417 1 0 -.16085325 0
    "India" 21643 "Ground Transport" .850397 1 0 -.16205198 0
    "India" 21644 "Ground Transport" .852294 1 0 -.15982375 0
    "India" 21645 "Ground Transport" .823381 1 0 -.19433625 0
    "India" 21646 "Ground Transport" .771547 1 0  -.2593577 0
    "India" 21647 "Ground Transport" .850581 1 0 -.16183564 0
    "India" 21648 "Ground Transport" .849756 1 0 -.16280603 0
    "India" 21649 "Ground Transport" .850895 1 0 -.16146654 0
    end
    format %tdNN/DD/CCYY date

    I have added a picture of the results as well:
    [Screenshot of the results: Screenshot 2023-08-24 at 6.07.46 PM.png]




    Here is the model I'm replicating: https://doi.org/10.46557/001c.32623




  • #2
    Your data design does not permit a full interaction of the Lockdown and Treatment variables. A full interaction would require dates representing all four combinations of Lockdown (0, 1) and Treatment (0, 1), but your data contain only three of the four possibilities. From 1 Jan 2019 through 29 Feb 2020 you have both Lockdown and Treatment = 0. From 1 Mar 2020 through 23 Mar 2020 you have Treatment = 1 and Lockdown = 0. From 24 Mar 2020 through 31 May 2020 you have Treatment = 1 and Lockdown = 1. From 1 Jun 2020 onward, Lockdown reverts to 0 but Treatment remains 1. So there is no time in the study when Lockdown = 1 and Treatment = 0. Without all four combinations represented you cannot get a full representation of the Lockdown#Treatment interaction. That is the source of the problems you are encountering.
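
    A quick way to see which combinations actually occur is to cross-tabulate the two dummies. The following is a minimal sketch, assuming the variable names from #1; "cell" is a hypothetical helper variable introduced only for this illustration.

    ```stata
    * Count days in each Lockdown x Treatment cell; an empty cell means
    * that combination never occurs in the sample.
    tabulate Lockdown Treatment, missing

    * First date, last date, and number of days for each combination that
    * does occur. "cell" is a hypothetical helper variable.
    egen cell = group(Lockdown Treatment), label
    tabstat date, by(cell) statistics(min max n) format(%td)
    ```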

    Now, you state in your post that you are getting a message "note: 0b.Lockdown#0b.Treatment identifies no observations in the sample." I cannot think of any reason for this: clearly all of the dates preceding 1 Mar 2020 should have both Lockdown and Treatment = 0. Moreover, the screenshot of the output you give shows no such message: the message about identifying no observations in the sample shown there refers to 1b.Lockdown#0b.Treatment--which is exactly what I have detailed in the first paragraph.
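
    One way to check whether the stored dummies match the definitions given in #1 is to rebuild them from the date variable and compare. This is only a sketch: it assumes date is a Stata daily date (as the -dataex- excerpt suggests), and Lockdown_chk and Treatment_chk are hypothetical names. The first rows of the excerpt in #1 (date 21550, i.e. 01jan2019) show Lockdown = 1 and Treatment = 0, so such a check seems worthwhile.

    ```stata
    * Rebuild the two dummies from the dates stated in #1, under new names
    * so that the existing variables are not overwritten.
    generate byte Lockdown_chk  = inrange(date, td(24mar2020), td(31may2020))
    generate byte Treatment_chk = inrange(date, td(01mar2020), td(30apr2021))

    * Number of days on which the stored dummies disagree with the rebuilt ones.
    count if Lockdown != Lockdown_chk | Treatment != Treatment_chk
    ```

    A nonzero count would suggest the dummies were not coded as described.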

    Since the nature of the pandemic response was that there was no period with a lockdown but without the other treatment measures, it is simply not possible to estimate any effect of lockdown in the absence of treatment. I would revise my approach. Create a three-level variable, let's call it pandemic_response, set equal to 0 in those early days when there was neither lockdown nor any other measure, equal to 1 in those days where there were the non-lockdown restrictions only (both preceding and after the lockdown), and equal to 2 in the period where you had both those measures and the lockdown. Then you can regress ln_CO2gt against i.pandemic_response. That will make use of all of the information in your data.
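
    The three-level variable described above could be built along these lines. This is a sketch under the assumption that Lockdown and Treatment are coded as described in #1; the label name response_lbl and the label texts are hypothetical.

    ```stata
    * Build the three-level variable from the existing dummies.
    generate byte pandemic_response = .
    replace pandemic_response = 0 if Lockdown == 0 & Treatment == 0
    replace pandemic_response = 1 if Lockdown == 0 & Treatment == 1
    replace pandemic_response = 2 if Lockdown == 1 & Treatment == 1

    * Optional value labels (hypothetical names and texts).
    label define response_lbl 0 "No restrictions" 1 "Restrictions only" 2 "Restrictions + lockdown"
    label values pandemic_response response_lbl

    * Any day with Lockdown == 1 & Treatment == 0 stays missing and would be
    * dropped from the regression; check that there are none.
    count if missing(pandemic_response)

    regress ln_CO2gt i.pandemic_response
    ```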

    Added: This "revised" model will actually be an algebraic transform of the results you show in the screenshot. I recommend doing it simply because I think it is a better way to view the data. The screenshot results look like they provide an estimate of the independent effect of the lockdown, but that is misleading since the lockdown was never used independently of the others and what it shows is actually the incremental effect of the lockdown over the underlying effects of the other treatments. The model with a 3-level variable will be a more accurately descriptive approach to what happened.
    Last edited by Clyde Schechter; 27 Aug 2023, 14:16.



    • #3
      Okay noted, Clyde. Thanks a lot for pointing this out. I wanted to know what answer this "revised" model would yield. Will it tell me the effect of all pandemic-related control measures i.e. both Lockdown and treatment? What if I want to eliminate the lockdown effect and just look at what effect the pandemic had?

      I hope I'm talking sense. Apologies for my lack of statistical understanding. You've been of great help already.

      Regards.



      • #4
        Hey Clyde, Thank you. The results for the revised model are the following:
        Please let me know if these are fine and if I can go ahead with interpreting them.

        [Screenshot of the revised-model results: Screenshot 2023-08-27 at 11.33.04 PM.png]



        • #5
          Those results look fine and are ready for interpretation.

          So, your constant term is the expected value of ln_CO2gt when pandemic_response == 0, i.e. in the absence of any lockdown or other interventions. It's your baseline value.

          The coefficient of 1.pandemic_response is the difference in expected ln_CO2gt between the period when there were restrictions other than lockdown (both before and after the lockdown) and the baseline period. It appears that ln_CO2gt was about 0.10 lower during that period than it was when there were no restrictions. Equivalently, since exp(-.0963757) = 0.90812277, CO2gt was at about 91% of its value under no restrictions, i.e. about 9% lower.

          Finally, when the lockdown and restrictions were both in place, ln_CO2gt was about 0.84 lower than it was before any restrictions were put into place. Since exp(-.8395869) = .4318889, it follows that CO2gt was reduced to about 43% of its baseline value, a 57% reduction.
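
          The conversions from log points to percentages can be reproduced from the stored coefficients rather than typed in by hand. A minimal sketch, assuming the regression on i.pandemic_response has been run; the equation names "restrictions" and "lockdown" are hypothetical labels.

          ```stata
          regress ln_CO2gt i.pandemic_response

          * Ratio of CO2gt to its baseline value implied by each coefficient.
          display exp(_b[1.pandemic_response])
          display exp(_b[2.pandemic_response])

          * The same quantities as percentage changes, with delta-method
          * standard errors and confidence intervals.
          * (The /// line continuation works in a do-file, not interactively.)
          nlcom (restrictions: 100 * (exp(_b[1.pandemic_response]) - 1)) ///
                (lockdown:     100 * (exp(_b[2.pandemic_response]) - 1))
          ```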

          It is not possible to get an independent estimate of the lockdown effect since it was never in place without concurrent other restrictions. But if you are interested in the incremental effect of the lockdown over other restrictions, you can get that as the difference between the coefficients of 2.pandemic_response and 1.pandemic_response. To get a standard error and confidence interval for that difference, run -lincom 2.pandemic_response-1.pandemic_response-. You can then exponentiate that difference if you want to express the incremental effect of lockdown over other restrictions as a percentage reduction in CO2gt.
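
          As a sketch of that step, assuming the regression on i.pandemic_response is the most recent estimation command, the eform option of -lincom- reports the exponentiated contrast directly:

          ```stata
          * Incremental effect of lockdown over other restrictions, log scale.
          lincom 2.pandemic_response - 1.pandemic_response

          * The same contrast exponentiated: the ratio of CO2gt under lockdown
          * to CO2gt under other restrictions alone, with its confidence interval.
          lincom 2.pandemic_response - 1.pandemic_response, eform
          ```

          Subtracting the exponentiated estimate from 1 and multiplying by 100 gives the percentage reduction.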

          The question is what to do next. The effects you are finding are large, but even so, there may be other variables that need to be added to this model. Perhaps there is seasonal variation in CO2gt, or other factors that might vary over time and also affect CO2gt. I don't know what those other factors might be--this is not my area. But presumably you do, or can get advice from colleagues in your discipline on this. Also along these lines, I wonder about spatial variation. And assuming your data is a time series, or perhaps panel data, there is the question of whether you can really consider the observations to all be independent--if they are closely spaced in time there may be autocorrelation issues to deal with. Again, whether these are real concerns for this question, and if so, how important they are, is beyond my knowledge. I'm just thinking in general statistical terms now.
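
          If such adjustments turn out to be warranted, one possible starting point is sketched below. It is only an illustration, not a recommendation: it assumes the data hold exactly one observation per day with no gaps (a single country and sector), the variables weekday and calmonth are hypothetical controls, and the 7-day lag is an arbitrary choice.

          ```stata
          * Declare the data as a daily time series (fails if dates repeat).
          tsset date

          * Hypothetical calendar controls for weekly and seasonal patterns.
          generate byte weekday  = dow(date)
          generate byte calmonth = month(date)

          * OLS coefficients with Newey-West standard errors, which allow for
          * autocorrelation up to the chosen lag.
          newey ln_CO2gt i.pandemic_response i.weekday i.calmonth, lag(7)
          ```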
          Last edited by Clyde Schechter; 27 Aug 2023, 17:45.



          • #6
            Hello Clyde, thanks again. Following are the results for
            Code:
            lincom 2.pandemic_response-1.pandemic_response
            Request you to help me interpret these.

            Regards.
            [Screenshot of the lincom results: Screenshot 2023-08-28 at 1.35.21 PM.png]



            • #7
              The incremental effect of lockdown implemented on top of the other restrictions is a reduction in ln_CO2gt of 0.74. Equivalently, it is associated with a reduction of CO2gt to 48% of its value in the presence of the other restrictions alone (exp(-.7432113) = 0.47558422). Otherwise put, it is associated with a 52% reduction of CO2gt from its value in the presence of other restrictions alone.



              • #8
                Thank you Clyde for all the help. It has been invaluable. I would like to acknowledge the same in my dissertation Acknowledgements.

                Regards.

