
  • Convert unbalanced to balanced panel for difference-in-difference model

    I'm trying to run a difference-in-difference model on unbalanced panel data from 2010 to 2018. The shock happens in 2014. The unit of analysis is the state-year. Observations are missing for some states in some years, both before and after 2014. However, I know for a fact that the dependent variable is 0 for those states in those years. Should I convert the unbalanced panel to a balanced one, with the outcome for the missing state-years set to 0?



  • #2
    Well, if you are going to do a bare-bones DID analysis, looking only at the group (shocked vs unshocked) and time (pre and post) and their interaction, and if you really know that for the missing observations the outcome variable is 0, then, yes I would expand the data set to a balanced configuration. If you have good information, it makes sense to make use of it, not ignore it. I am not entirely clear on what your data look like, but most likely the -fillin- command will make your task simple.
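
    As a minimal sketch of that approach, and not a definitive implementation: the variable names below (state, year, permits, treated) are hypothetical, state is assumed to be a numeric identifier, and treated is assumed to be constant within state. Note that -fillin- only creates combinations of values that already appear somewhere in the data, so a year that is absent for every state would not be added.

    ```stata
    * Hypothetical names: state (numeric id), year, permits (outcome),
    * treated (1 = shocked state, constant within state)
    fillin state year                     // add the missing state-year rows; _fillin == 1 marks them
    replace permits = 0 if _fillin == 1   // outcome is known to be 0 in the added rows

    * treated is missing in the added rows; copy it from the state's observed rows
    * (missing values sort last, so treated[1] is an observed value)
    bysort state (treated): replace treated = treated[1] if missing(treated)

    generate byte post = year >= 2014     // the shock happens in 2014
    regress permits i.treated##i.post, vce(cluster state)
    ```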

    If, however, you plan to adjust the analysis for other variables, and you don't have information about those, then you will gain nothing by filling in the data set.

    Finally, it seems curious that your data set has missing observations for which you actually know the outcome. How did that happen? How do you know the outcome? Or, alternatively, why are those observations missing in the first place?



    • #3
      Thanks, Clyde. I began with individual permit-level data on permits applied for from 2010 to 2018. I aggregated the permits to the state level to get the number of permits applied for by each state in each year. During this aggregation, states with no permit applications in a given year drop out of the data. I can credibly assume that such a state did not apply for any permits in that year. That's how I know that the missing state-year observations should have a value of 0.



      • #4
        Thanks for explaining. It makes perfect sense now.



        • #5
          Dear Clyde,
          So, if I have unbalanced panel data on firms' financial statements, some of the values in those statements are missing in specific years, and I am using generalized DID, do you think I need to convert the unbalanced data to balanced data? I thought that Stata would drop these observations.



          • #6
            In #5 you are picking up on what I said in the second paragraph of #2. If there are other variables to be included in the analysis for which data is missing, there is no point in expanding the data set to balance it because, as you note, Stata will just omit those observations anyway. My advice in #2 to balance the data set was specifically conditioned on there being no additional variables in the analysis. I'm sorry if I didn't make that clear enough in #2.
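
            To see how many observations Stata actually uses, something along these lines can help. This is only a sketch under assumed names: firm_id, year, y (outcome), did (the generalized DID treatment indicator), and x (a covariate with missing values) are all hypothetical.

            ```stata
            * Hypothetical names: firm_id, year, y, did, x
            xtset firm_id year
            xtreg y i.did x i.year, fe vce(cluster firm_id)

            count if e(sample)        // observations used in the estimation
            count if !e(sample)       // observations omitted, e.g. because y or x is missing
            misstable summarize y x   // where the missing values are
            ```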



            • #7
              Dear Prof. Clyde,

              Now, it is very clear to me. Thanks a lot for the explanation. Much appreciated.
