  • Diff-in-Diff at district level with a Pseudo Panel

    Dear all,

    I am currently trying to carry out a diff-in-diff analysis using two cross-sectional datasets of individual observations (two rounds of the DHS surveys) that I want to combine into a pseudo-panel.
    However, I am a bit confused about the adjustments I have to make in STATA to obtain a panel dataset.

    The cohorts into which I want to group my individual observations are the districts in the country of analysis (Ethiopia), so that I will evaluate differences at the district level. For a pseudo-panel, I would have to take the averages within each district and compare the averages between the same districts across the two survey rounds. Nevertheless, I would still like to keep the individual observations in my data without collapsing the data to the district level.

    So far, I only appended the two cross-sections into one dataset. In the appended dataset, I have observations for districts that appear in either of the two rounds and that appear in both rounds. Of course, I can only make use of the observations that lie in districts that are sampled in both rounds.
    However, I was wondering how I can make sure that only the observations of the same district are being compared across the two rounds? Do I first have to match same districts across the two rounds? Is this done e.g. via "xtset district", to define the data as panel data at district level?
    Or, does STATA already incorporate that if I run a simple DID regression, such as:
    Code:
    reg y i.post##i.Treatment, cluster(district)
    Thanks a lot in advance!
    Guest
    Last edited by sladmin; 10 Jun 2021, 06:36. Reason: anonymize original poster

  • #2
    Since there are exactly two years in your data set, you can eliminate those districts that only participated in a single year as follows:
    Code:
    assert !missing(year)
    by district (year), sort: keep if year[1] != year[_N]
    At that point you should
    Code:
    xtset district
    xtreg y i.post##i.Treatment, fe vce(cluster district)
    The coefficient of 1.post#1.Treatment will be the DID estimator of the causal effect of treatment.
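
    Written out, a district fixed-effects DID regression of this kind can be sketched as follows. The notation is an assumption added for illustration (i indexes individuals, d districts, t the two survey rounds) and does not come from the posts in this thread:

    ```latex
    y_{idt} = \alpha_d + \beta\,\mathrm{post}_t + \delta\,(\mathrm{post}_t \times \mathrm{Treatment}_d) + \varepsilon_{idt}
    ```

    Here \alpha_d is the district effect, which absorbs the uninteracted Treatment_d term because treatment status does not vary within a district, and \delta is the DID parameter that Stata reports as the coefficient of 1.post#1.Treatment.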


    • #3
      Dear Clyde,

      thanks a lot for your answer, this was very helpful.

      However, at this point another question came to my mind:
      I have to use weights in my analysis (pweights), which does not seem to be possible with xtreg.
      In some online sources (e.g. the Panel101 slides on princeton.edu) it was mentioned that one can use "xtreg", "areg" or "xi: reg" for panel fixed-effects models, so the following commands should yield the same or very similar results as the "xtreg" procedure mentioned above:

      Either:
      Code:
      areg y i.post##i.Treatment, cluster(district) absorb(district)
      Or:
      Code:
      xi: reg y i.post##i.Treatment i.district, cluster(district)
      This would (probably) allow me to include the weights in my regressions. However, my first runs in STATA show that the results from these commands are similar to, but still different from, those of the xtreg command. Did I do something wrong in the commands, or are they not exactly equivalent to xtreg?

      Finally, I was also a bit confused about the district fixed effects. I have assumed that they are included by the dummy "district" instead of "treatment", but I guess this leads to the same in this case?

      Thanks in advance!





      • #4
        The -xi:- prefix is almost completely obsolete, having been superseded by factor variable notation. Moreover, your use of it will interfere with your ability to use -margins-. So kill the -xi:- in the -reg- command and go with that. If you do that, it will give you, possibly with tiny rounding errors, the same results you would get with -xtreg, fe-, except that -reg- will give you a long list of coefficients for the districts (which, by the way, should be ignored). -areg- is not exactly equivalent to -xtreg, fe-, but the differences are slight and rarely of any importance. See https://www.stata.com/statalist/arch.../msg00596.html if you want details about that.

        Finally, I was also a bit confused about the district fixed effects. I have assumed that they are included by the dummy "district" instead of "treatment", but I guess this leads to the same in this case?
        I don't understand what you are saying/asking here. In a -regress- command, the district fixed effects are represented by i.district in the variable list. In -xtreg, fe- they go unmentioned, because the -fe- part causes Stata to look at your -xtset- command and use whatever you set as the panel variable (which, if you followed my earlier code, was district).
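
        As a minimal sketch of that equivalence, the following do-file builds a small simulated pseudo-panel and runs both commands. The data, the seed, and every variable name here (district, Treatment, post, u, y) are hypothetical and chosen only for illustration:

        ```stata
        * Hypothetical simulated data: 40 districts, 2 rounds, 25 respondents per district-round
        clear
        set seed 12345
        set obs 40
        generate district = _n
        generate Treatment = district <= 20        // first 20 districts are treated
        generate u = rnormal()                     // district-level effect
        expand 2
        bysort district: generate post = _n - 1    // round indicator: 0 or 1
        expand 25
        generate y = 1 + u + 0.5*post + 2*post*Treatment + rnormal()

        * District effects entered explicitly as indicators
        regress y i.post##i.Treatment i.district, vce(cluster district)

        * District effects taken from the panel variable set by -xtset-
        xtset district
        xtreg y i.post##i.Treatment, fe vce(cluster district)
        ```

        The coefficient of 1.post#1.Treatment should agree across the two outputs; which collinear term each command omits can differ.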


        • #5
          Thanks again, and sorry, my last question was not well put. It was not immediately clear to me where the district fixed effects appear in the respective equations, which you have now answered.
          Code:
          reg y i.treatment##i.post i.district, cluster(district)
          So, as you confirmed, in the above equation time fixed effects are represented by "i.post" and district fixed effects by "i.district". And just to be sure: "i.district" then guarantees that I only compare observations of the same districts over time (which are the cohort in my pseudo panel), such that in principle it has the same function as "xtset district" in the xtreg variant?

          And one last question: In another thread (Fixed effect difference-in-differences model - Statalist), it was implied that the Treatment dummy should usually be dropped when including entity fixed effects, due to collinearity. When I run the regression above, the treatment dummy is not dropped, but one district is omitted because of collinearity (in STATA: "note: district424 omitted because of collinearity"). When I use "areg", the treatment dummy is only omitted when I do not use weights.

          Is this something I should worry about, or does everything seem to be alright?

          Thanks in advance!


          • #6
            And just to be sure: "i.district" then guarantees that I only compare observations of the same districts over time (which are the cohort in my pseudo panel), such that in principle it has the same function as "xtset district" in the xtreg variant?
            Correct.

            When I run the regression above, the treatment dummy is not dropped, but one district is omitted because of collinearity (in STATA: "note: district424 omitted because of collinearity"). When I use "areg", the treatment dummy is only omitted when I do not use weights.

            Is this something I should worry about, or does everything seem to be alright?
            This is nothing to worry about. In this kind of data, there is an automatic collinearity among the treatment variable and the district effects. That collinearity must be broken in order for the regression estimation to proceed. It can be broken by the omission of any one of the variables involved in it. So removing the uninteracted treatment variable or any one of the district indicators does the trick. -reg- does the latter, -xtreg, fe- does the former. But these are completely equivalent in terms of everything that matters to your analysis. The two models are just algebraic transforms of each other, and the models' predictions are identical (except possibly for minimal rounding errors). In particular the effect estimate, the coefficient of the treatment#post interaction term, will always be the same either way.

            I have rarely used -areg- myself, and I can't comment on why it deals with collinearity differently in the presence or absence of weights. I have sufficient faith in -areg- to believe that it is handling things correctly regardless of weighting: Stata is a high quality software product developed by a high quality organization, and -areg- is a very old command, so there has been plenty of time to correct an error of this kind, which would show up frequently. But I can't personally affirm that from my own experience with it.

            By the way, there is yet another approach you can take here. -reghdfe-, by Sergio Correia, also performs fixed effects regression and supports pweights. It is available from SSC.
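
            A minimal sketch of what that could look like, assuming the survey weight is stored in a variable called wt (a hypothetical name; substitute the weight variable in your own data):

            ```stata
            * One-time installation from SSC; -reghdfe- depends on -ftools-
            ssc install ftools
            ssc install reghdfe

            * District fixed effects are absorbed rather than listed as indicators;
            * wt is a hypothetical name for the sampling weight
            reghdfe y i.post##i.Treatment [pweight = wt], absorb(district) vce(cluster district)
            ```

            As with the other commands, the quantity of interest is the coefficient of 1.post#1.Treatment.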


            • #7
              Great, thanks a lot for your help!


              • #8
                Dear Clyde,

                thanks again for your help!

                Since there are exactly two years in your data set, you can eliminate those districts that only participated in a single year as follows:
                Code:
                assert !missing(year)
                by district (year), sort: keep if year[1] != year[_N]
                Do you know if there exists a useful code if I have 3 different years in my panel and I only want to keep the districts that appear in each year, by any chance?

                Thanks in advance!


                • #9
                  Code:
                  by district (year), sort: gen ycount = sum(year != year[_n-1])
                  keep if ycount == 3


                  • #10
                    It works, although in my case I would use:
                    Code:
                    by district (ycount), sort: keep if ycount[_N] == 3
                    in the second line, to keep all observations from each year for the districts that appear every year.

                    Thank you!


                    • #11
                      Yes, your modification of my code is correct. It's what I should have written in the first place. Sorry for the error. Thank you for correcting it.
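
                      For comparison, here is a sketch of an alternative way to reach the same result using -egen-. It is not the code from the posts above, and the variable names yrtag and nyears are hypothetical:

                      ```stata
                      * Flag exactly one observation per district-year pair
                      egen byte yrtag = tag(district year)
                      * Count the distinct years observed in each district
                      egen nyears = total(yrtag), by(district)
                      * Keep every observation from districts that appear in all 3 rounds
                      keep if nyears == 3
                      drop yrtag
                      ```

                      Because nyears is constant within a district, the -keep- retains all observations from every round for the qualifying districts.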


                      • #12
                        hi,

                        first of all, thank you for this very informative thread.

                        my question is regarding the use of gologit2 instead of reg (y is ordered) - would then

                        Code:
                        gologit2 y i.treatment##i.post i.district, cluster(district)
                        ensure that I only compare observations of the same districts over time (which are the cohort in my pseudo panel)?

                        To my understanding gologit2 and xtologit will not return similar results.

                        many thanks,
                        A
                        Last edited by aneta zahei; 28 Jun 2023, 17:25.
