  • Regression Code

    Dear Statalists,

    I am trying to implement a difference-in-differences (DD) estimation as a regression controlling for age and sex.
    My dependent variable is unemp (unemployment); there are 5 cities (Berlin, Munich, Hamburg, Cologne and Frankfurt) and the years 2000 to 2015.
    I want to run the regression for Europeans only.

    The aim is to estimate the effect of the 2010 immigration influx into Berlin on unemployment using a DD model.
    The effects of the influx would only be seen from 2011 onwards.

    Could you check whether my command and regression model (with explanation) are appropriate, please?


    The following is the regression model with an explanation I wrote.

    Ycjt = ð›ūc + 𝜆t + ÎīDct + 𝑋cjtð›― + 𝜀cjt

    c is cities, t is years, j is gender and age.
    Y is unemployment
    ð›ūc is a dummy for each city
    𝜆t is a dummy for each year
    𝑋cjt includes dummies for age and gender.
    Dct, is regressor of interest which indicates observations for people in Berlin from 2011 onwards (after the immigration inflow)
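As a check on the model: in the simplest case (no covariates and the same number of observations in every city-year cell), the OLS estimate of $\delta$ in this equation equals the familiar difference of differences in group means, where "pre" means before 2011, "post" means 2011 onwards, and "others" are the four comparison cities pooled:

```latex
\hat{\delta} = \left(\bar{Y}_{\text{Berlin, post}} - \bar{Y}_{\text{Berlin, pre}}\right)
             - \left(\bar{Y}_{\text{others, post}} - \bar{Y}_{\text{others, pre}}\right)
```

With covariates such as age and sex, or with unequal cell sizes, the regression estimate generally departs from this simple contrast.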



    The following is my code.

    keep if race=="European"
    gen Treat=0
    replace Treat=1 if city=="Berlin"
    gen Post=0
    replace Post=1 if year>=2011
    gen TreatPost=Treat*Post
    xi: reg unemp i.city i.year TreatPost sex age, cluster(city)
    Last edited by sladmin; 06 May 2019, 12:30. Reason: anonymize data
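A minimal sketch of the command above without the -xi:- prefix, run on hypothetical simulated data. The city names and years come from the post; the sample size, the assumed 0.05 effect and every other number are made up, and there is no race variable, so the keep step is skipped. Because -city- is a string, it is encoded first so that it can be used with i.; clustering on the encoded variable mirrors cluster(city).

```stata
* Hypothetical simulated data (NOT the poster's data): 5 cities x 16 years x 50 people
clear
set seed 2019
set obs 5
generate city = word("Berlin Munich Hamburg Cologne Frankfurt", _n)
expand 16
bysort city: generate year = 1999 + _n
expand 50
generate byte sex = 1 + (runiform() < 0.5)
generate age = 20 + floor(45 * runiform())
generate byte Treat = (city == "Berlin")
generate byte Post = (year >= 2011)
generate byte TreatPost = Treat * Post
* Assumed outcome: 8% base rate, common upward trend, true DD effect of 0.05
generate byte unemp = runiform() < 0.08 + 0.002 * (year - 2000) + 0.05 * TreatPost

* Factor variables cannot be strings, so encode city before using i.
encode city, generate(n_city)

* Same specification as above, with no -xi:- prefix
regress unemp i.n_city i.year TreatPost sex age, vce(cluster n_city)
```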

  • #2
    The code you suggest is for the generalized difference-in-differences estimator. As you have only one treatment city, the treatment necessarily begins at the same time in all treatment cities, so you can use the simple classical difference-in-differences estimator instead.

    There are also a number of other ways in which your code can be simplified, especially eliminating the obsolete -xi:- prefix.

    Code:
    keep if race == "European"
    gen post = (year >= 2011)
    gen treat = (city == "Berlin")
    encode city, gen(n_city)
    xtset n_city
    xtreg unemp i.treat##i.post i.sex age, fe vce(cluster n_city)
    That said, in your example data, there are no pre-2011 observations for Berlin. If that is true of your data as a whole, then you do not have a data set that will support a DID estimation. You must have both pre- and post-2011 observations for both the treatment group and the other group. This is as true of your original code as it is of my simplification.
    Last edited by sladmin; 06 May 2019, 12:31. Reason: anonymize original data
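Clyde's warning about pre-2011 observations can be checked before any estimation. The sketch below, again on hypothetical simulated data with made-up numbers, tabulates the treat × post cells (all four must be non-empty) and then runs the command from #2. With -fe-, 1.treat is dropped as collinear with the city fixed effects; the DD estimate is the 1.treat#1.post coefficient.

```stata
* Hypothetical simulated data (NOT the poster's data)
clear
set seed 2019
set obs 5
generate city = word("Berlin Munich Hamburg Cologne Frankfurt", _n)
expand 16
bysort city: generate year = 1999 + _n
expand 50
generate byte sex = 1 + (runiform() < 0.5)
generate age = 20 + floor(45 * runiform())
generate byte unemp = runiform() < 0.08 + 0.05 * (city == "Berlin") * (year >= 2011)

* Variables as in #2
generate post = (year >= 2011)
generate treat = (city == "Berlin")

* All four treat x post cells need observations; Berlin before 2011 is the critical one
tabulate treat post
count if treat == 1 & post == 0
assert r(N) > 0

encode city, gen(n_city)
xtset n_city
xtreg unemp i.treat##i.post i.sex age, fe vce(cluster n_city)
```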



    • #3
      Thank you very much for your reply.
      Why do I have to put 'i.' in front of 'sex', but not in front of 'age'?

      As far as I know, treat (5 cities) and post (19 years) have many categories, and that's why we put 'i.' in front of them.
      Age also has many categories, unlike sex, which has only two (coded either 1 or 2).



      Also, there are pre-2011 observations for Berlin...
      Last edited by sladmin; 06 May 2019, 12:38. Reason: anonymize original data



      • #4
        Guest:
        you do not have to put -i.- before -age- because -age- is a continuous variable (and, at least as far as I can read from your screenshot, you do not seem to have age classes). Stata treats predictors as continuous by default, as you can see in the following toy example, which fits a categorical variable without and with -i.-:
        Code:
        . sysuse auto.dta
        (1978 Automobile Data)
        
        . reg price rep78
        
              Source |       SS           df       MS      Number of obs   =        69
        -------------+----------------------------------   F(1, 67)        =      0.00
               Model |  24770.7652         1  24770.7652   Prob > F        =    0.9574
            Residual |   576772188        67  8608540.12   R-squared       =    0.0000
        -------------+----------------------------------   Adj R-squared   =   -0.0149
               Total |   576796959        68  8482308.22   Root MSE        =      2934
        
        ------------------------------------------------------------------------------
               price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
               rep78 |   19.28012   359.4221     0.05   0.957    -698.1295    736.6897
               _cons |   6080.379    1274.06     4.77   0.000     3537.345    8623.413
        ------------------------------------------------------------------------------
        
        . reg price i.rep78
        
              Source |       SS           df       MS      Number of obs   =        69
        -------------+----------------------------------   F(4, 64)        =      0.24
               Model |  8360542.63         4  2090135.66   Prob > F        =    0.9174
            Residual |   568436416        64     8881819   R-squared       =    0.0145
        -------------+----------------------------------   Adj R-squared   =   -0.0471
               Total |   576796959        68  8482308.22   Root MSE        =    2980.2
        
        ------------------------------------------------------------------------------
               price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
               rep78 |
                  2  |   1403.125   2356.085     0.60   0.554    -3303.696    6109.946
                  3  |   1864.733   2176.458     0.86   0.395    -2483.242    6212.708
                  4  |       1507   2221.338     0.68   0.500    -2930.633    5944.633
                  5  |     1348.5   2290.927     0.59   0.558    -3228.153    5925.153
                     |
               _cons |     4564.5   2107.347     2.17   0.034     354.5913    8774.409
        ------------------------------------------------------------------------------
        Last edited by sladmin; 08 Apr 2019, 09:14. Reason: anonymize original poster
        Kind regards,
        Carlo
        (Stata 19.0)
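The toy example above uses a variable with five levels. For the sex question in #3 the situation is different: a variable with exactly two levels gives the same fit and the same slope with or without i., because a single dummy and a linear term span the same space. The sketch below checks this with the auto data's two-level variable foreign. The i. prefix only changes the model for variables with three or more levels; putting i. in front of age would replace the single linear age effect with a separate effect for every age value.

```stata
sysuse auto, clear

* foreign has exactly two values (0 = Domestic, 1 = Foreign)
regress price foreign
scalar b_cont = _b[foreign]

regress price i.foreign
scalar b_fact = _b[1.foreign]

* Same coefficient either way
display "continuous: " scalar(b_cont) "   factor: " scalar(b_fact)
```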



        • #5
          Thank you for your reply!

          In this panel regression, I want to include city dummies to control for fixed differences between cities.
          Should the city fixed effect be "i.city" or "i.treat"?




          keep if race=="European"
          gen Treat=0
          replace Treat=1 if city=="Berlin"
          gen Post=0
          replace Post=1 if year>=2011
          gen TreatPost=Treat*Post

          xi: reg unemp i.city i.year TreatPost sex age, cluster(city)
          or
          xi: reg unemp i.treat i.year TreatPost sex age, cluster(treat)

          Which one?
          Last edited by sladmin; 06 May 2019, 12:41. Reason: anonymize original data
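The two options control for different things: after encoding, i.n_city gives each of the five cities its own intercept (the $\gamma_c$ dummies in the model written in #1), whereas i.Treat only separates Berlin from the other four cities pooled together. The hypothetical sketch below fits both on made-up simulated data (including assumed city-specific base rates) so the estimates can be compared. Both are clustered on the city code, because cluster(treat) would leave only two clusters.

```stata
* Hypothetical simulated data (NOT the poster's data)
clear
set seed 2019
set obs 5
generate city = word("Berlin Munich Hamburg Cologne Frankfurt", _n)
expand 16
bysort city: generate year = 1999 + _n
expand 50
generate byte sex = 1 + (runiform() < 0.5)
generate age = 20 + floor(45 * runiform())
generate byte Treat = (city == "Berlin")
generate byte Post = (year >= 2011)
generate byte TreatPost = Treat * Post
* Assumed: cities differ in their base unemployment rate
generate double base = 0.05 + 0.01 * (city == "Munich") + 0.03 * (city == "Cologne")
generate byte unemp = runiform() < base + 0.05 * TreatPost
encode city, generate(n_city)

* (a) one dummy per city
regress unemp i.n_city i.year TreatPost sex age, vce(cluster n_city)
scalar d_city = _b[TreatPost]

* (b) a single Berlin-vs-rest dummy
regress unemp i.Treat i.year TreatPost sex age, vce(cluster n_city)
scalar d_treat = _b[TreatPost]

display "i.n_city: " scalar(d_city) "   i.Treat: " scalar(d_treat)
```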



          • #6
            Guest:
            I would say -i.Treat-.
            That said:
            - I still do not understand why, with panel data, you decided to go with pooled OLS instead of -xtreg- (as an aside, whichever command you choose, please note that the -xi:- prefix is redundant);
            - what is the benefit of creating interactions yourself when factor-variable notation (-fvvarlist-) can do them for you?
            Last edited by sladmin; 08 Apr 2019, 09:14. Reason: anonymize original poster
            Kind regards,
            Carlo
            (Stata 19.0)
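To illustrate the second point: a hand-made product term and factor-variable notation fit exactly the same model. The minimal sketch below (hypothetical simulated data, made-up numbers) shows that the coefficient on TreatPost equals the 1.treat#1.post coefficient; the factor-variable version has the added benefit that post-estimation commands such as -margins- know the terms belong together.

```stata
* Hypothetical simulated data (NOT the poster's data)
clear
set seed 2019
set obs 5
generate city = word("Berlin Munich Hamburg Cologne Frankfurt", _n)
expand 16
bysort city: generate year = 1999 + _n
expand 50
generate byte treat = (city == "Berlin")
generate byte post = (year >= 2011)
generate byte unemp = runiform() < 0.08 + 0.05 * treat * post

* Interaction created by hand
generate byte TreatPost = treat * post
regress unemp treat post TreatPost
scalar b_manual = _b[TreatPost]

* Interaction created by factor-variable notation
regress unemp i.treat##i.post
scalar b_fv = _b[1.treat#1.post]

display "by hand: " scalar(b_manual) "   fvvarlist: " scalar(b_fv)
```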



            • #7
              Dear Carlo,

              Thank you for your reply.

              Indeed, I tried -xtreg- as well, but it gave me the error "repeated time values within panel". It seems I have to drop something, but I don't know why, or which one.


              My commands are:

              egen tvar=group(year)
              egen svar=group(city)
              tsset svar tvar

              which returns: repeated time values within panel

              The regression I want to run is:

              xtreg unemp i.year TreatPost sex age, fe cluster(svar)

              ---------------------

              My other question: why is my cross-sectional unit (svar) 'treat' rather than 'city'?



              • #8
                Guest:
                if you do not have genuine duplicates (i.e., erroneous data entries) and do not plan to use time-series commands such as lags and leads, you can safely -xtset- your data with -panelid- only.
                As far as I can read from your screenshot, -city- cannot be used as a predictor in your regression model unless you convert it from string to numeric format (e.g., with -encode-).
                Last edited by sladmin; 08 Apr 2019, 09:14. Reason: anonymize original poster
                Kind regards,
                Carlo
                (Stata 19.0)
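The "repeated time values within panel" error in #7 arises because the data contain many people per city-year, so city and year together do not identify single observations. A sketch on hypothetical simulated data (made-up sizes) that reproduces the error and applies the fix described above, declaring only the panel variable:

```stata
* Hypothetical simulated data (NOT the poster's data): 50 people per city-year
clear
set seed 2019
set obs 5
generate city = word("Berlin Munich Hamburg Cologne Frankfurt", _n)
expand 16
bysort city: generate year = 1999 + _n
expand 50

* Convert the string city to a labelled numeric variable
encode city, generate(numerical_city)

* Each city-year combination appears 50 times
duplicates report numerical_city year

* So declaring year as the time variable fails
capture xtset numerical_city year
display "xtset numerical_city year returned r(" _rc ")"

* Declaring only the panel variable is enough for -xtreg, fe- (no lags or leads)
xtset numerical_city
```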



                • #9
                  Thank you for your reply.

                  So, I converted it to numeric:
                  encode city, gen(numerical_city)
                  xtset numerical_city

                  I'm having another issue here.
                  I'm not sure which of the following commands to use; they give different DD estimates.
                  I need to include either i.year or i.Post as a year fixed effect.


                  xtreg unemp i.year TreatPost sex age, fe vce(cluster numerical_city)
                  xtreg unemp i.Post TreatPost sex age, fe vce(cluster numerical_city)



                  • #10
                    Guest:
                    why not use the helpful code suggested by Clyde in #2?
                    Last edited by sladmin; 08 Apr 2019, 09:14. Reason: anonymize original poster
                    Kind regards,
                    Carlo
                    (Stata 19.0)



                    • #11
                      My final code is exactly the same as the one suggested by Clyde in #2, except for i.Post (I was wondering why using i.year instead of i.Post gives different results).

                      I will rephrase my last question.
                      Clyde suggested the following code.
                      xtreg unemp i.treat##i.post i.sex age, fe vce(cluster n_city)

                      But I was wondering whether I could replace 'i.post' with 'i.year', like this: xtreg unemp i.treat##i.year i.sex age, fe vce(cluster n_city)
                      I would like to know why the two commands give different DD coefficients.
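One likely reason the coefficients are not comparable: with i.post the interaction is a single 0/1 term, whereas i.treat##i.year interacts Berlin with every year. In the hypothetical simulated data below (made-up numbers, years 2000 to 2015), Stata therefore reports fifteen coefficients, 1.treat#2001.year to 1.treat#2015.year, each measuring the Berlin-versus-others gap in that year relative to the gap in the base year 2000. None of them is, by itself, the single DD coefficient from #2.

```stata
* Hypothetical simulated data (NOT the poster's data)
clear
set seed 2019
set obs 5
generate city = word("Berlin Munich Hamburg Cologne Frankfurt", _n)
expand 16
bysort city: generate year = 1999 + _n
expand 50
generate byte sex = 1 + (runiform() < 0.5)
generate age = 20 + floor(45 * runiform())
generate byte treat = (city == "Berlin")
generate byte post = (year >= 2011)
generate byte unemp = runiform() < 0.08 + 0.05 * treat * post
encode city, gen(n_city)
xtset n_city

* One DD coefficient: 1.treat#1.post
xtreg unemp i.treat##i.post i.sex age, fe vce(cluster n_city)
display _b[1.treat#1.post]

* Fifteen year-specific coefficients (1.treat#2001.year ... 1.treat#2015.year),
* each relative to the base year 2000
xtreg unemp i.treat##i.year i.sex age, fe vce(cluster n_city)
display _b[1.treat#2011.year]
```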



                      • #12
                        Guest:
                        the syntax of your code is perfectly legal, so it will run.
                        As to why the two commands give different results, I do not know.
                        Last edited by sladmin; 08 Apr 2019, 09:14. Reason: anonymize original poster
                        Kind regards,
                        Carlo
                        (Stata 19.0)



                        • #13
                          I can't so much explain why the two models give different results as ask why you think they should give the same results. They are different models. One of them adjusts for every yearly shock, the other does not. Why do you expect the results to be the same?



                          • #14
                            So which one adjusts for every yearly shock? I would like to have a year fixed-effect dummy for each year.



                            • #15
                              The one with i.year adjusts for yearly shocks.
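To see what adjusting for yearly shocks means in practice: the i.year version gives every year its own intercept (the $\lambda_t$ dummies of the model written in #1), whereas i.post allows only one shift from 2011 onwards, so a year-specific shock that the single post dummy cannot absorb can leak into the DD coefficient. The sketch below uses hypothetical simulated data with two made-up year shocks (2009 and 2013) and unequal numbers of people per city-year; compare the two TreatPost estimates.

```stata
* Hypothetical simulated data (NOT the poster's data)
clear
set seed 2019
set obs 5
generate city = word("Berlin Munich Hamburg Cologne Frankfurt", _n)
expand 16
bysort city: generate year = 1999 + _n
* Assumed: unequal numbers of people per city-year (20 to 79)
expand 20 + floor(60 * runiform())
generate byte sex = 1 + (runiform() < 0.5)
generate age = 20 + floor(45 * runiform())
generate byte treat = (city == "Berlin")
generate byte post = (year >= 2011)
generate byte TreatPost = treat * post
* Assumed year-specific shocks in 2009 and 2013, true DD effect of 0.05
generate byte unemp = runiform() < 0.08 + 0.05 * TreatPost + 0.04 * (year == 2009) + 0.03 * (year == 2013)
encode city, generate(n_city)
xtset n_city

* Year fixed effects: one intercept per year
xtreg unemp i.year TreatPost i.sex age, fe vce(cluster n_city)
scalar d_year = _b[TreatPost]

* A single pre/post shift
xtreg unemp i.post TreatPost i.sex age, fe vce(cluster n_city)
scalar d_post = _b[TreatPost]

display "i.year: " scalar(d_year) "   i.post: " scalar(d_post)
```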
