Regression Code

Guest
#1

05 Apr 2019, 13:17

Dear Statalists,

I am trying to implement difference-in-differences estimation as a regression controlling for age and sex.
My dependent variable is unemp (unemployment); there are 5 cities (Berlin, Munich, Hamburg, Cologne and Frankfurt) and 15 years (2000~2015).
I want to run the regression for Europeans only.

The purpose of this data set is to estimate the effect of the 2010 immigration influx into Berlin on unemployment using the DD model.
The effects of the influx would only be seen from 2011 onwards.

Could you please check whether my commands and regression model (with explanation) are appropriate?

The following is the regression model with an explanation I wrote.

Y_{cjt} = \gamma_c + \lambda_t + \delta D_{ct} + X_{cjt}\beta + \varepsilon_{cjt}

c is cities, t is years, j is gender and age.
Y is unemployment.
\gamma_c is a dummy for each city.
\lambda_t is a dummy for each year.
X_{cjt} includes dummies for age and gender.
D_{ct} is the regressor of interest, which indicates observations for people in Berlin from 2011 onwards (after the immigration inflow).

The following is my code.

keep if race=="European"
gen Treat=0
replace Treat=1 if city=="Berlin"
gen Post=0
replace Post=1 if year>=2011
gen TreatPost=Treat*Post
xi: reg unemp i.city i.year TreatPost sex age, cluster(city)

Last edited by sladmin; 06 May 2019, 12:30. Reason: anonymize data
Clyde Schechter

Join Date: Apr 2014

Posts: 30065
#2

05 Apr 2019, 18:11

The code you suggest is for the generalized difference-in-differences estimator. As you have only one treatment city, the treatment necessarily begins at the same time for the entire treatment group. So you can use the simple classical difference-in-differences estimator instead.

There are also a number of other ways in which your code can be simplified, especially eliminating the obsolete -xi:- prefix.

Code:

keep if race == "European"
gen post = (year >= 2011)
gen treat = (city == "Berlin")
encode city, gen(n_city)
xtset n_city
xtreg unemp i.treat##i.post i.sex age, fe vce(cluster n_city)

That said, in your example data, there are no pre-2011 observations for Berlin. If that is true of your data as a whole, then you do not have a data set that will support a DID estimation. You must have both pre- and post-2011 observations for both the treatment group and the other group. This is as true of your original code as it is of my simplification.
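A quick way to check this requirement is to cross-tabulate the two indicators. The sketch below assumes the variables treat and post created in the code above, and that city and year are the names used in your data; all four cells of the first table should contain observations.

```stata
* Sketch: confirm that all four treat x post cells are populated.
* Assumes treat and post were generated as in the code above.
tabulate treat post

* Optional, assuming city and year are the variable names in the data:
* observation counts for every city-year combination.
tabulate year city
```

If the Berlin (treat = 1) row shows zero observations for post = 0, the data cannot support a DID estimate.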

Last edited by sladmin; 06 May 2019, 12:31. Reason: anonymize original data
Guest
#3

06 Apr 2019, 03:38

Thank you very much for your reply.
Why do I have to put 'i.' in front of 'sex', but not in front of 'age'?

As far as I know, treat (5 cities) and post (19 years) have many classes, and that is why we put 'i.' in front of them.
Age also has many classes, unlike sex, which has only two (either 1 or 2).

Also, there are pre-2011 observations for Berlin...

Last edited by sladmin; 06 May 2019, 12:38. Reason: anonymize original data

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17702
#4

06 Apr 2019, 08:13

Guest:
you do not have to put -i.- before -age- because -age- is a continuous variable (you do not seem to have age classes, at least as far as I can read from your screenshot) and Stata treats predictors as continuous by default, as you can see in the following toy example, which uses a categorical variable without and with -i.-:

Code:

. sysuse auto.dta
(1978 Automobile Data)

. reg price rep78

      Source |       SS           df       MS      Number of obs   =        69
-------------+----------------------------------   F(1, 67)        =      0.00
       Model |  24770.7652         1  24770.7652   Prob > F        =    0.9574
    Residual |   576772188        67  8608540.12   R-squared       =    0.0000
-------------+----------------------------------   Adj R-squared   =   -0.0149
       Total |   576796959        68  8482308.22   Root MSE        =      2934

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       rep78 |   19.28012   359.4221     0.05   0.957    -698.1295    736.6897
       _cons |   6080.379    1274.06     4.77   0.000     3537.345    8623.413
------------------------------------------------------------------------------

. reg price i.rep78

      Source |       SS           df       MS      Number of obs   =        69
-------------+----------------------------------   F(4, 64)        =      0.24
       Model |  8360542.63         4  2090135.66   Prob > F        =    0.9174
    Residual |   568436416        64     8881819   R-squared       =    0.0145
-------------+----------------------------------   Adj R-squared   =   -0.0471
       Total |   576796959        68  8482308.22   Root MSE        =    2980.2

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       rep78 |
          2  |   1403.125   2356.085     0.60   0.554    -3303.696    6109.946
          3  |   1864.733   2176.458     0.86   0.395    -2483.242    6212.708
          4  |       1507   2221.338     0.68   0.500    -2930.633    5944.633
          5  |     1348.5   2290.927     0.59   0.558    -3228.153    5925.153
             |
       _cons |     4564.5   2107.347     2.17   0.034     354.5913    8774.409
------------------------------------------------------------------------------
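The same contrast can be made explicit with the -c.- prefix. The following is only a sketch on the same auto data; the joint test at the end is an optional extra, not part of the output above.

```stata
sysuse auto, clear

* c. requests a continuous (single-slope) treatment; same as no prefix.
regress price c.rep78

* i. creates one indicator per level, omitting the base level.
regress price i.rep78

* Joint test that all rep78 indicators are zero.
testparm i.rep78
```

The first regression reproduces the one-coefficient model, the second the four-indicator model.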

Last edited by sladmin; 08 Apr 2019, 09:14. Reason: anonymize original poster

Kind regards,
Carlo
(Stata 19.0)

Guest
#5

06 Apr 2019, 08:28

Thank you for your reply!

In this panel regression, I want to put a city-fixed dummy variable to control for fixed differences between cities.
Should the city fixed dummy be "i.city" or "i.treat"?

keep if race=="European"
gen Treat=0
replace Treat=1 if city=="Berlin"
gen Post=0
replace Post=1 if year>=2011
gen TreatPost=Treat*Post

xi: reg unemp i.city i.year TreatPost sex age, cluster(city)
or
xi: reg unemp i.Treat i.year TreatPost sex age, cluster(Treat)

which one?

Last edited by sladmin; 06 May 2019, 12:41. Reason: anonymize original data
Carlo Lazzaro
#6

06 Apr 2019, 09:09

Guest:
I would say -i.Treat-.
That said:
- I still do not understand why, with panel data, you decided to go with pooled OLS instead of -xtreg- (as an aside, whichever command you choose, please note that the -xi:- prefix is redundant);
- what is the benefit of creating interactions yourself when -fvvarlist- notation can do it for you?

Last edited by sladmin; 08 Apr 2019, 09:14. Reason: anonymize original poster

Kind regards,
Carlo
(Stata 19.0)
Guest
#7

07 Apr 2019, 03:19

Dear Carlo,

Thank you for your reply.

Indeed, I tried xtreg as well, which gave me the error "repeated time values within panel". It seems that I have to drop something, but I don't know what or why.

My command is

egen tvar=group(year)
egen svar=group(city)
tsset svar tvar

repeated time values within panel

xtreg unemp i.year TreatPost sex age, fe cluster(svar)

---------------------

My other question: why should my cross-section variable (svar) be 'treat' and not 'city'?
Carlo Lazzaro
#8

07 Apr 2019, 03:29

Guest:
if you do not have genuine duplicates (i.e., erroneous data entries) and do not plan to use time-series operators such as lags and leads, you can safely -xtset- your data with -panelid- only.
As far as I can read from your screenshot, -city- cannot be used as a predictor in your regression model unless you convert it from -string- to numeric format.
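As a sketch of both steps, assuming city and year are the variable names in your data and that n_city is a new (hypothetical) variable name not yet in use:

```stata
* Sketch: see how many observations share the same city-year pair.
* Assumes city and year are the variable names in the data.
duplicates report city year

* If the repeats are legitimate (several people per city-year),
* convert city to numeric and declare only the panel identifier.
encode city, generate(n_city)
xtset n_city
```

With individual-level data, many observations per city-year pair are to be expected, which is why declaring a time variable triggers the "repeated time values within panel" message.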

Last edited by sladmin; 08 Apr 2019, 09:14. Reason: anonymize original poster

Kind regards,
Carlo
(Stata 19.0)
Guest
#9

07 Apr 2019, 09:05

Thank you for your reply.

So, I converted it to numeric
encode city, gen(numerical_city)
xtset numerical_city

I'm having another issue here.
I'm not sure which code to use from the following options. They give different DD estimates.
I need to include either i.year or i.Post as a year-fixed effect.

xtreg unemp i.year TreatPost sex age, fe vce(cluster numerical_city)
xtreg unemp i.Post TreatPost sex age, fe vce(cluster numerical_city)
Carlo Lazzaro
#10

07 Apr 2019, 09:20

Guest:
why not use the helpful code suggested by Clyde at #2?

Last edited by sladmin; 08 Apr 2019, 09:14. Reason: anonymize original poster

Kind regards,
Carlo
(Stata 19.0)
Guest
#11

07 Apr 2019, 09:32

My final code is exactly the same as the one suggested by Clyde at #2 except for the i.Post (I was wondering why it would give different results if I use i.year instead of i.Post).

I will rephrase my last question.
Clyde suggested the following code.
xtreg unemp i.treat##i.post i.sex age, fe vce(cluster n_city)

But I was wondering whether I could replace 'i.post' with 'i.year', like this: xtreg unemp i.treat##i.year i.sex age, fe vce(cluster n_city)
I would like to know why the two commands give different DD coefficients.
Carlo Lazzaro
#12

07 Apr 2019, 09:53

Guest:
the syntax of your code is perfectly legal: hence, it will run.
As to why the two commands give different results, I do not know.

Last edited by sladmin; 08 Apr 2019, 09:14. Reason: anonymize original poster

Kind regards,
Carlo
(Stata 19.0)
Clyde Schechter
#13

07 Apr 2019, 11:33

I can't so much explain why the two models give different results as ask why you think they should give the same results. They are different models. One of them adjusts for every yearly shock, the other does not. Why do you expect the results to be the same?
Guest
#14

07 Apr 2019, 13:30

So which one adjusts for every yearly shock? I would like to have a 'year-fixed effect' dummy for each year
Clyde Schechter
#15

07 Apr 2019, 14:39

The one with i.year adjusts for yearly shocks.
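For illustration only, a specification with a dummy for each year and a single DID coefficient might look like the sketch below. It assumes the variables treat, post and n_city from #2. The main effects of treat and post are left out because the city fixed effects and the year dummies absorb them.

```stata
* Sketch: DID with city fixed effects (via -fe-) and a dummy for each year.
* 1.treat#1.post equals 1 only for Berlin from 2011 onwards;
* its coefficient is the DID estimate.
xtset n_city
xtreg unemp i.year 1.treat#1.post i.sex age, fe vce(cluster n_city)
```

By contrast, i.treat##i.year produces a separate treat-by-year interaction for every year rather than one DID coefficient.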