First difference regression and fixed effect regression

Lucca Mancini

Join Date: Mar 2019

Posts: 27
#1

11 Jul 2019, 02:01

Hello
I am trying to run a regression on panel data covering a period of eight years to investigate the relationship between crime and migration. In my regression equation, crime is the number of registered violent crimes (the dependent variable), migrants is the number of asylum seekers (the independent variable and main variable of interest), and X represents four possible control variables, all measured in region i in year t.

My questions are:
1) I have seen that papers running a fixed-effects or first-difference regression often use e.g. year dummies and/or region dummies. In other papers these are reported in the tables as fixed effects instead of dummies. What is the difference between e.g. a year dummy and a year fixed effect?

2) How important is it to include e.g. year dummies in such an estimation? Or should one include both year dummies and region dummies?
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17711
#2

11 Jul 2019, 02:55

Lucca:
1) there's no practical difference between year dummy and year fixed effect (however, in panel data regression the groupwise effect you investigate is the one of -panelid-). That said, in Stata year dummies creation has been superseded by -fvvarlist- notation.
2) you can include both -i.year- and -i.country-. After regression you can test their joint statistical significance via -testparm-.

Kind regards,
Carlo
(Stata 19.0)
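
Both points can be tried out on a small example. The following is only a sketch on Stata's shipped Grunfeld panel, not on the data from this thread; the variables company, year, invest, mvalue and kstock belong to that example dataset and merely stand in for region, year, crime and the regressors.

```stata
* Illustration only: Grunfeld investment panel shipped with Stata
webuse grunfeld, clear
xtset company year

* Panel (firm) fixed effects via -xtreg, fe-, year effects via factor-variable notation
xtreg invest mvalue kstock i.year, fe

* Joint test that all year effects are zero
testparm i.year

* Same model with explicit firm dummies (LSDV): the slopes on mvalue and kstock
* and the year effects match the -xtreg, fe- estimates above
regress invest mvalue kstock i.company i.year
```

The last two regressions are the sense in which "dummy" and "fixed effect" describe the same thing: the label in a results table says which effects were controlled for, not how they were computed.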

Lucca Mancini

Join Date: Mar 2019
Posts: 27

11 Jul 2019, 03:19

Thank you very much, Carlo. If I run a first-difference regression with year dummies only, it works, but as soon as I add i.Region for region dummies, I get an error message ("Region: string variables may not be used as factor variables"). Do you know if there is an error in my code? (Regression 1 works but regression 2 does not.)

Code:

***** Preparing Data
sort Region Year
egen panel_id = group(Region)
sort panel_id Year
tsset panel_id Year

gen adult_pop = Pop-Pop_0_14
gen asylum_pop = (asylum/adult_pop)
gen a8_pop = (EUcum/adult_pop)
gen young_share = (Pop_15_24/adult_pop)
gen benefit_claimants = Benefit/adult_pop
gen lnpop = ln(adult_pop)
gen a8_iv = EU_8_IV/adult_pop
gen viol_crime_rate = Violence/adult_pop

sort panel_id Year
by panel_id: egen avg_adult_pop = mean(adult_pop)
gen trend = Year-2010


****First-Difference Regression
***1)
regress D.(viol_crime_rate asylum_pop a8_pop lnpop benefit_claimants young_share) i.Year [aw=adult_pop], vce(cluster panel_id)
test D.asylum_pop=D.a8_pop


***2)
regress D.(viol_crime_rate asylum_pop a8_pop lnpop benefit_claimants young_share) i.Year i.Region [aw=adult_pop], vce(cluster panel_id)
test D.asylum_pop=D.a8_pop


Carlo Lazzaro

Join Date: Apr 2014
Posts: 17711

11 Jul 2019, 03:23

Lucca:
you cannot have a variable in -string- format as a predictor: you should -destring- it first, as you can see from the following toy example:

Code:

. set obs 1
number of observations (_N) was 0, now 1

. g Region="1" in 1

. list

     +--------+
     | Region |
     |--------|
  1. |      1 |
     +--------+

. destring Region, replace
Region: all characters numeric; replaced as byte

. list

     +--------+
     | Region |
     |--------|
  1. |      1 |
     +--------+
.

Kind regards,
Carlo
(Stata 19.0)
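
Note that -destring- only converts strings whose characters are numeric, as in the toy example above. If Region holds names rather than numbers, -encode- or the -egen, group()- call already used for panel_id gives a numeric variable instead. A minimal sketch, assuming made-up region names and a hypothetical new variable region_id:

```stata
* Toy data: Region holds names, so -destring- cannot convert it
clear
input str10 Region int Year
"North" 2011
"North" 2012
"South" 2011
"South" 2012
end

* -encode- maps each distinct string to an integer code and attaches value labels
encode Region, generate(region_id)   // region_id is a hypothetical name

* -egen, group()- also yields a numeric id (without labels by default)
egen panel_id = group(Region)

list, sepby(Region)
```

Either numeric variable can then be used with factor-variable notation, e.g. i.region_id or i.panel_id.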


Wouter Wakker

Join Date: Nov 2018

Posts: 621
#5

11 Jul 2019, 04:14

Besides Carlo's helpful advice, regional dummies should not be included in a first-difference regression anyway: like demeaning, differencing already removes the regional fixed effects if you have set region as your panel id.
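
This can be checked directly on toy data. The sketch below uses invented numbers and hypothetical variable names (region, year, y, reg2, d_reg2); it only shows what differencing does to a region dummy.

```stata
* Toy panel: 2 regions observed for 3 years (invented values)
clear
input byte region int year double y
1 2011 3.0
1 2012 3.5
1 2013 4.1
2 2011 7.2
2 2012 7.0
2 2013 7.9
end
xtset region year

* Dummy for region 2: constant over time within each panel
generate byte reg2 = (region == 2)

* First difference of the dummy: missing in each panel's first year, 0 afterwards
generate d_reg2 = D.reg2
list region year reg2 d_reg2, sepby(region)

* Every non-missing difference is exactly zero, so the dummy carries no information
assert d_reg2 == 0 if !missing(d_reg2)
```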
Eric de Souza

Join Date: Mar 2014

Posts: 587
#6

11 Jul 2019, 05:35

And as Professor Jeffrey Wooldridge keeps on repeating, you distinguish between the model and the technique used to estimate it.
The model contains the regional dummies. When you take first differences to estimate the model, each regional dummy becomes one minus one or zero minus zero, so it drops out.
This is making the same point as Wouter Wakker in a different way.
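
Written out, the argument looks as follows. The equation from the first post is not reproduced in the thread, so this is only a sketch under the assumption that the levels model has a region effect \alpha_i and a year effect \lambda_t; the symbols \alpha_i, \lambda_t, \beta, \gamma and u_{it} are notation chosen here.

```latex
\begin{align}
\text{crime}_{it} &= \alpha_i + \lambda_t + \beta\,\text{migrants}_{it} + X_{it}'\gamma + u_{it} \\
\Delta\text{crime}_{it} &= \Delta\lambda_t + \beta\,\Delta\text{migrants}_{it} + \Delta X_{it}'\gamma + \Delta u_{it}
\end{align}
```

The region effect cancels because \alpha_i - \alpha_i = 0, while the differenced year effects \Delta\lambda_t remain and are what the year dummies pick up in the differenced regression.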
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17711
#7

11 Jul 2019, 05:42

Wouter and Eric are obviously correct.
I forgot to mention in my previous reply that each time-invariant predictor will be wiped out by the -fe- machinery.
Thinking about -xtreg, fe-, things might be different if some panel unit changes region during the 8-year timespan (i.e., -region- is no longer time-invariant).

Kind regards,
Carlo
(Stata 19.0)
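
A quick way to see a time-invariant predictor being wiped out is sketched below on Stata's shipped Grunfeld panel rather than on the thread's data; mvalue_1935 is a hypothetical variable constructed only for this illustration.

```stata
webuse grunfeld, clear
xtset company year

* Hypothetical time-invariant predictor: each firm's first-year market value
bysort company (year): generate mvalue_1935 = mvalue[1]

* The within transformation demeans it to zero, so Stata omits it
* from the estimates because of collinearity
xtreg invest mvalue kstock mvalue_1935, fe
```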
Lucca Mancini

Join Date: Mar 2019

Posts: 27
#8

11 Jul 2019, 07:21

Thank you so much for your help. Indeed, I chose region as panel_id.

If I understand correctly: when I run the following panel regression in first differences and include only "i.Year" in the code, I have year fixed effects in the regression, and I no longer need to write "i.Region", because I have already chosen region as panel_id and the region fixed effects are therefore removed by the differencing.
Am I right?

Code:

******* Panel Settings
sort Kanton Year
egen panel_id = group(Region)
sort panel_id Year
tsset panel_id Year

*** First Difference Regression
regress D.(viol_crime_rate asylum_pop a8_pop lnpop benefit_claimants young_share) i.Year [aw=adult_pop], robust

Last edited by Lucca Mancini; 11 Jul 2019, 07:24.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17711
#9

11 Jul 2019, 08:47

Correct.
However, I do not think that you should -xtset- or -tsset- your data before running First Difference Regression.

Last edited by Carlo Lazzaro; 11 Jul 2019, 09:34.

Kind regards,
Carlo
(Stata 19.0)
Wouter Wakker

Join Date: Nov 2018

Posts: 621
#10

11 Jul 2019, 09:19

Carlo: D. is a time-series operator, so it will not work without -tsset- or -xtset-. Also, from a technical point of view, a first-difference regression uses the differences of the variables within each panel, so if you have 15 observations per panel in levels, you will have 14 per panel in FD. In other words, to calculate these differences within panels Stata needs to know the time variable and the panel variable.

Lucca: Your code looks alright, although I'm not familiar with weighted regressions so I cannot comment on the weights. Also, be aware that robust after -reg- is not the same as robust after -xtreg, fe- (with -xtreg, fe-, -robust- is treated as clustering on the panel variable).
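
The loss of one observation per panel is easy to verify. The sketch below uses Stata's shipped Grunfeld panel (10 firms, 20 years) as a stand-in for the thread's data; the variable names belong to that example dataset.

```stata
webuse grunfeld, clear
xtset company year        // declares panel and time variables; D. needs this

* 10 firms x 20 years = 200 observations in levels
count

* Differences are taken within firm, so each firm loses its first year
regress D.(invest mvalue kstock) i.year, vce(cluster company)
display e(N)              // size of the estimation sample after differencing
```

With 10 firms each losing one year, the displayed sample size should be 190 rather than 200.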
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17711
#11

11 Jul 2019, 09:35

Wouter is correct.
Admittedly, I seldom use time-series operators, so I had forgotten that -tsset-ing the data beforehand is mandatory to make them work.
About the difference between -robust- option in -regress- and -xtreg-, I recall a really interesting thread led by some points raised by daniel klein https://www.statalist.org/forums/for...ls-assumptions

Last edited by Carlo Lazzaro; 11 Jul 2019, 09:43.

Kind regards,
Carlo
(Stata 19.0)
Lucca Mancini

Join Date: Mar 2019

Posts: 27
#12

11 Jul 2019, 11:09

Thank you so much for your helpful replies. Regarding the robust standard errors, I have now noticed that I could also use vce(cluster Region), which means that the standard errors are clustered at the regional level. With "robust", the standard errors are bigger than with vce(cluster Region). But I don't know which one is better, what vce(cluster Region) exactly means, or in which cases one should use it.

Code:

regress D.(viol_crime_rate asylum_pop a8_pop lnpop benefit_claimants young_share) i.Year [aw=adult_pop], robust
**** or
regress D.(viol_crime_rate asylum_pop a8_pop lnpop benefit_claimants young_share) i.Year [aw=adult_pop], vce(cluster Region)
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17711
#13

11 Jul 2019, 11:11

Lucca:
you should go as you did (-vce(cluster Region)-).

Kind regards,
Carlo
(Stata 19.0)
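
As for what the option means: -robust- allows for heteroskedasticity but treats all observations as independent, whereas -vce(cluster Region)- additionally allows the errors of the same region to be correlated across years. A side-by-side comparison can be sketched on Stata's shipped Grunfeld panel, with company standing in for Region; the stored-estimate names fd_robust and fd_cluster are hypothetical.

```stata
webuse grunfeld, clear
xtset company year

* Heteroskedasticity-robust SEs: observations treated as independent
regress D.(invest mvalue kstock), vce(robust)
estimates store fd_robust

* Cluster-robust SEs: arbitrary error correlation allowed within each firm
regress D.(invest mvalue kstock), vce(cluster company)
estimates store fd_cluster

* Same coefficients, different standard errors
estimates table fd_robust fd_cluster, b(%9.4f) se(%9.4f)
```

The coefficients are identical in both columns; only the standard errors change, and they can move in either direction.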
