Panel data - how to analyse regional differences (over time)?

Chiara De Siena

Join Date: Oct 2020

Posts: 28
#1

15 Oct 2020, 05:48

Hi all, I am new here and I wish to thank you for your help during these years of university.

I am considering using panel data for my master's thesis, with a focus on regional differences in Italy. I am using EUSILC data: only individual personal data, not household information.
In the dataset, the variable for region is nuts 2, that is, I have macroregions (north east, north west, south, centre). My research questions are the following:
Does education influence the likelihood of getting an open-ended job? If so, how does this change over time and across the Italian regions? Which other factors can have an effect on getting an open-ended job?

Besides EUSILC data, I will also include some (max 2) variables at the regional level.
Here is my question: how should I treat regions in a panel analysis? Should I create dummy variables, or can I leave the variable region as it is (4 categories)?

Any comments and documents about the topic are very welcome!
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17678
#2

15 Oct 2020, 06:00

Chiara:
welcome to this forum.
Most of your questions relate to your research field, not to statistics or Stata (eg, Which other factors can have an effect on getting an open-ended job?).
That said, you're probably interested in -xtlogit- if you actually have panel data with a two-level categorical regressand (open-ended job yes/no).
Probably using the macroregions is fine (otherwise for some Italian regions the data would be pretty poor).
Anyway, you can add categorical predictors concerning both Italian macroregions and time via the -fvvarlist- notation:

Code:

xtlogit <depvar> <otherpredictors> i.macroregions i.time, fe

Please note that -xtlogit- offers different estimators: -re-, conditional -fe- and -pa-, which are described in detail in the -xtlogit- help file and in the related entry in the Stata .pdf manual.
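
As a minimal sketch of how the three estimators could be called, assuming hypothetical variable names (pid for the person identifier, year, openended for the 0/1 outcome, educ and macroregion), none of which are taken from the actual EUSILC files:

Code:

* hypothetical variable names; adapt them to your data
xtset pid year
* random-effects logit
xtlogit openended i.educ i.macroregion i.year, re
* conditional fixed-effects logit
xtlogit openended i.educ i.macroregion i.year, fe
* population-averaged logit
xtlogit openended i.educ i.macroregion i.year, pa

With -fe-, predictors that never change within a person (possibly macroregion, if nobody moves) are omitted from the estimation.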

Kind regards,
Carlo
(Stata 19.0)
Chiara De Siena

Join Date: Oct 2020

Posts: 28
#3

15 Oct 2020, 06:50

Thank you for your comment! From what you wrote, I understand that the regional variable should only be considered as a simple control variable, is that right? If so, great!

As regards my questions not related to statistics, I will try to be more specific, since I've already defined two regional variables which could influence my dependent variable.
I can download them as both macroregional and actual regional data.
Specifically, one is the net turnover of enterprises, expressed as a percentage, while the other is R&D expenses per inhabitant. For the latter, I am thinking of calculating a weighted average to account for the different sizes of the regions: I would weight each region's value by its population and then divide the weighted sum by the total Italian population. Would this be a viable choice?

In other words, I will use regional data for all 20 regions and then calculate the new R&D expenses per macroregion like this:

- X: R&D expenses per inhabitant (euro/inhabitant, original variable)
- a: total inhabitants of a region
- b: total Italian population

The new X for the macroregion of central Italy (and, say, 2006) will be (X_lazio * a₁ + X_marche * a₂ + X_toscana * a₃ + X_umbria * a₄) / b
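
In Stata terms, the calculation could be sketched roughly as follows. This assumes a hypothetical region-year file with one row per region and year, and hypothetical variables rd_pc (X), pop (a), macroregion and year:

Code:

* hypothetical region-year file and variable names
* X * a: total R&D expenses of each region
gen double rd_total = rd_pc * pop
* b: total Italian population in each year
bysort year: egen double pop_italy = total(pop)
* sum of X * a within each macroregion and year
bysort macroregion year: egen double rd_sum = total(rd_total)
* formula as written above: weighted sum divided by b
gen double rd_macro = rd_sum / pop_italy
* variant: divide by the macroregion's own population instead
bysort macroregion year: egen double pop_macro = total(pop)
gen double rd_macro_pc = rd_sum / pop_macro

The variant rd_macro_pc is expressed in euro per inhabitant of the macroregion itself, whereas rd_macro relates each macroregion's expenses to the whole Italian population.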

In relation to the method, I was thinking of using either xtlogit or xtprobit. Is the latter also a good option in your view?

I hope I have been clear enough.
Kind regards,

Chiara
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17678
#4

15 Oct 2020, 07:03

Chiara:
using regional instead of macroregional data may give you negligible results for some regions.
The fact that the (macro)regional variable is a predictor or a control does not change the way it should be coded up in Stata, though.
Instead of creating a new variable, can't you simply interact them?

Code:

c.RDExp##i.macroregion

Before interacting, you may want to consider centering R&D expenditure at a meaningful value (eg, its mean).
Despite being similar to -xtlogit- (by the way: I would go -xtlogit-), -xtprobit- does not offer the (conditional) -fe- specification.
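
A minimal sketch of centering followed by the interaction, assuming hypothetical variable names (pid, year, openended, educ, RDExp, macroregion) and using -re- purely for illustration:

Code:

* hypothetical variable names; adapt them to your data
* center R&D expenditure at its overall sample mean
summarize RDExp, meanonly
gen double RDExp_c = RDExp - r(mean)
xtset pid year
* centered R&D expenditure interacted with macroregion
xtlogit openended i.educ c.RDExp_c##i.macroregion i.year, re

After centering, the coefficients on the macroregion indicators refer to differences at the mean R&D expenditure rather than at zero expenditure.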

Kind regards,
Carlo
(Stata 19.0)
Chiara De Siena

Join Date: Oct 2020

Posts: 28
#5

15 Oct 2020, 07:20

For the analysis I will use macroregions.

I am talking about regions because I think that the population size of a macroregion will affect the information on R&D. In fact, given the same R&D expenditure but very different population sizes, the per-inhabitant figure will be lower for a region with a larger population than for one with a smaller population. That's why I am considering creating a new variable (always referring to the macroregions) starting from the single regions.

As of now, RDExp has different values across macroregions, so I am not sure whether centering it would be useful. But I am not an expert, so I may be wrong.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17678
#6

15 Oct 2020, 08:50

Chiara:
I think you would be better advised to post what you typed and what Stata gave you back (as per FAQ) whenever you're ready to.
Let's see the results without centering.

Kind regards,
Carlo
(Stata 19.0)
Chiara De Siena

Join Date: Oct 2020

Posts: 28
#7

28 Oct 2020, 03:54

Hi Lazzaro,

I am not posting the results because I am still creating the variable in Stata (using -generate newvar- and then -replace- with -if-).
Specifically, since you were right about centering, I thought that I could center the variable on the Italian average, so that the values for the macroregions will be either above or below the country average. However, I have observations for 14 years, so 14 different values of the Italian average. Is it possible to create a single variable covering all the years which is centered on a different mean for each year? (I will use -append- to combine all the yearly datasets into a single one.)

Have a nice day, and thanks for your help.
Best regards,

Chiara

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17678
#8

28 Oct 2020, 04:00

Chiara:
in the following toy-example, -age- is centered at year-specific mean age:

Code:

. use "https://www.stata-press.com/data/r16/nlswork.dta"
(National Longitudinal Survey.  Young Women 14-26 years of age in 1968)

. bysort year: egen pre_cent=mean(age)

. bysort year: gen cent=age-pre_cent
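
Applied to the case discussed in this thread, the same approach could be sketched as follows, assuming a hypothetical variable RDExp (macroregion-level R&D expenditure) in the appended dataset:

Code:

* hypothetical variable names; adapt them to your data
* year-specific mean of R&D expenditure
bysort year: egen double RDExp_ymean = mean(RDExp)
* deviation from the mean of the same year
gen double RDExp_cent = RDExp - RDExp_ymean

Note that in person-level data the mean is taken over persons, so each macroregion counts in proportion to its number of observations in that year; it need not coincide with an officially published Italian average.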

Kind regards,
Carlo
(Stata 19.0)


Chiara De Siena

Join Date: Oct 2020

Posts: 28
#9

28 Oct 2020, 04:11

Thank you, Carlo!