Good afternoon everyone,
I am currently working with a dataset regarding immigration https://data.europa.eu/data/datasets...kraq?locale=en .
I modified the dataset in the following way:
** keep observations with country of citizenship "foreign country", "non-EU27 countries nor reporting country", "EU27 countries except reporting country" and "STLS" (stateless)**
keep if citizen == "FOR" | citizen == "NEU27_2020_FOR" | citizen == "EU27_2020_FOR" | citizen == "STLS"
** keep observations with geopolitical entity (reporting) "North-West", "North-East", "Centre", "South" and "Islands" of Italy**
keep if geo == "ITC" | geo == "ITH" | geo == "ITI" | geo == "ITF" | geo == "ITG"
**Keep values for observations with age 15-64**
keep if age == "Y15-64"
** keep waves from 2010 to 2020**
keep if inrange(time_period, 2010, 2020)
** keep working status for employed, unemployed and active in the labor force**
keep if wstatus == "EMP" | wstatus == "UNE" | wstatus == "ACT"
** recode regions into four zones ("South" and "Islands" are merged into zone 4)**
gen geo_4 = geo
replace geo_4 = "1" if geo_4 == "ITC"
replace geo_4 = "2" if geo_4 == "ITH"
replace geo_4 = "3" if geo_4 == "ITI"
replace geo_4 = "4" if geo_4 == "ITF"
replace geo_4 = "4" if geo_4 == "ITG"
drop geo
The dataset now has repeated rows for each combination of working status, zone and time period.
Example:
wstatus = EMP | geo = 1 | time_period = 2010
wstatus = EMP | geo = 1 | time_period = 2010
wstatus = EMP | geo = 1 | time_period = 2010
wstatus = EMP | geo = 1 | time_period = 2010
wstatus = EMP | geo = 2 | time_period = 2010
wstatus = EMP | geo = 2 | time_period = 2010
What I need to do next is aggregate the data so that each working status appears only once for each zone and each year.
Example:
wstatus = EMP | geo = 1 | time_period = 2010
wstatus = EMP | geo = 2 | time_period = 2010
wstatus = EMP | geo = 3 | time_period = 2010
wstatus = EMP | geo = 4 | time_period = 2010
wstatus = EMP | geo = 1 | time_period = 2011
wstatus = EMP | geo = 2 | time_period = 2011
wstatus = EMP | geo = 3 | time_period = 2011
wstatus = EMP | geo = 4 | time_period = 2011
wstatus = EMP | geo = 1 | time_period = 2012
wstatus = EMP | geo = 2 | time_period = 2012
wstatus = EMP | geo = 3 | time_period = 2012
wstatus = EMP | geo = 4 | time_period = 2012
....
wstatus = UNE | geo = 1 | time_period = 2010
wstatus = UNE | geo = 2 | time_period = 2010
wstatus = UNE | geo = 3 | time_period = 2010
wstatus = UNE | geo = 4 | time_period = 2010
wstatus = UNE | geo = 1 | time_period = 2011
wstatus = UNE | geo = 2 | time_period = 2011
wstatus = UNE | geo = 3 | time_period = 2011
wstatus = UNE | geo = 4 | time_period = 2011
.....
The same pattern should hold for every year, geographic zone and working status.
Is there a way to solve this problem?
Thank you !
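For reference, one possible way to get one row per working status, zone and year is sketched below. It is only a sketch under assumptions: the post does not name the variable that holds the figures, so obs_value is a hypothetical name, and summing only makes sense if that variable holds counts rather than rates. Note also that the "FOR" category most likely already contains the other citizenship groups, so adding all four together would double-count; the sketch keeps "FOR" purely as an illustration.

```stata
* Assumption: the figures are stored in a numeric variable called obs_value
* (hypothetical name; substitute the actual value variable in your file).

* See how many rows share the same working status, zone and year
duplicates report wstatus geo_4 time_period

* Keep a single citizenship category so totals are not double-counted
* ("FOR" is used here only as an illustration)
keep if citizen == "FOR"

* One row per working status, zone and year; this also adds
* "South" (ITF) and "Islands" (ITG) together into zone 4
collapse (sum) obs_value, by(wstatus geo_4 time_period)

* Confirm that the three variables now uniquely identify each row
isid wstatus geo_4 time_period
sort wstatus time_period geo_4
```

If other dimensions are still in the data (for example sex or unit), they would be summed over as well, so it is worth checking with tabulate which of them still vary before collapsing. Also keep in mind that collapse (sum) treats missing values as zero.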
