  • Group variables and create subsets

    Good afternoon everyone,

    I am currently working with a dataset on immigration: https://data.europa.eu/data/datasets...kraq?locale=en
    I modified the dataset in the following way:

    ** keep observations with country of citizenship "foreign country", "Non EU27 countries non reporting country", "EU27 countries non reporting" and stateless (STLS)**
    keep if inlist(citizen, "FOR", "NEU27_2020_FOR", "EU27_2020_FOR", "STLS")

    ** keep observations with geopolitical entity (reporting) "north-west", "north-east", "center", "south" and "islands" of Italy**
    keep if inlist(geo, "ITC", "ITH", "ITI", "ITF", "ITG")

    **Keep values for observations with age 15-64**
    keep if age == "Y15-64"

    ** keep waves from 2010 to 2020**
    keep if inrange(time_period, 2010, 2020)

    ** keep working status for employed, unemployed and active in the labor force**
    keep if inlist(wstatus, "EMP", "UNE", "ACT")

    ** replace values for regions**
    gen geo_4 = geo
    replace geo_4 = "1" if geo_4 == "ITC"
    replace geo_4 = "2" if geo_4 == "ITH"
    replace geo_4 = "3" if geo_4 == "ITI"
    replace geo_4 = "4" if geo_4 == "ITF"
    replace geo_4 = "4" if geo_4 == "ITG"
    drop geo

    The dataset now has repeated rows for each combination of working status, zone and time period.

    example:
    wstatus = EMP | geo = 1 | time_period = 2010
    wstatus = EMP | geo = 1 | time_period = 2010
    wstatus = EMP | geo = 1 | time_period = 2010
    wstatus = EMP | geo = 1 | time_period = 2010
    wstatus = EMP | geo = 2 | time_period = 2010
    wstatus = EMP | geo = 2 | time_period = 2010

    Now, what I need to do next is to group all the variables so that each working status appears only once for each region and each year.

    example:

    wstatus = EMP | geo = 1 | time_period = 2010
    wstatus = EMP | geo = 2 | time_period = 2010
    wstatus = EMP | geo = 3 | time_period = 2010
    wstatus = EMP | geo = 4 | time_period = 2010
    wstatus = EMP | geo = 1 | time_period = 2011
    wstatus = EMP | geo = 2 | time_period = 2011
    wstatus = EMP | geo = 3 | time_period = 2011
    wstatus = EMP | geo = 4 | time_period = 2011
    wstatus = EMP | geo = 1 | time_period = 2012
    wstatus = EMP | geo = 2 | time_period = 2012
    wstatus = EMP | geo = 3 | time_period = 2012
    wstatus = EMP | geo = 4 | time_period = 2012
    ....
    wstatus = UNE | geo = 1 | time_period = 2010
    wstatus = UNE | geo = 2 | time_period = 2010
    wstatus = UNE | geo = 3 | time_period = 2010
    wstatus = UNE | geo = 4 | time_period = 2010
    wstatus = UNE | geo = 1 | time_period = 2011
    wstatus = UNE | geo = 2 | time_period = 2011
    wstatus = UNE | geo = 3 | time_period = 2011
    wstatus = UNE | geo = 4 | time_period = 2011
    .....
    This goes for each year, geographic zone and working status.

    Is there a way to solve this problem?

    Thank you!

  • #2
    I only read the last part, as I am not sure how the earlier steps relate to the final question. To do "what I need to do next is to group all the variables so that each working status appears only once for each region and each year," you can use the tag() function of egen:

    Code:
    egen AnyVarName = tag(wstatus geo_4 time_period)
    The first row of each unique combination receives a 1, and every subsequent row with the same combination receives a 0. If you keep only the cases where AnyVarName is 1, you should end up with only unique combos.
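
    The keep step described above could be sketched as follows. This is only a sketch under assumptions: it supposes that geo was recoded into geo_4 as in post #1, and obs_value is a hypothetical name for the column holding the counts (the thread does not name it). Note that tagging keeps the first row of each combination and discards the rest; if the repeated rows carry different values that should be combined (for example, one row per citizenship group), something like collapse may be closer to what is needed.

    ```stata
    * Assumes geo was recoded into geo_4 as in post #1.
    * obs_value is a hypothetical name for the variable holding the counts.

    * Option A: keep the first row of each combination; all other rows are dropped.
    egen byte pick = tag(wstatus geo_4 time_period)
    keep if pick
    drop pick

    * Option B (use instead of A): add up the counts within each combination.
    * collapse (sum) obs_value, by(wstatus geo_4 time_period)

    * Either way, check that each combination now appears exactly once
    * (isid exits with an error if it does not).
    isid wstatus geo_4 time_period
    ```

    Option A is the tag approach; Option B is a different technique that changes the values, so which one fits depends on what the repeated rows represent.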
