Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Create new Variable: Migration flows

    Hi,

    I have a dataset in long form with the variables country of origin, destination country, stock of female migrants from country I (origin) to destination (j), population_female, year, population origin country.
    I want to create a new variable (dependent) which is the migration flow (=((stock of female migrants from i to j)- (stock of female migrants form i to j (t-1))/ (population female country i).
    I don't know how to create this variable. Can someone help here?


  • #2
    While we might be able to make a reasonable guess about the structure of your data, using -dataex- to display an excerpt of your data would substantially increase your chances of a quick and helpful answer. I would note also that describing a "stock" as "from i to j" confused me, as the from/to sounds like a flow to me.

    Comment


    • #3
      Hi Lea,

      Welcome to Statalist! It looks to me from Mike's comment that you are trying to create a "flow" variable by measuring the change in the "stock" variable.

      1) It would be helpful, if as Mike mentioned, you shared 30-40 observations of your data using Stata's dataex command. If you are not familiar with dataex, I created a Youtube video on how to use it here.

      2) I took a stab at creating some sample data. See if this is what you wanted:

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input int year long women_new_country int change long women_in_origin
      2010 10000    . 500000
      2011 14582 4582 495418
      2012 18771 4189 491229
      2013 25640 6869 484360
      2014 33416 7776 476584
      2015 39002 5586 470998
      2016 45826 6824 464174
      end
      ------------------ copy up to and including the previous line ------------------

      Code:
      * Note: I created the variable "women_new_country" to be sum of prior year plus "flow".
      format women_new_country change women_in_origin %10.0fc
      
      . list, noobs abbrev(18)
      
        +-----------------------------------------------------+
        | year   women_new_country   change   women_in_origin |
        |-----------------------------------------------------|
        | 2010              10,000        .           500,000 |
        | 2011              14,582    4,582           495,418 |
        | 2012              18,771    4,189           491,229 |
        | 2013              25,640    6,869           484,360 |
        | 2014              33,416    7,776           476,584 |
        |-----------------------------------------------------|
        | 2015              39,002    5,586           470,998 |
        | 2016              45,826    6,824           464,174 |
        +-----------------------------------------------------+
      
      * I will create it two ways: one if your data is set as time-series (tsset), and one if not
      * If not tsset
      gen flow = women_new_country - women_new_country[_n-1]
      gen flow_pct = flow / women_in_origin
      format flow %10.0fc
      
      . list, noobs abbrev(18)
      
        +------------------------------------------------------------------------+
        | year   women_new_country   change   women_in_origin    flow   flow_pct |
        |------------------------------------------------------------------------|
        | 2010              10,000        .           500,000       .          . |
        | 2011              14,582    4,582           495,418   4,582   .0092488 |
        | 2012              18,771    4,189           491,229   4,189   .0085276 |
        | 2013              25,640    6,869           484,360   6,869   .0141816 |
        | 2014              33,416    7,776           476,584   7,776   .0163161 |
        |------------------------------------------------------------------------|
        | 2015              39,002    5,586           470,998   5,586   .0118599 |
        | 2016              45,826    6,824           464,174   6,824   .0147014 |
        +------------------------------------------------------------------------+
      
      * Set data as time-series
      help tsset
      tsset year, year
      gen flow2 = women_new_country - l.women_new_country  // that's a lower-case "L" for "Lag"
      gen flow_pct2 = flow2 / women_in_origin
      
      . list, noobs abbrev(18)
      
        +--------------------------------------------------------------------------------------------+
        | year   women_new_country   change   women_in_origin    flow   flow_pct   flow2   flow_pct2 |
        |--------------------------------------------------------------------------------------------|
        | 2010              10,000        .           500,000       .          .       .           . |
        | 2011              14,582    4,582           495,418   4,582   .0092488   4,582    .0092488 |
        | 2012              18,771    4,189           491,229   4,189   .0085276   4,189    .0085276 |
        | 2013              25,640    6,869           484,360   6,869   .0141816   6,869    .0141816 |
        | 2014              33,416    7,776           476,584   7,776   .0163161   7,776    .0163161 |
        |--------------------------------------------------------------------------------------------|
        | 2015              39,002    5,586           470,998   5,586   .0118599   5,586    .0118599 |
        | 2016              45,826    6,824           464,174   6,824   .0147014   6,824    .0147014 |
        +--------------------------------------------------------------------------------------------+

      Comment


      • #4
        Thank you for your help!

        These are some observations of my dataset: destination country, origin country, female (stock of migrants female) and population origin.

        My data is a time series. Is it not the same if I do it with lag migration stock or migration stock [_n-1]?

        How does a PPML estimator work here?



        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input str28(iso3_d iso3_o) int year long female double pop_o
        "CHE" "AFG" 1980   254 13.180431365966797
        "CAN" "AFG" 1980    65 13.180431365966797
        "GBR" "AFG" 1980     0 13.180431365966797
        "AUS" "AFG" 1980     0 13.180431365966797
        "PRT" "AFG" 1980     0 13.180431365966797
        "NOR" "AFG" 1980     3 13.180431365966797
        "ESP" "AFG" 1980     0 13.180431365966797
        "SWE" "AFG" 1980    17 13.180431365966797
        "GRC" "AFG" 1980     0 13.180431365966797
        "NLD" "AFG" 1980   399 13.180431365966797
        "CHL" "AFG" 1980     0 13.180431365966797
        "FRA" "AFG" 1980     0 13.180431365966797
        "IRL" "AFG" 1980     0 13.180431365966797
        "AUT" "AFG" 1980     0 13.180431365966797
        "DNK" "AFG" 1980     3 13.180431365966797
        "LUX" "AFG" 1980     0 13.180431365966797
        "DEU" "AFG" 1980     0 13.180431365966797
        "FIN" "AFG" 1980     0 13.180431365966797
        "USA" "AFG" 1980  1520 13.180431365966797
        "NZL" "AFG" 1980     3 13.180431365966797
        "ESP" "AFG" 1985    19 11.528977394104004
        "USA" "AFG" 1985  6520 11.528977394104004
        "NOR" "AFG" 1985     3 11.528977394104004
        "GRC" "AFG" 1985     0 11.528977394104004
        "PRT" "AFG" 1985     0 11.528977394104004
        "NZL" "AFG" 1985    12 11.528977394104004
        "AUS" "AFG" 1985     0 11.528977394104004
        "NLD" "AFG" 1985   673 11.528977394104004
        "CAN" "AFG" 1985   500 11.528977394104004
        "DEU" "AFG" 1985     0 11.528977394104004
        "LUX" "AFG" 1985     0 11.528977394104004
        "CHL" "AFG" 1985     0 11.528977394104004
        "CHE" "AFG" 1985   281 11.528977394104004
        "DNK" "AFG" 1985    12 11.528977394104004
        "FRA" "AFG" 1985   310 11.528977394104004
        "SWE" "AFG" 1985    45 11.528977394104004
        "FIN" "AFG" 1985     0 11.528977394104004
        "GBR" "AFG" 1985     0 11.528977394104004
        "AUT" "AFG" 1985     0 11.528977394104004
        "IRL" "AFG" 1985     0 11.528977394104004
        "IRL" "AFG" 1990     0 11.731192588806152
        "FIN" "AFG" 1990     0 11.731192588806152
        "GRC" "AFG" 1990     0 11.731192588806152
        "DEU" "AFG" 1990     0 11.731192588806152
        "PRT" "AFG" 1990     0 11.731192588806152
        "NLD" "AFG" 1990  1145 11.731192588806152
        "ESP" "AFG" 1990    29 11.731192588806152
        "FRA" "AFG" 1990   644 11.731192588806152
        "NZL" "AFG" 1990    39 11.731192588806152
        "CAN" "AFG" 1990  2590 11.731192588806152
        "USA" "AFG" 1990 11975 11.731192588806152
        "AUS" "AFG" 1990  1275 11.731192588806152
        "CHL" "AFG" 1990     0 11.731192588806152
        "NOR" "AFG" 1990    67 11.731192588806152
        "LUX" "AFG" 1990     0 11.731192588806152
        "GBR" "AFG" 1990     0 11.731192588806152
        "AUT" "AFG" 1990     0 11.731192588806152
        "CHE" "AFG" 1990   336 11.731192588806152
        "DNK" "AFG" 1990    59 11.731192588806152
        "SWE" "AFG" 1990   180 11.731192588806152
        "LUX" "AFG" 1995     0  17.58607292175293
        "SWE" "AFG" 1995   664  17.58607292175293
        "PRT" "AFG" 1995     0  17.58607292175293
        "GBR" "AFG" 1995  2280  17.58607292175293
        "NZL" "AFG" 1995    96  17.58607292175293
        "CHL" "AFG" 1995     0  17.58607292175293
        "AUT" "AFG" 1995     0  17.58607292175293
        "CHE" "AFG" 1995   511  17.58607292175293
        "CAN" "AFG" 1995  5600  17.58607292175293
        "GRC" "AFG" 1995     0  17.58607292175293
        "FRA" "AFG" 1995   846  17.58607292175293
        "DEU" "AFG" 1995 30439  17.58607292175293
        "DNK" "AFG" 1995   199  17.58607292175293
        "AUS" "AFG" 1995  2895  17.58607292175293
        "ESP" "AFG" 1995    52  17.58607292175293
        "USA" "AFG" 1995 14229  17.58607292175293
        "NLD" "AFG" 1995  1344  17.58607292175293
        "NOR" "AFG" 1995   145  17.58607292175293
        "FIN" "AFG" 1995    17  17.58607292175293
        "IRL" "AFG" 1995     0  17.58607292175293
        "LUX" "AFG" 2000     0 20.595359802246094
        "CHL" "AFG" 2000     0 20.595359802246094
        "GBR" "AFG" 2000  5133 20.595359802246094
        "NLD" "AFG" 2000  8499 20.595359802246094
        "USA" "AFG" 2000 17667 20.595359802246094
        "ESP" "AFG" 2000    43 20.595359802246094
        "NZL" "AFG" 2000   318 20.595359802246094
        "IRL" "AFG" 2000    48 20.595359802246094
        "AUS" "AFG" 2000  4520 20.595359802246094
        "CAN" "AFG" 2000 10920 20.595359802246094
        "CHE" "AFG" 2000   813 20.595359802246094
        "FIN" "AFG" 2000   179 20.595359802246094
        "PRT" "AFG" 2000     0 20.595359802246094
        "NOR" "AFG" 2000   287 20.595359802246094
        "DNK" "AFG" 2000   643 20.595359802246094
        "DEU" "AFG" 2000 32092 20.595359802246094
        "GRC" "AFG" 2000     0 20.595359802246094
        "AUT" "AFG" 2000     0 20.595359802246094
        "FRA" "AFG" 2000  1060 20.595359802246094
        "SWE" "AFG" 2000  1909 20.595359802246094
        end

        Comment


        • #5
          Thanks for sharing your data using dataex!

          I would consider your data to be panel data (because you have destination country and year). As you note, the answer will be the same whether you use lag migration stock or migration stock [_n-1].

          Code:
          . table iso3_d year, c(mean female) row col format(%9.0gc)
          
          ----------------------------------------------------------------------
                    |                            year                          
             iso3_d |     1980      1985      1990      1995      2000     Total
          ----------+-----------------------------------------------------------
                AUS |        0         0     1,275     2,895     4,520     1,738
                AUT |        0         0         0         0         0         0
                CAN |       65       500     2,590     5,600    10,920     3,935
                CHE |      254       281       336       511       813       439
                CHL |        0         0         0         0         0         0
                DEU |        0         0         0    30,439    32,092  12,506.2
                DNK |        3        12        59       199       643     183.2
                ESP |        0        19        29        52        43      28.6
                FIN |        0         0         0        17       179      39.2
                FRA |        0       310       644       846     1,060       572
                GBR |        0         0         0     2,280     5,133   1,482.6
                GRC |        0         0         0         0         0         0
                IRL |        0         0         0         0        48       9.6
                LUX |        0         0         0         0         0         0
                NLD |      399       673     1,145     1,344     8,499     2,412
                NOR |        3         3        67       145       287       101
                NZL |        3        12        39        96       318      93.6
                PRT |        0         0         0         0         0         0
                SWE |       17        45       180       664     1,909       563
                USA |    1,520     6,520    11,975    14,229    17,667  10,382.2
                    |
              Total |    113.2    418.75    916.95  2,965.85  4,206.55  1,724.26
          ----------------------------------------------------------------------
          
          sort iso3_d year
          egen id = group( iso3_d)  // just creating a numeric country identifier
          order id, first
          bysort iso3_d (year): gen flow = female - female[_n-1]
          format flow female  %9.0gc
          
          . list id iso3_d year female flow pop_o in 1/25, noobs sepby(id)
          
            +----------------------------------------------+
            | id   iso3_d   year   female    flow    pop_o |
            |----------------------------------------------|
            |  1      AUS   1980        0       .   13.180 |
            |  1      AUS   1985        0       0   11.529 |
            |  1      AUS   1990    1,275   1,275   11.731 |
            |  1      AUS   1995    2,895   1,620   17.586 |
            |  1      AUS   2000    4,520   1,625   20.595 |
            |----------------------------------------------|
            |  2      AUT   1980        0       .   13.180 |
            |  2      AUT   1985        0       0   11.529 |
            |  2      AUT   1990        0       0   11.731 |
            |  2      AUT   1995        0       0   17.586 |
            |  2      AUT   2000        0       0   20.595 |
            |----------------------------------------------|
            |  3      CAN   1980       65       .   13.180 |
            |  3      CAN   1985      500     435   11.529 |
            |  3      CAN   1990    2,590   2,090   11.731 |
            |  3      CAN   1995    5,600   3,010   17.586 |
            |  3      CAN   2000   10,920   5,320   20.595 |
            |----------------------------------------------|
            |  4      CHE   1980      254       .   13.180 |
            |  4      CHE   1985      281      27   11.529 |
            |  4      CHE   1990      336      55   11.731 |
            |  4      CHE   1995      511     175   17.586 |
            |  4      CHE   2000      813     302   20.595 |
            |----------------------------------------------|
            |  5      CHL   1980        0       .   13.180 |
            |  5      CHL   1985        0       0   11.529 |
            |  5      CHL   1990        0       0   11.731 |
            |  5      CHL   1995        0       0   17.586 |
            |  5      CHL   2000        0       0   20.595 |
            +----------------------------------------------+
          I didn't create the flow as a % of women in origin_country just because I wasn't sure of the units. I assume the pop_o is in millions. I didn't know if female (and therefore flow) was in units or thousands.

          Comment


          • #6
            Thank you so much!!
            The population variable is in millions and female is in thousands. The problem with the "mean female" and grouping them is that I have a bilateral dataset with country pairs and therefore destination country and country of origin.

            Do you also know how to deal with negative values when using a PPML estimator?



            Last edited by Lea Hans; 30 Mar 2019, 10:50.

            Comment

            Working...
            X