
  • TEST - regional dataset - cluster at different level?

    Dear Statalists,

    My apologies in advance for the long post.

    I am undertaking a study on the determinants of interregional migration in the European Union using an unbalanced panel dataset of 129 European regions (in 13 countries) over the period 1998–2013 (1,350 observations in total). Based on the neoclassical theory of migration, which states that differences in expected earnings are the primary driver of labor migration, I am testing the effect of regional wage and unemployment differentials on migration-induced population growth at the regional level.

    I distinguish between migration within a country (people moving between regions of the same country) and migration between countries (people moving to or from a region in another country).

    My hypothesis is the following: labor market incentives should have a stronger impact on “internal migration”, as it is easier for people to move within their own country to take advantage of higher regional wages or lower unemployment, since they do not face language or similar barriers.

    I compare the effect of labor market incentives on “internal” and “international” migration.

    To that end, I run two similar regressions.
    The dependent variable is the net migration rate, where net migration flows correspond to people moving within the country for the internal migration regression and to/from other countries for the international migration one.

    The two independent variables of interest are the (logarithm of) wage and unemployment differentials (with respect to the appropriate economic area), lagged by one year to avoid endogeneity issues.

    For internal migration, differentials are expressed relative to the national average (i.e., the ratio of the regional wage to the national average wage), while for international migration they are expressed relative to the European Union average (i.e., the ratio of the regional wage to the EU average wage). The logarithm of population density is added as a control variable.
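
    As a minimal sketch of how such a lagged log differential could be built: the raw variable names (wage, country, region, year) and the toy values are hypothetical, not taken from the actual dataset, and an unweighted national mean is assumed where a population-weighted average may be what was really used.

```stata
* Toy data with hypothetical variable names and made-up values
clear
input str2 country region year wage
"AA" 1 1998 100
"AA" 1 1999 110
"AA" 2 1998 80
"AA" 2 1999 85
"BB" 3 1998 60
"BB" 3 1999 66
"BB" 4 1998 90
"BB" 4 1999 99
end

* National average wage in each country-year (unweighted: an assumption)
bysort country year: egen double nat_wage = mean(wage)

* Log of the regional-to-national wage ratio
generate double lnwageratio = ln(wage / nat_wage)

* Declare the panel, then lag the differential by one year
xtset region year
generate double lnwageratio_lag = L.lnwageratio

list country region year wage nat_wage lnwageratio lnwageratio_lag, sepby(region)
```

    The EU-wide version would take the mean by year only instead of by country and year. The first year of each region has a missing lag, which matches the missing values in the first row of each region in the data excerpt.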

    I estimated a fixed-effects model with region and time fixed effects, as there is a high probability that my covariates are correlated with the region effects, given that I cannot control for more regional characteristics. The standard Hausman test as well as the robust version (xtoverid) confirm that fixed effects is the appropriate model.
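
    One way to write down the specification described so far, as a sketch in notation that is not from the original post: $m_{it}$ is the net migration rate of region $i$ in year $t$, $w$ and $u$ are the wage and the unemployment rate, $\bar{w}$ and $\bar{u}$ are the reference-area averages, $d$ is population density, and $\alpha_i$ and $\lambda_t$ are the region and year effects.

```latex
m_{it} = \beta_1 \ln\!\left(\frac{w_{i,t-1}}{\bar{w}_{t-1}}\right)
       + \beta_2 \ln\!\left(\frac{u_{i,t-1}}{\bar{u}_{t-1}}\right)
       + \beta_3 \ln d_{it} + \alpha_i + \lambda_t + \varepsilon_{it}
```

    The averages are national in the internal-migration regression and EU-wide in the international one. That population density enters contemporaneously rather than lagged is an assumption of this sketch.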

    I also detected heteroskedasticity (xttest3) and autocorrelation (xtserial), and therefore estimated the model with xtreg, fe vce(robust).

    However, I would like, especially when modeling the response of cross-border net migration, to take into account the economic performance of the country as a whole in addition to the regional characteristics. This would mean including a country fixed effect, that is, moving to a three-way fixed-effects model. While this approach seems theoretically sound, implementing it in Stata looks quite difficult. In addition, my dataset has little within variation, so too many fixed effects may absorb all of it.

    I then considered clustering at the country level (which also takes care of clustering at the regional level). This would allow the error terms of regions belonging to the same country to be correlated (and the error terms of observations from the same region to be correlated over time and heteroskedastic). However, I am concerned about whether this is the most appropriate approach: my number of clusters is quite small (13 countries), and cluster-robust inference relies on the number of clusters going to infinity. According to Kézdi (Kézdi, Gabor. 2004. “Robust Standard Error Estimation in Fixed-Effects Panel Models.”), clustering with fewer than 50 clusters may be even worse than not clustering at all.

    #0. General question: how can I take into account the fact that my data are nested by country? Do I have to take it into account in my case, or can I “ignore” the structure of the data (given that my unit of observation is the region)? It is even possible that regions belonging to different countries have more in common in terms of unemployment experience and regional income than regions of the same country, which would argue against clustering at the country level.

    #1. Which criterion should I use to compare my estimates when clustering at the region level versus clustering at the country level? Is clustering at the country level even appropriate, theoretically and methodologically, given the small number of clusters (13)?

    #2. Otherwise, should I abandon the fixed-effects model and go for a mixed (hierarchical) one, which would allow me to include a country fixed effect? If yes, is there something I should be careful with?

    #3. Also, it seems that a lot of the variability exists between regions rather than within them. Is this evidence that I should maybe go for a between model? How can I formally check for this, besides using xtsum on the variables?

    #4. Lastly, if I am correct, the robust / clustering option(s) also take care of potential autocorrelation. When looking into it, I read that one can also use xtregar (Cochrane-Orcutt transformation, I think). When do we prefer xtregar over xtreg, fe robust? Is it possible to include time fixed effects with xtregar? (I was unable to.)
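
    For reference on question #3, the overall variation of a variable can be split into a within-region part and a between-region part. This is a sketch in generic notation, not taken from the post: $x_{it}$ is the variable, $\bar{x}_i$ is the mean of region $i$ over its $T_i$ observed years, and $\bar{x}$ is the mean over all observations.

```latex
\sum_{i}\sum_{t=1}^{T_i} (x_{it}-\bar{x})^2
  = \underbrace{\sum_{i}\sum_{t=1}^{T_i} (x_{it}-\bar{x}_i)^2}_{\text{within}}
  + \underbrace{\sum_{i} T_i\,(\bar{x}_i-\bar{x})^2}_{\text{between}}
```

    xtsum reports closely related quantities: the standard deviation of $\bar{x}_i$ (between) and of $x_{it}-\bar{x}_i+\bar{x}$ (within).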

    Below is a sample of my dataset as well as the regression outputs using cluster(region) versus cluster (country).

    Code:
    xtsum netmig_rate_dom1 netmig_rate_int1 lnunempratio lngdpratio lnunempratioE lngdpratioE lnpop_dens
    xtreg netmig_rate_dom1 lnunempratio lngdpratio lnpop_dens i.year, fe vce(robust)
    xtreg netmig_rate_int1 lnunempratioE lngdpratioE lnpop_dens i.year, fe vce(robust)
    xtreg netmig_rate_dom1 lnunempratio lngdpratio lnpop_dens i.year, fe vce(cluster country)
    xtreg netmig_rate_int1 lnunempratioE lngdpratioE lnpop_dens i.year, fe vce(cluster country)
    [Image attachment: Capture d’écran 2016-05-24 à 18.04.19.png]

    Sorry again for the long post, I hope I expressed myself clearly. Any help would be much appreciated.

    Best,
    Randa


  • #2
    Code:
    xtsum netmig_rate_dom1 netmig_rate_int1 lnunempratio lngdpratio lnunempratioE lngdpratioE lnpop_dens



    • #3
      Code:
      xtsum netmig_rate_dom1 netmig_rate_int1 lnunempratio lngdpratio lnunempratioE lngdpratioE lnpop_dens
      Code:
      xtreg netmig_rate_dom1 lnunempratio lngdpratio lnpop_dens i.year, fe vce(robust)   
      xtreg netmig_rate_int1 lnunempratioE lngdpratioE lnpop_dens i.year, fe vce(robust) 
      
      xtreg netmig_rate_dom1 lnunempratio lngdpratio lnpop_dens i.year, fe vce(cluster country)
      xtreg netmig_rate_int1 lnunempratioE lngdpratioE lnpop_dens i.year, fe vce(cluster country)



      • #4
        jj


        • #5
          jj
          [Image attachment: Capture d’écran 2016-05-24 à 18.04.19.png]



          • #6
            Code:
            xtsum netmig_rate_dom1 netmig_rate_int1 lnunempratio lngdpratio lnunempratioE lngdpratioE lnpop_dens
            [Image attachment: Capture d’écran 2016-05-24 à 18.25.49.png]

            Code:
            xtreg netmig_rate_dom1 lnunempratio lngdpratio lnpop_dens i.year, fe vce(robust)
            [Image attachment: Capture d’écran 2016-05-24 à 18.27.08.png]



            • #7
              Code:
              xtsum netmig_rate_dom1 netmig_rate_int1 lnunempratio lngdpratio lnunempratioE lngdpratioE lnpop_dens
              Code:
              xtreg netmig_rate_dom1 lnunempratio lngdpratio lnpop_dens i.year, fe vce(robust)




              input float(netmig_rate_dom1 netmig_rate_int1 lnunempratio lngdpratio lnunempratioE lngdpratioE lnpop_dens)
              .4 2.1577818 . . . . 4.319619
              1.1 -1.5657657 .11523528 -.369527 -.7415208 -.16009565 4.319087
              2.9 -2.0123415 .24881123 -.3590049 -.6065263 -.153451 4.3200183
              3.3 1.122173 .03077166 -.3474989 -.6589898 -.14331606 4.32453
              4 -.12180835 .09224976 -.3793692 -.56991744 -.180336 4.328494
              4.1 -.7176983 .0852976 -.3957168 -.5142084 -.21592396 4.331785
              5.4 -2.153201 -.06808911 -.4018451 -.54760253 -.2151223 4.3380747
              end



              • #8
                Code:
                 -1.3    3.437671      .427444    -.5307569      .8077188    -.8176909 4.6037693
                 -1.5    3.413258     .4319919   -.54694057      .7677731    -.8168479   4.60577
                 -1.5   3.0754294     .3480252    -.5654474      .7360377    -.7583575  4.607368
                 -1.2    3.379636     .3719424    -.5312725      .6136779    -.6905655  4.609361
                 -1.5    5.063307    .34151635   -.58433455      .5583171    -.7433203   4.61294
                 -1.4   4.1555514     .3187916    -.5799973      .6333671    -.7111707 4.6157146
                 -1.4    13.41248     .3829923   -.57086456      .6353321    -.6685613  4.627812
                 -1.5    3.130178     .3794896   -.56790346      .5646534    -.6524929  4.629375
                end
                ------------------ copy up to and including the previous line ------------------



                • #9
                  Code:
                  input float(netmig_rate_dom1 netmig_rate_int1 lnunempratio lngdpratio lnunempratioE lngdpratioE lnpop_dens)
                     .4   2.1577818            .            .             .            .  4.319619
                    1.1  -1.5657657    .11523528     -.369527     -.7415208   -.16009565  4.319087
                    2.9  -2.0123415    .24881123    -.3590049     -.6065263     -.153451 4.3200183
                    3.3    1.122173    .03077166    -.3474989     -.6589898   -.14331606   4.32453
                      4  -.12180835    .09224976    -.3793692    -.56991744     -.180336  4.328494
                    4.1   -.7176983     .0852976    -.3957168     -.5142084   -.21592396  4.331785
                    5.4   -2.153201   -.06808911    -.4018451    -.54760253    -.2151223 4.3380747
                    2.4   4.0137634     .0798222    -.4097158     -.5314545   -.22432917  4.344455
                    3.2 .0035248874   .006688943    -.3876666      -.598564   -.20305815  4.347823
                    3.4  -.27408716   -.10285737    -.3785571     -.8981093    -.1878347  4.350923
                    2.5    1.742587   -.08676768    -.3851104     -.9581904    -.1625664 4.3552976
                    1.8   1.4126707    .03190456    -.3604454     -.8537293   -.13663277  4.358502
                   -1.5    4.929208            .            .             .            . 4.4018292
                   -1.6   4.6092677    .07356248   -.18091995     -.7831935    .02851138  4.404888
                   -2.2     7.35059   -.10536055   -.17658705     -.9606981   .028966835  4.410128
                   -1.8    9.273081   -.14792019    -.1632431     -.8376815    .04093971  4.417635
                   -2.2    9.463461   -.11804572   -.17912614     -.7802129   .019907087 4.4249663
                    3.6   1.4862424   -.07833186    -.1749162     -.6778378   .004876624 4.4301023
                    3.6    .7458477   -.09278175     -.165928    -.57229507   .020794826  4.434263
                    1.8    2.844737   -.04692943   -.16008355     -.6582062   .025303103 4.4388795
                    1.6   .23974097   -.05518647     -.168242     -.6604394   .016366405 4.4411206
                    1.6     .612624   -.12817521    -.1734051      -.923427     .0173174  4.443357
                     .8   2.2775416    .08230864   -.17309293     -.7891141      .049451 4.4466434
                    2.4   .18349305    .03190456    -.1739023     -.8537293    .04991028  4.449218
                  end



                  • #10
                    Dear Statalists,

                    My apologies in advance for the long post. I tried to be exhaustive and concise at the same time.

                    I am undertaking a study on the determinants of interregional migration in the European Union using an unbalanced panel dataset of 129 European regions (in 13 countries) over the period 1998–2013 (1,350 observations in total). Based on the neoclassical theory of migration, which states that differences in expected earnings are the primary driver of labor migration, I am testing the effect of regional wage and unemployment differentials on migration-induced population growth at the regional level.

                    I distinguish between migration within a country (people moving between regions of the same country) and migration between countries (people moving to or from a region in another country).

                    My hypothesis is the following: labor market incentives should have a stronger impact on “internal migration”, as it is easier for people to move within their own country to take advantage of higher regional wages or lower unemployment, since they do not face language or similar barriers.

                    I compare the effect that labor market incentives have on “internal” and “international” migration. To that end, I run two similar regressions.
                    The dependent variable is the net migration rate, where net migration flows correspond to people moving within the country for the internal migration regression and to/from other countries for the international migration one. The two independent variables of interest are the (logarithm of) wage and unemployment differentials (with respect to the appropriate economic area), lagged by one year to avoid endogeneity issues. For internal migration, differentials are expressed relative to the national average (i.e., the ratio of the regional wage to the national average wage), while for international migration they are expressed relative to the European Union average (i.e., the ratio of the regional wage to the EU average wage). The logarithm of population density is added as a control variable.

                    I estimated a fixed-effects model with region and time fixed effects, as there is a high probability that my covariates are correlated with the region effects, given that I cannot control for more regional characteristics. The standard Hausman test as well as the robust version (xtoverid) confirm that fixed effects is the appropriate model.

                    I also detected heteroskedasticity (xttest3) and autocorrelation (xtserial), and therefore estimated the model with xtreg, fe vce(robust). Testing for cross-sectional correlation did not work with either xttest2 or xtcsd (Stata error: not enough common observations across panels).

                    However, ideally I would like to take into account the economic performance of the country as a whole in addition to the regional characteristics. This would mean including a country fixed effect, that is, moving to a three-way fixed-effects model. While this approach seems theoretically sound, implementing it in Stata looks quite difficult. In addition, my dataset has little within variation, so too many fixed effects may absorb all of it.

                    I then considered clustering at the country level (which also takes care of clustering at the regional level). This would allow the error terms of regions belonging to the same country to be correlated (and the error terms of observations from the same region to be correlated over time and heteroskedastic). However, I am concerned about whether this is the most appropriate approach: my number of clusters is quite small (13 countries), and cluster-robust inference relies on the number of clusters going to infinity. According to Kézdi (Kézdi, Gabor. 2004. “Robust Standard Error Estimation in Fixed-Effects Panel Models.”), clustering with fewer than 50 clusters may be even worse than not clustering at all.

                    #0. General question: how can I take into account the fact that my data are nested by country? Do I have to take it into account in my case, or can I “ignore” the structure of the data (given that my unit of observation is the region)? It is even possible that regions belonging to different countries have more in common in terms of unemployment experience and regional income than regions of the same country, which would argue against clustering at the country level.

                    #1. Which criterion should I use to compare my estimates when clustering at the region level versus clustering at the country level? Is clustering at the country level even appropriate, theoretically and methodologically, given the small number of clusters (13)?

                    #2. Otherwise, should I abandon the fixed-effects model and go for a mixed (hierarchical) one, which would allow me to include a country fixed effect? If yes, is there something I should be careful with?

                    #3. Also, it seems that a lot of the variability exists between regions rather than within them. Is this evidence that I should maybe go for a between model? How can I formally check for this, besides using xtsum?

                    #4. Lastly, if I am correct, the robust / clustering option(s) also take care of potential autocorrelation. When looking into it, I read that one can also use xtregar (Cochrane-Orcutt transformation, if I am correct). When do we prefer xtregar over xtreg, fe robust? Is it possible to include time fixed effects with xtregar? (I was unable to.)

                    Below is a sample of my dataset as well as the regression outputs using cluster(region) versus cluster(country).

                    Sorry again for the long post, I hope I expressed myself clearly. Any help would be much appreciated.

                    Best,
                    Randa
