  • Help with panel models that have explanatory variables that do not vary over the same unit

    Hi everyone!

    I'm trying to estimate the determinants of measurement error in migration flows, and my equation looks something like this:
    [Image: Screenshot 2022-11-11 160125.jpg (estimating equation)]

    Where ME represents the degree of inaccuracy and varies over country of destination (i), country of origin (j) and time (t). While my demographic variables also vary over country pair and time, the variables that capture institutional quality and socio-political stability vary over only one side of the pair: corruption is measured for the country of destination and political stability for the country of origin.

    I initially applied time and country pair fixed effects using reghdfe which yields the following result:
    [Image: stata snipping.jpg (reghdfe output)]

    (Apologies for the inconsistent naming of the variables: r = country of residence and o = country of origin.)
    However, I wonder whether this fixed-effects approach is correct given the information above. What are the disadvantages of using country-pair fixed effects when some explanatory variables vary only by destination or by origin rather than by country pair? Any help would be appreciated!
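
    A minimal simulated sketch of what pair fixed effects do to such regressors. All names here (dest, orig, pair, corr_dest, x_pair, dist) are hypothetical and the data are random, so only the pattern of retained and omitted coefficients is of interest; the built-in -xtreg, fe- stands in for -reghdfe-.

    ```stata
    clear
    set seed 12345

    * hypothetical dyadic panel: 10 destinations x 9 origins x 8 years
    set obs 10
    generate dest = _n
    expand 10
    bysort dest: generate orig = _n
    drop if dest == orig
    expand 8
    bysort dest orig: generate year = 2010 + _n
    egen pair = group(dest orig)
    xtset pair year

    * destination-year regressor: identical for every origin of a destination in a year
    generate u = rnormal()
    bysort dest year (orig): generate corr_dest = u[1]

    * pair-constant regressor: no variation over time within a pair
    generate v = rnormal()
    bysort pair (year): generate dist = v[1]

    * regressor that varies over pair and year
    generate x_pair = rnormal()

    generate y = 0.5*corr_dest + 0.3*x_pair + 0.2*dist + rnormal()

    * pair fixed effects plus year dummies (two-way fixed effects)
    xtreg y corr_dest x_pair dist i.year, fe vce(cluster pair)
    ```

    In this sketch the pair-constant regressor (dist) is omitted because it is collinear with the pair effects, whereas the destination-level regressor (corr_dest) is still estimated, but only from its variation over time within each pair. A destination-by-year fixed effect, if added, would absorb it entirely.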
  • #2
    Arpan:
    the main requirement of the -fe- estimator is that the time-varying predictors show within-panel variation.
    From your post:
    1) unless you want to store both the -panelid- and the -timevar- fixed effects (the community-contributed module -reghdfe- has an option for that), the so-called two-way fixed effects model can be coded with -xtreg, fe-, too;
    2) while the within-Rsq looks good, you do not report on:
    a) whether -fe- is the way to go with your dataset;
    b) previous tests of the correct specification of the functional form of the regressand.
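
    A minimal sketch of point 1), using the -nlswork- example dataset rather than the migration data; the choice of regressors (tenure, hours) is only an assumption for illustration. The commented -reghdfe- line requires the community-contributed package, and the names fe_id and fe_yr are hypothetical.

    ```stata
    webuse nlswork, clear
    xtset idcode year

    * two-way fixed effects: panel effects via -fe-, time effects via year dummies
    xtreg ln_wage tenure hours i.year, fe vce(cluster idcode)

    * joint test of the time effects
    testparm i.year

    * same model with -reghdfe- (ssc install reghdfe), storing both sets of effects:
    * reghdfe ln_wage tenure hours, absorb(fe_id=idcode fe_yr=year) vce(cluster idcode)
    ```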
    Kind regards,
    Carlo
    (Stata 19.0)

    • #3
      Hi Carlo!
      Thank you for your comment! Regarding point 1, I opted for reghdfe purely out of convenience. As for point 2, I ran the standard xtoverid test, the results of which are as follows:
      [Image: Screenshot 2022-11-12 094547.jpg (xtoverid output)]

      (I also ran the standard Hausman test on the FE and RE estimates and rejected the null hypothesis in favor of the FE model.) Regarding 2b, I haven't checked the specification of the functional form (I assume you mean checking for omitted variable bias). However, I did check for heteroskedasticity using hettest and concluded that robust standard errors are the sounder approach.
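
      For reference, a minimal sketch of the two tests mentioned, run on the -nlswork- example dataset with illustrative regressors (an assumption, not the migration model). The classic -hausman- comparison relies on the RE estimator being efficient, so it is run without robust standard errors; -xtoverid- is community-contributed and is therefore only shown as a comment.

      ```stata
      webuse nlswork, clear
      xtset idcode year

      * classic Hausman test: default (non-robust) standard errors in both models
      xtreg ln_wage tenure hours, fe
      estimates store fe
      xtreg ln_wage tenure hours, re
      estimates store re
      hausman fe re, sigmamore

      * cluster-robust alternative (ssc install xtoverid):
      * xtreg ln_wage tenure hours, re vce(cluster idcode)
      * xtoverid
      ```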

      • #4
        Arpan:
        1) the -xtoverid- outcome points you to the -fe- specification;
        2) with such a large number of panels you should use cluster-robust standard errors, which take both heteroskedasticity and autocorrelation of the idiosyncratic error (epsilon) into account;
        3) as far as checking the correct specification of the functional form of the regressand is concerned, I meant something similar to -linktest-, which has to be replicated by hand as it cannot be called after -xtreg-. This procedure, which shares some features with -estat ovtest-, is reported in the following toy example:
        Code:
        . use "https://www.stata-press.com/data/r17/nlswork.dta"
        (National Longitudinal Survey of Young Women, 14-24 years old in 1968)
        
        . xtreg ln_wage c.age##c.age, fe vce(cluster idcode)
        
        Fixed-effects (within) regression               Number of obs     =     28,510
        Group variable: idcode                          Number of groups  =      4,710
        
        R-squared:                                      Obs per group:
             Within  = 0.1087                                         min =          1
             Between = 0.1006                                         avg =        6.1
             Overall = 0.0865                                         max =         15
        
                                                        F(2,4709)         =     507.42
        corr(u_i, Xb) = 0.0440                          Prob > F          =     0.0000
        
                                     (Std. err. adjusted for 4,710 clusters in idcode)
        ------------------------------------------------------------------------------
                     |               Robust
             ln_wage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
        -------------+----------------------------------------------------------------
                 age |   .0539076    .004307    12.52   0.000     .0454638    .0623515
                     |
         c.age#c.age |  -.0005973    .000072    -8.30   0.000    -.0007384   -.0004562
                     |
               _cons |    .639913   .0624195    10.25   0.000     .5175415    .7622845
        -------------+----------------------------------------------------------------
             sigma_u |   .4039153
             sigma_e |  .30245467
                 rho |  .64073314   (fraction of variance due to u_i)
        ------------------------------------------------------------------------------
        
        . predict fitted, xb
        (24 missing values generated)
        
        . g sq_fitted=fitted^2
        (24 missing values generated)
        
        . xtreg ln_wage fitted sq_fitted , fe vce(cluster idcode)
        
        Fixed-effects (within) regression               Number of obs     =     28,510
        Group variable: idcode                          Number of groups  =      4,710
        
        R-squared:                                      Obs per group:
             Within  = 0.1092                                         min =          1
             Between = 0.1033                                         avg =        6.1
             Overall = 0.0881                                         max =         15
        
                                                        F(2,4709)         =     523.09
        corr(u_i, Xb) = 0.0467                          Prob > F          =     0.0000
        
                                     (Std. err. adjusted for 4,710 clusters in idcode)
        ------------------------------------------------------------------------------
                     |               Robust
             ln_wage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
        -------------+----------------------------------------------------------------
              fitted |   2.569185   .7085064     3.63   0.000     1.180181    3.958189
           sq_fitted |    -.47432   .2153021    -2.20   0.028    -.8964128   -.0522272
               _cons |  -1.290258    .580562    -2.22   0.026    -2.428431   -.1520844
        -------------+----------------------------------------------------------------
             sigma_u |    .403403
             sigma_e |  .30238578
                 rho |  .64025357   (fraction of variance due to u_i)
        ------------------------------------------------------------------------------
        
        . test sq_fitted
        
         ( 1)  sq_fitted = 0
        
               F(  1,  4709) =    4.85
                    Prob > F =    0.0276
        
        .
        As the -test- outcome rejects the null that the squared fitted values add no explanatory power, the model is misspecified.
        Kind regards,
        Carlo
        (Stata 19.0)

        • #5
          Thanks for the help! I did as you indicated and, as I understand from the screenshot below, my model has a misspecification issue.

          [Image: Screenshot 2022-11-12 115331.jpg]

          • #6
            Arpan:
            therefore you have to include more predictors and/or interactions between them.
            In addition, as per the FAQ, please do not post screenshots but use CODE delimiters to share what you typed and what Stata gave you back. Thanks.
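
            A minimal sketch of adding predictors and interactions and then repeating the hand-made link test, again on the -nlswork- toy data. The added terms (tenure, its square, and its interaction with age) are only an assumption for illustration; which terms suit the migration model is a substantive choice.

            ```stata
            webuse nlswork, clear
            xtset idcode year

            * richer specification: squared terms and an interaction via factor-variable notation
            xtreg ln_wage c.age##c.age c.tenure##c.tenure c.age#c.tenure, fe vce(cluster idcode)

            * repeat the link-test-type check by hand
            predict double fitted, xb
            generate double sq_fitted = fitted^2
            xtreg ln_wage fitted sq_fitted, fe vce(cluster idcode)
            test sq_fitted
            ```

            If -test- no longer rejects the null for sq_fitted, there is no remaining evidence of functional-form misspecification from this particular check.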
            Last edited by Carlo Lazzaro; 12 Nov 2022, 07:44.
            Kind regards,
            Carlo
            (Stata 19.0)

            • #7
              Thanks Carlo! I shall keep that in mind.

              • #8
                Carlo Lazzaro, apologies for bringing this subject back up. I wonder what issues are associated with including variables that do not vary within panel (such as the population of the country of residence being added to a migration model that varies over country pair and year). I have fixed the misspecification by adding squared values of the corruption and stability indicators mentioned in the first equation above, but I wonder whether the correlated random effects approach might be more appropriate in this context, as I could obtain within estimates for variables that vary within the cluster ID and random-effects estimates for variables that don't. However, my panel is unbalanced and I'm not sure how that affects the estimates. Thanks!

                Code:
                xthybrid ME r_pub_corr r_pub_corr2 l_stock l_acquisition l_emig l_asylum_application, cre vce(cluster ID) clusterid(ID) use(l_stock l_acquisition l_emig l_asylum_application)

                Correlated random effects model. Family: gaussian. Link: identity.

                +-----------------------------------+
                             Variable |    model
                ----------------------+------------
                ME                    |
                        R__r_pub_corr |    -0.5804
                       R__r_pub_corr2 |     1.1669
                           W__l_stock |     0.5687
                     W__l_acquisition |     0.1667
                            W__l_emig |    -0.2148
                 W__l_asylum_applic~n |    -0.0997
                           D__l_stock |    -0.1918
                     D__l_acquisition |    -0.0122
                            D__l_emig |    -0.2848
                 D__l_asylum_applic~n |     0.0764
                                _cons |    -0.0717
                ----------------------+------------
                var(_cons[ID])        |
                                _cons |     1.3854
                ----------------------+------------
                var(e.ME)             |
                                _cons |     4.0641
                ----------------------+------------
                Statistics            |
                                   ll | -2.271e+04
                                 chi2 |   239.6209
                                    p |     0.0000
                                  aic | 45452.8758
                                  bic | 45546.9945
                +-----------------------------------+
                Level 1: 10300 units. Level 2: 1521 units.
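
                For comparison, a minimal sketch of a correlated random effects (Mundlak-type) model built by hand with official commands, using the -nlswork- example data, which is also an unbalanced panel. The regressors and the names insample and mean_* are hypothetical. The panel means are computed over the estimation sample only, which is the part that needs care when the panel is unbalanced.

                ```stata
                webuse nlswork, clear
                xtset idcode year

                * fix the estimation sample first, so the panel means use the same rows
                xtreg ln_wage tenure hours, fe
                generate byte insample = e(sample)

                * panel-level means of the time-varying regressors, estimation sample only
                foreach v of varlist tenure hours {
                    bysort idcode: egen double mean_`v' = mean(cond(insample, `v', .))
                }

                * RE model with the means added; race is constant within idcode
                xtreg ln_wage tenure hours mean_tenure mean_hours i.race if insample, re vce(cluster idcode)

                * joint test of the means: a cluster-robust, regression-based check of RE against FE
                test mean_tenure mean_hours
                ```

                In this setup the coefficients on tenure and hours should be close to the within (FE) estimates, while the coefficient on the panel-constant regressor is identified from between-panel variation only.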

                • #9
                  Arpan:
                  I think that the best advice is to point you to https://journals.sagepub.com/doi/pdf...867X1701700106.
                  Kind regards,
                  Carlo
                  (Stata 19.0)
