Help with panel models that have dependent variables that do not vary over the same unit

Arpan Sagar

Join Date: May 2022

Posts: 10
#1

11 Nov 2022, 08:10

Hi everyone!

I'm trying to estimate the determinants of measurement error in migration flows, and my equation looks something like this:

Here ME represents the degree of inaccuracy and varies over country of destination (i), country of origin (j), and time (t). My demographic variables also vary over country pair and time, but the variables that capture institutional quality and socio-political stability vary over only the country of destination or the country of origin (corruption in the destination, political stability in the origin).
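
A specification consistent with this description can be sketched as follows. The notation is assumed for illustration and is not the original equation: X_ijt stands for the pair-level demographic variables, Corr_it and PS_jt for destination corruption and origin political stability (assumed here to change over time), alpha_ij for a country-pair effect, and tau_t for a year effect. The second line is the within-pair (demeaned) form that a pair fixed-effects estimator works with, where a bar with subscript ij denotes the mean over the years observed for pair ij.

```latex
\begin{aligned}
ME_{ijt} &= X_{ijt}'\beta + \gamma\,Corr_{it} + \delta\,PS_{jt}
            + \alpha_{ij} + \tau_t + \varepsilon_{ijt} \\
ME_{ijt} - \overline{ME}_{ij}
         &= (X_{ijt} - \overline{X}_{ij})'\beta
            + \gamma\,(Corr_{it} - \overline{Corr}_{ij})
            + \delta\,(PS_{jt} - \overline{PS}_{ij})
            + (\tau_t - \overline{\tau}_{ij})
            + (\varepsilon_{ijt} - \overline{\varepsilon}_{ij})
\end{aligned}
```

Under that assumption, gamma and delta are identified from changes over time within each pair, and alpha_ij drops out. A regressor that never changes within a pair would have a demeaned value of zero and would be absorbed by the pair effect.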

I initially applied time and country pair fixed effects using reghdfe which yields the following result:

(Apologies for the inconsistent variable naming: r = country of residence and o = origin.)
However, I wonder whether this fixed-effects approach is correct given the information above. What are the disadvantages of using country-pair fixed effects with explanatory variables that do not vary over country pair? Any help would be appreciated!
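
To make the question concrete, here is a minimal simulated sketch, not the actual data or model. All variable names (dest, orig, pair_id, corr_dest, x_pair, y) and the coefficient values are hypothetical. It builds a destination-by-origin-by-year panel in which one regressor varies only by destination and year, then fits pair and year fixed effects with -reghdfe-.

```stata
* Requires the community-contributed -reghdfe- (ssc install reghdfe, ssc install ftools)
clear
set seed 12345

* 20 destinations x 20 origins (diagonal dropped) x 10 years
set obs 20
generate dest = _n
expand 20
bysort dest: generate orig = _n
drop if dest == orig
expand 10
bysort dest orig: generate year = 2000 + _n
egen pair_id = group(dest orig)

* Regressor that varies by destination and year only (same value for all origins)
generate u = rnormal()
bysort dest year (orig): generate corr_dest = u[1]
drop u

* Time-invariant pair effect, correlated with a pair-year regressor
generate v = rnormal()
bysort pair_id (year): generate a_pair = v[1]
drop v
generate x_pair = rnormal() + 0.5*a_pair

* Hypothetical data-generating process
generate y = 1 + 0.8*x_pair - 0.5*corr_dest + a_pair + rnormal()

* Pair and year fixed effects, standard errors clustered on the pair
reghdfe y x_pair corr_dest, absorb(pair_id year) vce(cluster pair_id)
```

In this setup corr_dest is not dropped, because it still changes over time within each pair. It would be absorbed only if it were constant within pairs, or if destination-by-year effects were added to absorb().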

Carlo Lazzaro

Join Date: Apr 2014

Posts: 17702
#2

11 Nov 2022, 16:21

Arpan:
the main requirement of the -fe- estimator is time-varying variables that show within-panel variation.
From your post:
1) unless you want to store both the -panelid- and -timevar- fixed effects (the community-contributed module -reghdfe- has an option for that), the so-called two-way fixed effects model can be coded with -xtreg,fe-, too;
2) while the within-Rsq looks good, you do not report on:
a) whether -fe- is the way to go with your dataset;
b) previous tests on the correct specification of the functional form of the regressand.
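
Point 1 can be sketched on the nlswork data as below. The choice of regressors is purely illustrative, and the names fe_id and fe_year for the stored effects are arbitrary.

```stata
* Toy example; -reghdfe- is community-contributed (ssc install reghdfe, ssc install ftools)
webuse nlswork, clear
xtset idcode year

* Two-way fixed effects with -xtreg-: panel effects are absorbed,
* time effects enter as explicit year dummies
xtreg ln_wage tenure not_smsa south i.year, fe vce(cluster idcode)
scalar b_xtreg = _b[tenure]

* Same model with -reghdfe-, storing both sets of estimated fixed effects
reghdfe ln_wage tenure not_smsa south, absorb(fe_id=idcode fe_year=year) vce(cluster idcode)

* The slope on tenure should agree up to numerical tolerance
display _b[tenure] - b_xtreg
```

-reghdfe- drops singleton groups, so the reported number of observations can differ from -xtreg-, while the slope estimates should match.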

Kind regards,
Carlo
(Stata 19.0)
Arpan Sagar

Join Date: May 2022

Posts: 10
#3

12 Nov 2022, 02:49

Hi Carlo!
Thank you for your comment! Regarding point 1, I opted for reghdfe purely out of convenience. As for point 2, I ran the standard xtoverid, the results of which are as follows:

(I also ran the standard Hausman test on the fe and re estimates and rejected the null hypothesis in favor of the FE model.) Regarding 2b, I haven't checked the specification of the functional form (I'm assuming you mean checking for omitted variable bias). However, I did check for heteroskedasticity using hettest and concluded that robust standard errors are the sounder approach.
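
For reference, the two FE-versus-RE checks mentioned here can be sketched on the nlswork data as follows. This is a toy illustration with arbitrarily chosen regressors, not the commands run on the migration data.

```stata
webuse nlswork, clear
xtset idcode year

* Classical Hausman test: relies on the RE estimator being efficient,
* so it is fitted here with default (non-robust) standard errors
xtreg ln_wage tenure not_smsa south, fe
estimates store fe
xtreg ln_wage tenure not_smsa south, re
estimates store re
hausman fe re

* Cluster-robust alternative: -xtoverid- after a clustered RE fit.
* -xtoverid- is community-contributed (ssc install xtoverid) and, as far as
* I know, also needs -ivreg2- and -ranktest-; plain variable names are used
* because it may not accept factor-variable notation.
xtreg ln_wage tenure not_smsa south, re vce(cluster idcode)
xtoverid
```

In both tests a rejection of the null points toward the fixed-effects specification.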

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17702
#4

12 Nov 2022, 03:26

Arpan:
1) the -xtoverid- outcome points you toward the -fe- specification;
2) with such a large number of panels you should use cluster-robust standard errors to take both heteroskedasticity and autocorrelation of the epsilon into account;
3) as for checking the correct specification of the functional form of the regressand, I meant something similar to -linktest-, which has to be replicated by hand because it cannot be called after -xtreg-. This procedure, which shares some features with -estat ovtest-, is shown in the following toy example:

Code:

. use "https://www.stata-press.com/data/r17/nlswork.dta"
(National Longitudinal Survey of Young Women, 14-24 years old in 1968)

. xtreg ln_wage c.age##c.age, fe vce(cluster idcode)

Fixed-effects (within) regression               Number of obs     =     28,510
Group variable: idcode                          Number of groups  =      4,710

R-squared:                                      Obs per group:
     Within  = 0.1087                                         min =          1
     Between = 0.1006                                         avg =        6.1
     Overall = 0.0865                                         max =         15

                                                F(2,4709)         =     507.42
corr(u_i, Xb) = 0.0440                          Prob > F          =     0.0000

                             (Std. err. adjusted for 4,710 clusters in idcode)
------------------------------------------------------------------------------
             |               Robust
     ln_wage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
         age |   .0539076    .004307    12.52   0.000     .0454638    .0623515
             |
 c.age#c.age |  -.0005973    .000072    -8.30   0.000    -.0007384   -.0004562
             |
       _cons |    .639913   .0624195    10.25   0.000     .5175415    .7622845
-------------+----------------------------------------------------------------
     sigma_u |   .4039153
     sigma_e |  .30245467
         rho |  .64073314   (fraction of variance due to u_i)
------------------------------------------------------------------------------

. predict fitted, xb
(24 missing values generated)

. g sq_fitted=fitted^2
(24 missing values generated)

. xtreg ln_wage fitted sq_fitted , fe vce(cluster idcode)

Fixed-effects (within) regression               Number of obs     =     28,510
Group variable: idcode                          Number of groups  =      4,710

R-squared:                                      Obs per group:
     Within  = 0.1092                                         min =          1
     Between = 0.1033                                         avg =        6.1
     Overall = 0.0881                                         max =         15

                                                F(2,4709)         =     523.09
corr(u_i, Xb) = 0.0467                          Prob > F          =     0.0000

                             (Std. err. adjusted for 4,710 clusters in idcode)
------------------------------------------------------------------------------
             |               Robust
     ln_wage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
      fitted |   2.569185   .7085064     3.63   0.000     1.180181    3.958189
   sq_fitted |    -.47432   .2153021    -2.20   0.028    -.8964128   -.0522272
       _cons |  -1.290258    .580562    -2.22   0.026    -2.428431   -.1520844
-------------+----------------------------------------------------------------
     sigma_u |    .403403
     sigma_e |  .30238578
         rho |  .64025357   (fraction of variance due to u_i)
------------------------------------------------------------------------------

. test sq_fitted

 ( 1)  sq_fitted = 0

       F(  1,  4709) =    4.85
            Prob > F =    0.0276

.

As the -test- outcome rejects the null at the 5% level (p = 0.0276), the squared fitted values carry explanatory power and the functional form is misspecified.

Kind regards,
Carlo
(Stata 19.0)


Arpan Sagar

Join Date: May 2022

Posts: 10
#5

12 Nov 2022, 03:56

Thanks for the help! I did as you indicated and, as I understand from the screenshot below, my model is misspecified.

Carlo Lazzaro

Join Date: Apr 2014

Posts: 17702
#6

12 Nov 2022, 07:42

Arpan:
therefore you have to include more predictors and/or interactions between them.
In addition, as per the FAQ, please do not post screenshots; use CODE delimiters to share what you typed and what Stata gave you back. Thanks.
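
On the first point, a minimal sketch of the workflow, again on the nlswork toy data: extend the specification (here with tenure and its square, an arbitrary choice for illustration) and repeat the by-hand link test. Whether the richer model passes is an empirical matter and is not claimed here.

```stata
webuse nlswork, clear
xtset idcode year

* Extended specification: age and tenure, each with a squared term
xtreg ln_wage c.age##c.age c.tenure##c.tenure, fe vce(cluster idcode)

* Link test by hand: regress the outcome on the fitted values and their square
predict double fitted, xb
generate double sq_fitted = fitted^2
xtreg ln_wage fitted sq_fitted, fe vce(cluster idcode)

* A non-significant squared term gives no evidence of misspecification
test sq_fitted
```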

Last edited by Carlo Lazzaro; 12 Nov 2022, 07:44.

Kind regards,
Carlo
(Stata 19.0)
Arpan Sagar

Join Date: May 2022

Posts: 10
#7

12 Nov 2022, 08:19

Thanks Carlo! I shall keep that in mind.

Arpan Sagar

Join Date: May 2022
Posts: 10
#8

17 Nov 2022, 10:54

Carlo Lazzaro, apologies for bringing this subject back up. I wonder what issues arise from including variables that do not vary within panel (such as the population of the country of residence, added to a migration model that varies over country pair and year). I have fixed the misspecification by adding squared values of the corruption and stability indicators mentioned in the first equation above, but I wonder whether the correlated random effects approach might be more appropriate here, as I could obtain within estimates for variables that vary within the cluster ID and random-effects estimates for variables that don't. However, my panel is unbalanced and I'm not sure how that affects the estimates. Thanks!

Code:

 xthybrid ME r_pub_corr r_pub_corr2 l_stock l_acquisition l_emig l_asylum_application, cre vce(cluster ID) clusterid(ID) use(l_stock l_acquisition l_emig l_asylum_application)

Correlated random effects model. Family: gaussian. Link: identity.

+-----------------------------------+
|             Variable |   model    |
|----------------------+------------|
|ME                    |            |
|        R__r_pub_corr |    -0.5804 |
|       R__r_pub_corr2 |     1.1669 |
|           W__l_stock |     0.5687 |
|     W__l_acquisition |     0.1667 |
|            W__l_emig |    -0.2148 |
| W__l_asylum_applic~n |    -0.0997 |
|           D__l_stock |    -0.1918 |
|     D__l_acquisition |    -0.0122 |
|            D__l_emig |    -0.2848 |
| D__l_asylum_applic~n |     0.0764 |
|                _cons |    -0.0717 |
|----------------------+------------|
|var(_cons[ID])        |            |
|                _cons |     1.3854 |
|----------------------+------------|
|var(e.ME)             |            |
|                _cons |     4.0641 |
|----------------------+------------|
|Statistics            |            |
|                   ll | -2.271e+04 |
|                 chi2 |   239.6209 |
|                    p |     0.0000 |
|                  aic | 45452.8758 |
|                  bic | 45546.9945 |
+-----------------------------------+
Level 1: 10300 units. Level 2: 1521 units.
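
For comparison, the correlated random effects (Mundlak) idea can also be coded by hand. The sketch below uses the nlswork toy data rather than the migration data, with arbitrarily chosen regressors. The names insample, m_tenure and m_ttl_exp are made up for illustration. Because the panel is unbalanced, the panel means are computed only over the rows that enter the estimation.

```stata
webuse nlswork, clear
xtset idcode year

* Flag the estimation sample first, so the panel means use the same rows
generate byte insample = !missing(ln_wage, tenure, ttl_exp, grade)

* Panel means of the time-varying regressors over the estimation sample
bysort idcode: egen double m_tenure  = mean(cond(insample, tenure, .))
bysort idcode: egen double m_ttl_exp = mean(cond(insample, ttl_exp, .))

* RE with the panel means added: the coefficients on tenure and ttl_exp
* should closely track the within (FE) estimates, while grade, which is
* constant within person, also gets a coefficient
xtreg ln_wage tenure ttl_exp m_tenure m_ttl_exp grade if insample, re vce(cluster idcode)

* Joint test of the panel means: a cluster-robust analogue of the Hausman test
test m_tenure m_ttl_exp
```

The coefficient on a within-constant variable such as grade is only as credible as the assumption that, conditional on the panel means, it is uncorrelated with the unit effect.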


Carlo Lazzaro

Join Date: Apr 2014

Posts: 17702
#9

17 Nov 2022, 11:56

Arpan:
I think that the best advice is to point you to https://journals.sagepub.com/doi/pdf...867X1701700106.

Kind regards,
Carlo
(Stata 19.0)
