I examine the number of co-publications of six countries that are members of a regional organization over the last 31 years. I'm particularly interested in similarities and differences in the countries' co-authorship patterns, especially in those variables that reflect the relation to the co-authors' countries, such as trade volume or geographical distance. For this purpose, I've decided to fit a population-averaged negative binomial model using -xtnbreg, pa difficult vce(robust)-, following the recommendation in the field-specific literature for overdispersed data.
However, I'm currently at an impasse because including any of these "pair variables" (or virtual proximity variables) results in -no convergence-.
This issue is certainly not new, and I have found helpful explanations and advice in previous forum threads [1,2,3,4,5], but I'm not sure I understand all of it properly, and I'm uncertain which parts apply to my specific case. I have summarized the potential issues, their recommended remedies, and what I've tried so far to give you an overview of my current understanding. Please point out anything I got wrong.
1. There may be high collinearity between the independent variables [5]. Some year dummies at the end of the time period are dropped due to collinearity, but checking correlations with -pwcorr- shows relatively weak correlations between the pair variables (<0.4), while correlations are higher between the variables that reflect the domestic dimension (where convergence is achieved). -no convergence- also occurs if I exclude the year dummies, so I don't think collinearity is the problem.
2. The maximum likelihood estimator for my model may "not exist" for my data [1]. This seems possible, as my data do have a large number of 0 values in the dependent variable and in the "pair" independent variables. A potential remedy would be to start with a Poisson regression and plug its estimates into the negative binomial regression [2]. I wasn't sure which model I should use, so I've just run a population-averaged Poisson regression -xtpoisson, pa-, but it results in -no convergence- as well. I assume using these estimates probably won't help either, right?
3. In case of using an interaction, the model including the interaction may not be identified by the data [4]. I'm not using interactions so that specific issue should not apply here.
4. My model is insufficient and I should try something different.
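The checks behind points 1 and 2 could be sketched roughly as follows. This is only an illustration using the variable names from the regression; `max_collab` and `one_per_target` are hypothetical helper variables, and the choice of variables passed to -pwcorr- is an assumption about which ones count as "pair variables".

```stata
* Sketch: how many zeros are in the outcome overall?
count if collab_weight == 0

* Hypothetical helpers: flag panels whose outcome is zero in every year
bysort target: egen max_collab = max(collab_weight)
egen byte one_per_target = tag(target)
count if one_per_target & max_collab == 0

* Pairwise correlations among the (assumed) pair variables
pwcorr rtot_trade colotrad langcom, sig
```

Panels with an all-zero outcome, or dummies such as colotrad and langcom that equal 1 for only a handful of panels, would be one place to look when the estimator may not exist.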
I've tried the -difficult- option to change the stepping during the iterations [4], but to no avail. I have also tried -xtpoisson, r fe- as a fallback option [3], with the same result. What I haven't thoroughly tried so far is another maximization technique, as I lack a proper understanding of the particularities of the different techniques.
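One further variation would be to let the population-averaged fit run longer and print more detail per iteration. The sketch below assumes that -xtnbreg, pa- accepts the GEE optimization options iterate() and trace described for -xtgee-, and that corr(independent) is an acceptable alternative working correlation; the value 500 is arbitrary. Since the tolerance in the log oscillates rather than shrinks, more iterations alone may well not help.

```stata
* Sketch (assumes iterate() and trace are accepted by the pa estimator)
xtnbreg collab_weight rtot_trade gdp_pc tertenrol_epol trade_percgdp ///
    mobcell100 colotrad langcom i.year, pa vce(robust) iterate(500) trace

* Same model with an independent working correlation instead of exchangeable
xtnbreg collab_weight rtot_trade gdp_pc tertenrol_epol trade_percgdp ///
    mobcell100 colotrad langcom i.year, pa corr(independent) vce(robust)
```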
I should mention that I also had issues with an empty Wald chi² statistic, which I attribute to a scaling problem, as I was able to fix it by re-scaling the problematic pair variables.
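A re-scaling of this kind could look like the sketch below. The divisors and the new variable names are illustrative assumptions, not necessarily the ones actually used; they are chosen only because coefficients of the order of 1e-09 or 1e-06 suggest regressors measured in very large units.

```stata
* Sketch: rescale large-magnitude regressors (names and divisors are hypothetical)
generate double rtot_trade_bn = rtot_trade / 1e9      // trade volume in billions
generate double tertenrol_m   = tertenrol_epol / 1e6  // enrolment in millions
generate double gdp_pc_k      = gdp_pc / 1000         // GDP per capita in thousands

* Confirm the rescaled variables are now of comparable magnitude
summarize rtot_trade_bn tertenrol_m gdp_pc_k
```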
Do you have some recommendations on possible next steps?
I've attached an example of the regression and copied the output below for a better overview.
Code:
. xtset target year
       panel variable:  target (strongly balanced)
        time variable:  year, 1988 to 2018
                delta:  1 unit

xtnbreg collab_weight rtot_trade gdp_pc tertenrol_epol trade_percgdp mobcell100 colotrad langcom i.year, pa difficult vce(robust)
note: 2015.year omitted because of collinearity
note: 2016.year omitted because of collinearity
note: 2017.year omitted because of collinearity
note: 2018.year omitted because of collinearity

Iteration 1: tolerance = .31055186
Iteration 2: tolerance = .07928357
Iteration 3: tolerance = .08383659
Iteration 4: tolerance = .04340788
Iteration 5: tolerance = .22333106
.
.
.
Iteration 95: tolerance = .23854644
Iteration 96: tolerance = .19215449
Iteration 97: tolerance = .23957639
Iteration 98: tolerance = .20948912
Iteration 99: tolerance = .26024878
Iteration 100: tolerance = .10269846

GEE population-averaged model                   Number of obs     =      3,161
Group variable:                     target      Number of groups  =        109
Link:                                  log      Obs per group:
Family:             negative binomial(k=1)                    min =         29
Correlation:                  exchangeable                    avg =       29.0
                                                              max =         29
                                                Wald chi2(31)     =   10794.89
Scale parameter:                         1      Prob > chi2       =     0.0000

                                 (Std. Err. adjusted for clustering on target)
--------------------------------------------------------------------------------
               |             Semirobust
 collab_weight |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
---------------+----------------------------------------------------------------
    rtot_trade |   2.60e-09   6.37e-09     0.41   0.683    -9.88e-09    1.51e-08
        gdp_pc |   .0000971   .0000247     3.94   0.000     .0000487    .0001454
tertenrol_epol |   2.22e-06   4.01e-07     5.53   0.000     1.43e-06    3.01e-06
 trade_percgdp |  -.0183561   .0116593    -1.57   0.115    -.0412079    .0044958
    mobcell100 |   .0073132   .0011522     6.35   0.000     .0050549    .0095715
      colotrad |   1.594459   .5851681     2.72   0.006     .4475504    2.741367
       langcom |   1.509419   .6002954     2.51   0.012     .3328614    2.685976
               |
          year |
          1991 |   .2856466   .3194495     0.89   0.371    -.3404629    .9117561
          1992 |  -.0231025   .2237049    -0.10   0.918     -.461556     .415351
          1993 |   .4922213   .1874601     2.63   0.009     .1248063    .8596362
          1994 |   .4741249   .2524005     1.88   0.060     -.020571    .9688208
          1995 |   .4691006   .2529398     1.85   0.064    -.0266523    .9648535
          1996 |   .4894541   .1945354     2.52   0.012     .1081717    .8707365
          1997 |    .927893   .2780302     3.34   0.001     .3829638    1.472822
          1998 |   1.250811   .2103598     5.95   0.000     .8385137    1.663109
          1999 |   1.082485   .2243396     4.83   0.000     .6427869    1.522182
          2000 |     .96179   .2667751     3.61   0.000     .4389204     1.48466
          2001 |   .9312558   .1905119     4.89   0.000     .5578593    1.304652
          2002 |   .8492289   .1903188     4.46   0.000     .4762109    1.222247
          2003 |   .9112329   .2544881     3.58   0.000     .4124454     1.41002
          2004 |   .6635822   .2677673     2.48   0.013      .138768    1.188396
          2005 |   .2132622   .2759962     0.77   0.440    -.3276804    .7542048
          2006 |   .2055204   .3177095     0.65   0.518    -.4171788    .8282196
          2007 |   .0070632   .3180742     0.02   0.982    -.6163507    .6304771
          2008 |  -.4258888   .2031861    -2.10   0.036    -.8241263   -.0276514
          2009 |  -.1525135   .2025328    -0.75   0.451    -.5494704    .2444435
          2010 |  -.2249248    .181555    -1.24   0.215     -.580766    .1309164
          2011 |  -.2851735   .1338686    -2.13   0.033    -.5475511   -.0227958
          2012 |    -.54935   .0686513    -8.00   0.000    -.6839041   -.4147959
          2013 |  -.6197236   .0624535    -9.92   0.000    -.7421302    -.497317
          2014 |  -.4995681   .0395978   -12.62   0.000    -.5771784   -.4219578
          2015 |          0  (omitted)
          2016 |          0  (omitted)
          2017 |          0  (omitted)
          2018 |          0  (omitted)
               |
         _cons |  -.7398958   .8294814    -0.89   0.372    -2.365649    .8858579
--------------------------------------------------------------------------------
convergence not achieved
r(430);
[1] https://www.statalist.org/forums/for...binomial-model
[2] https://www.statalist.org/forums/for...sson-estimates
[3] https://www.statalist.org/forums/for...-fixed-effects
[4] https://www.statalist.org/forums/for...ial-regression
[5] https://www.stata.com/statalist/arch.../msg00288.html