Isolate Language effect - Gravity

Pericles Sa Nogueira

Join Date: Aug 2024

Posts: 6
#1


14 Apr 2025, 21:51

Hello Stata Forum,

I am currently working on my thesis, trying to understand the effect of a common language on intra-African trade. The trade literature has shown that sharing a common language has a positive impact on trade (e.g. https://www.sciencedirect.com/scienc...543?via%3Dihub).

1 - I estimated the gravity model with ppmlhdfe:

ppmlhdfe tradeflow_baci fta_wto ln_dist contig cplp_imp_exp cw_imp_exp oif_imp_exp a_league_imp_exp waemu_imp_exp cemac_imp_exp cma_imp_exp comrelig, a(exp_year imp_year) cluster (pair_id) nolog

I created the dummies cplp_imp_exp (=1 if importer and exporter share Portuguese as a common language), cw_imp_exp (=1 if importer and exporter share English as a common language), oif_imp_exp (=1 if importer and exporter share French), and a_league_imp_exp (=1 if Arabic is the common language).

I controlled for culture/religion, monetary union and common currency.

2 - The estimation results show:
cw_imp_exp and a_league_imp_exp are significant with positive coefficients.
oif_imp_exp is significant with a negative coefficient.

I am controlling for importer and exporter fixed effects (exp_year and imp_year); I am not using country-pair FE because of collinearity and dropped variables. I also removed RTA from regression 4 (column 4), as its coefficient could be inflated and affect the other variables. Joao Santos Silva, thank you.
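
For reference, the specification in step 1 can be written roughly as below. This is only a sketch: the symbols are notation introduced here, and it assumes that exp_year and imp_year identify exporter-year and importer-year groups.

```latex
E\left[ X_{ijt} \mid z_{ijt}, L_{ij} \right]
  = \exp\left( z_{ijt}'\beta + L_{ij}'\delta + \eta_{it} + \mu_{jt} \right)
```

Here X_{ijt} is tradeflow_baci for exporter i, importer j and year t; L_{ij} collects the four language dummies (cplp_imp_exp, cw_imp_exp, oif_imp_exp, a_league_imp_exp); z_{ijt} collects the remaining regressors; and \eta_{it}, \mu_{jt} are the fixed effects absorbed through a(exp_year imp_year). The language effects of interest are the elements of \delta.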

Questions:

A - Is there anything else I could do to make sure I am capturing the effect of language and not something else? Any idea is welcome.
B - Should I be worried about any other sort of endogeneity, given that I am not using the time-invariant FE (country-pair FE)?

Thank you
Pericles Sa Nogueira

Join Date: Aug 2024

Posts: 6
#2

28 Apr 2025, 21:49

Hello Professor Joao Santos Silva. I ran another regression using a two-stage estimation strategy that includes pair_id fixed effects (to handle endogeneity). Since membership of a language community such as CW (Commonwealth) is time-invariant within pairs, it is absorbed by the pair fixed effects in the first stage, so I ran a second stage to check whether the exporter and importer both being part of CW is significant.

1 - First stage estimation:
.
. ppmlhdfe tradeflow_baci fta_wto ln_dist contig cplp_imp_exp cw_imp_exp oif_imp_exp a_league_imp_exp waemu_imp_exp cemac_imp_exp cma
> _imp_exp comrelig, a(exp_year imp_year pair_id, save) cluster (pair_id) nolog
(dropped 274 observations that are either singletons or separated by a fixed effect)
warning: dependent variable takes very low values after standardizing (5.0741e-09)
note: 8 variables omitted because of collinearity: cplp_imp_exp cw_imp_exp oif_imp_exp a_league_imp_exp waemu_imp_exp cemac_imp_exp c
> ma_imp_exp comrelig
note: ln_dist is probably collinear with the fixed effects (all partialled-out values are close to zero; tol = 1.0e-06)
Converged in 13 iterations and 68 HDFE sub-iterations (tol = 1.0e-08)

HDFE PPML regression                              No. of obs      =      9,240
Absorbing 3 HDFE groups                           Residual df     =      2,087
Statistics robust to heteroskedasticity           Wald chi2(2)    =       0.69
Deviance             =  57506731.37               Prob > chi2     =     0.7087
Log pseudolikelihood = -28789716.99               Pseudo R2       =     0.9660

Number of clusters (pair_id)=      2,088
                               (Std. err. adjusted for 2,088 clusters in pair_id)
----------------------------------------------------------------------------------
                 |               Robust
  tradeflow_baci | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-----------------+----------------------------------------------------------------
         fta_wto |  -.1793716   .2162601    -0.83   0.407    -.6032336    .2444904
         ln_dist |          0  (omitted)
          contig |   .0009271   .1810112     0.01   0.996    -.3538484    .3557025
    cplp_imp_exp |          0  (omitted)
      cw_imp_exp |          0  (omitted)
     oif_imp_exp |          0  (omitted)
a_league_imp_exp |          0  (omitted)
   waemu_imp_exp |          0  (omitted)
   cemac_imp_exp |          0  (omitted)
     cma_imp_exp |          0  (omitted)
        comrelig |          0  (omitted)
           _cons |   13.12717   .1750179    75.00   0.000     12.78414     13.4702
----------------------------------------------------------------------------------

Absorbed degrees of freedom:
-----------------------------------------------------+
 Absorbed FE | Categories  - Redundant  = Num. Coefs |
-------------+---------------------------------------|
    exp_year |       307           1         306     |
    imp_year |       308           6         302     |
     pair_id |      2088        2088           0    *|
-----------------------------------------------------+
* = FE nested within cluster; treated as redundant for DoF computation

. rename __hdfe3__ pair_fixed

.
2 - Second stage estimation:
.
. reghdfe pair_fixed cw_imp_exp, a(exp_year imp_year) cluster(pair_id)
(MWFE estimator converged in 8 iterations)

HDFE Linear regression                            Number of obs   =      9,240
Absorbing 2 HDFE groups                           F(   1,   2087) =      21.72
Statistics robust to heteroskedasticity           Prob > F        =     0.0000
                                                  R-squared       =     0.3207
                                                  Adj R-squared   =     0.2727
                                                  Within R-sq.    =     0.0117
Number of clusters (pair_id) =      2,088         Root MSE        =     2.3306

                           (Std. err. adjusted for 2,088 clusters in pair_id)
------------------------------------------------------------------------------
             |               Robust
  pair_fixed | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
  cw_imp_exp |   1.194177   .2562609     4.66   0.000     .6916229     1.69673
       _cons |  -3.604822   .0573032   -62.91   0.000      -3.7172   -3.492445
------------------------------------------------------------------------------

Absorbed degrees of freedom:
-----------------------------------------------------+
 Absorbed FE | Categories  - Redundant  = Num. Coefs |
-------------+---------------------------------------|
    exp_year |       307           0         307     |
    imp_year |       308           6         302     |
-----------------------------------------------------+

So my variable cw_imp_exp is significant, which suggests that language plays an important role. Is there anything else I should check regarding endogeneity?
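
To make the two stages above explicit, they can be summarised roughly as follows. This is a sketch in notation introduced here (it is not taken from the ppmlhdfe or reghdfe documentation), where \gamma_{ij} denotes the pair_id fixed effect saved as pair_fixed.

```latex
% Stage 1: PPML with exporter-year, importer-year and pair fixed effects
E\left[ X_{ijt} \mid \cdot \right]
  = \exp\left( z_{ijt}'\beta + \eta_{it} + \mu_{jt} + \gamma_{ij} \right)

% Stage 2: linear regression of the estimated pair effect on the CW dummy
\hat{\gamma}_{ij} = \delta \, \mathrm{CW}_{ij} + a_{it} + b_{jt} + u_{ijt}
```

Any regressor that does not vary within a pair is a linear combination of the \gamma_{ij} terms, which is why the eight time-invariant variables are reported as omitted in the first stage. Note also that \hat{\gamma}_{ij} is an estimate rather than observed data, while the second-stage standard errors reported by reghdfe treat it as observed.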

Regards
Joao Santos Silva

Join Date: Apr 2014

Posts: 3025
#3

29 Apr 2025, 04:48

Dear Pericles Sa Nogueira,

I am not sure about the validity of such an approach. I suspect that at least the standard errors would have to be corrected.

Best wishes,

Joao