Number of zero and ppmlhdfe

Koko DIBLONI

Join Date: Jul 2025

Posts: 12
#1

13 Aug 2025, 06:13

Hello everyone, I’m estimating the effects of fiscal consolidation episodes, measured as a percentage of GDP, on bilateral inward flows to developing economies. My dependent variable contains approximately 90% zeros in the sample.
When I run the regression excluding the zeros, I obtain a positive coefficient that is statistically significant at the 10% level. However, when I include the zeros, the coefficient turns negative and is no longer significant (the p-value becomes very high). I include origin-year and country-pair fixed effects. I cluster at the destination-year level because clustering at the country-pair level prevents coefficient estimation.
I would like to understand what might be causing these discrepancies.

. ppmlhdfe in_Flow_per_r Fisc_r $control_jt_MR $control_ijt, vce(cl da) absorb(fepr fpt) nolog d keepsingletons sep(fe)
warning: keeping singleton groups will keep fixed effects that cause separation
warning: dependent variable takes very low values after standardizing (9.6738e-12)
Converged in 20 iterations and 97 HDFE sub-iterations (tol = 1.0e-08)

HDFE PPML regression No. of obs = 87,087
Absorbing 2 HDFE groups Residual df = 956
Statistics robust to heteroskedasticity Wald chi2(15) = 84.27
Deviance = 1268.88905 Prob > chi2 = 0.0000
Log pseudolikelihood = -1708.470875 Pseudo R2 = 0.7994

Number of clusters (da) = 957
(Std. err. adjusted for 957 clusters in da)
-------------------------------------------------------------------------------
| Robust
in_Flow_per_r | Coefficient std. err. z P>|z| [95% conf. interval]
--------------+----------------------------------------------------------------
Fisc_r | -.0108548 .0337795 -0.32 0.748 -.0770614 .0553519
fin_dev_r | -.0030926 .0116539 -0.27 0.791 -.0259337 .0197486
inflation_r | .0218521 .0184559 1.18 0.236 -.0143209 .058025
access_elec_r | -.0365178 .0113473 -3.22 0.001 -.0587581 -.0142774
res_rents_r | -.0474482 .0127503 -3.72 0.000 -.0724383 -.0224582
gross_debt_r | -.0113401 .0057079 -1.99 0.047 -.0225274 -.0001529
gdp_growth_r | .0699072 .0186284 3.75 0.000 .0333963 .1064182
remit_gdp_r | .0520429 .0353487 1.47 0.141 -.0172392 .121325
log_GDP_r | 2.392849 1.144094 2.09 0.036 .1504654 4.635233
Inst_qlt | -.9502582 .4980707 -1.91 0.056 -1.926459 .0259424
CIT_r | .0392391 .0214041 1.83 0.067 -.0027121 .0811903
MR | 3.948637 1.475418 2.68 0.007 1.056872 6.840402
BIT | .707005 .3555214 1.99 0.047 .010196 1.403814
RTA | .5260533 .3545719 1.48 0.138 -.1688948 1.221001
InstDist | -.2619662 .3371992 -0.78 0.437 -.9228645 .3989322
_cons | -89.65802 30.7708 -2.91 0.004 -149.9677 -29.34835
-------------------------------------------------------------------------------

Absorbed degrees of freedom:
-----------------------------------------------------+
Absorbed FE | Categories - Redundant = Num. Coefs |
-------------+---------------------------------------|
fepr | 8281 0 8281 |
fpt | 1196 92 1104 |
-----------------------------------------------------+
Joao Santos Silva

Join Date: Apr 2014

Posts: 3021
#2

13 Aug 2025, 09:02

Dear Koko DIBLONI,

There is something strange about what is going on here. First, you say that you are including origin-year and country pairs fixed effects. Why don't you also include destination-year fixed effects? Is it because you have a single destination? Also, if you are including pair fixed effects, how come distance does not drop? Also, do not use the keepsingletons and sep(fe) options. Finally, what do you mean when you say that you cluster at the destination-year level because clustering at the country-pair level prevents coefficient estimation?

Best wishes,

Joao
Koko DIBLONI

Join Date: Jul 2025

Posts: 12
#3

14 Aug 2025, 03:49

Dear Joao Santos Silva, thank you for your assistance.

Regarding your first question, I do not use destination-year fixed effects because my variable of interest is defined at the destination-year level. Including such fixed effects would absorb the variation I aim to analyze.
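
The absorption argument above can be written out as a short sketch. The notation is assumed for illustration and does not come from the thread: i is the origin, j the destination, t the year, and x_ijt stands for the pair-year controls. Once a destination-year effect is added, it cannot be separated from the destination-year regressor, so the coefficient on Fisc is not identified (the same applies to any other control that varies only by destination-year).

```latex
% Assumed notation: i = origin, j = destination, t = year.
% First line: the specification without destination-year effects.
% Second line: adding \gamma_{jt} makes \beta indistinguishable from the fixed effect.
\begin{align}
E[y_{ijt}] &= \exp\left(\alpha_{it} + \eta_{ij} + \beta\,\mathit{Fisc}_{jt} + x_{ijt}'\delta\right) \\
E[y_{ijt}] &= \exp\left(\alpha_{it} + \gamma_{jt} + \eta_{ij} + \beta\,\mathit{Fisc}_{jt} + x_{ijt}'\delta\right)
            = \exp\left(\alpha_{it} + \tilde{\gamma}_{jt} + \eta_{ij} + x_{ijt}'\delta\right),
\qquad \tilde{\gamma}_{jt} = \gamma_{jt} + \beta\,\mathit{Fisc}_{jt}
\end{align}
```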

Secondly, the variable in question is not geographical distance but rather institutional distance, which varies over time.

Also, when I do not include the options keepsingletons and sep(fe) in my regression, my sample size drops dramatically from 126,126 to 17,839 observations. Is this reduction normal?

Finally, when I cluster at the country-pair level, I notice that in the small table at the bottom (Absorbed degrees of freedom), the number of coefficients (num.coef) appears as 0*. Could you clarify what this means?

1) Regression 1: clustering at the country-pair level

. ppmlhdfe in_Flow_per_r Fisc_r $control_jt $control_ijt MR , vce(cl fepr) absorb(fepr fpt) nolog
(dropped 69248 observations that are either singletons or separated by a fixed effect)
warning: dependent variable takes very low values after standardizing (4.3809e-12)
$$ Stopping (no negative residuals); separation found in 0 observations (1 iterations and 25 subiterations)
Converged in 16 iterations and 81 HDFE sub-iterations (tol = 1.0e-08)

HDFE PPML regression No. of obs = 17,839
Absorbing 2 HDFE groups Residual df = 1,722
Statistics robust to heteroskedasticity Wald chi2(15) = 51.09
Deviance = 1268.889047 Prob > chi2 = 0.0000
Log pseudolikelihood = -1708.470874 Pseudo R2 = 0.7356

Number of clusters (fepr) = 1,723
(Std. err. adjusted for 1,723 clusters in fepr)
-------------------------------------------------------------------------------
| Robust
in_Flow_per_r | Coefficient std. err. z P>|z| [95% conf. interval]
--------------+----------------------------------------------------------------
Fisc_r | -.0108548 .034523 -0.31 0.753 -.0785187 .0568092
fin_dev_r | -.0030926 .0138351 -0.22 0.823 -.0302089 .0240237
inflation_r | .0218521 .021187 1.03 0.302 -.0196737 .0633778
access_elec_r | -.0365178 .0145321 -2.51 0.012 -.0650001 -.0080354
res_rents_r | -.0474482 .0196069 -2.42 0.016 -.085877 -.0090195
gross_debt_r | -.0113401 .0078073 -1.45 0.146 -.0266422 .0039619
gdp_growth_r | .0699072 .0262637 2.66 0.008 .0184314 .121383
remit_gdp_r | .0520429 .0445919 1.17 0.243 -.0353557 .1394415
log_GDP_r | 2.392849 1.270034 1.88 0.060 -.0963719 4.88207
Inst_qlt | -.9502582 .6139648 -1.55 0.122 -2.153607 .2530908
CIT_r | .0392391 .0236642 1.66 0.097 -.007142 .0856201
BIT | .707005 .3951512 1.79 0.074 -.067477 1.481487
RTA | .5260533 .3956898 1.33 0.184 -.2494845 1.301591
InstDist | -.2619662 .4400135 -0.60 0.552 -1.124377 .6004444
MR | 3.948637 2.22782 1.77 0.076 -.4178101 8.315084
_cons | -89.65802 36.30691 -2.47 0.014 -160.8183 -18.49779
-------------------------------------------------------------------------------

Absorbed degrees of freedom:
-----------------------------------------------------+
Absorbed FE | Categories - Redundant = Num. Coefs |
-------------+---------------------------------------|
fepr | 1723 1723 0 *|
fpt | 1025 1 1024 |
-----------------------------------------------------+

2) Regression 2: clustering at the destination-year level

. ppmlhdfe in_Flow_per_r Fisc_r $control_jt $control_ijt MR , vce(cl da) absorb(fepr fpt) nolog
(dropped 69248 observations that are either singletons or separated by a fixed effect)
warning: dependent variable takes very low values after standardizing (4.3809e-12)
$$ Stopping (no negative residuals); separation found in 0 observations (1 iterations and 25 subiterations)
Converged in 16 iterations and 81 HDFE sub-iterations (tol = 1.0e-08)

HDFE PPML regression No. of obs = 17,839
Absorbing 2 HDFE groups Residual df = 956
Statistics robust to heteroskedasticity Wald chi2(15) = 84.27
Deviance = 1268.889047 Prob > chi2 = 0.0000
Log pseudolikelihood = -1708.470874 Pseudo R2 = 0.7356

Number of clusters (da) = 957
(Std. err. adjusted for 957 clusters in da)
-------------------------------------------------------------------------------
| Robust
in_Flow_per_r | Coefficient std. err. z P>|z| [95% conf. interval]
--------------+----------------------------------------------------------------
Fisc_r | -.0108548 .0337795 -0.32 0.748 -.0770614 .0553519
fin_dev_r | -.0030926 .0116539 -0.27 0.791 -.0259337 .0197486
inflation_r | .0218521 .0184559 1.18 0.236 -.0143209 .058025
access_elec_r | -.0365178 .0113473 -3.22 0.001 -.0587581 -.0142774
res_rents_r | -.0474482 .0127503 -3.72 0.000 -.0724383 -.0224582
gross_debt_r | -.0113401 .0057079 -1.99 0.047 -.0225274 -.0001529
gdp_growth_r | .0699072 .0186284 3.75 0.000 .0333963 .1064182
remit_gdp_r | .0520429 .0353487 1.47 0.141 -.0172392 .121325
log_GDP_r | 2.392849 1.144094 2.09 0.036 .1504654 4.635233
Inst_qlt | -.9502582 .4980707 -1.91 0.056 -1.926459 .0259424
CIT_r | .0392391 .0214041 1.83 0.067 -.0027121 .0811903
BIT | .707005 .3555214 1.99 0.047 .010196 1.403814
RTA | .5260533 .3545719 1.48 0.138 -.1688948 1.221001
InstDist | -.2619662 .3371992 -0.78 0.437 -.9228645 .3989322
MR | 3.948637 1.475418 2.68 0.007 1.056872 6.840402
_cons | -89.65802 30.7708 -2.91 0.004 -149.9677 -29.34835
-------------------------------------------------------------------------------

Absorbed degrees of freedom:
-----------------------------------------------------+
Absorbed FE | Categories - Redundant = Num. Coefs |
-------------+---------------------------------------|
fepr | 1723 0 1723 |
fpt | 1025 92 933 |
-----------------------------------------------------+
Koko DIBLONI

Join Date: Jul 2025

Posts: 12
#4

14 Aug 2025, 05:18

The first table did not display properly, so I’m reposting it here.

. ppmlhdfe in_Flow_per_r Fisc_r $control_jt $control_ijt MR , vce(cl fepr) absorb(fepr fpt) nolog
(dropped 69248 observations that are either singletons or separated by a fixed effect)
warning: dependent variable takes very low values after standardizing (4.3809e-12)
$$ Stopping (no negative residuals); separation found in 0 observations (1 iterations and 25 subiterations)
Converged in 16 iterations and 81 HDFE sub-iterations (tol = 1.0e-08)

HDFE PPML regression No. of obs = 17,839
Absorbing 2 HDFE groups Residual df = 1,722
Statistics robust to heteroskedasticity Wald chi2(15) = 51.09
Deviance = 1268.889047 Prob > chi2 = 0.0000
Log pseudolikelihood = -1708.470874 Pseudo R2 = 0.7356

Number of clusters (fepr) = 1,723
(Std. err. adjusted for 1,723 clusters in fepr)
-------------------------------------------------------------------------------
| Robust
in_Flow_per_r | Coefficient std. err. z P>|z| [95% conf. interval]
--------------+----------------------------------------------------------------
Fisc_r | -.0108548 .034523 -0.31 0.753 -.0785187 .0568092
fin_dev_r | -.0030926 .0138351 -0.22 0.823 -.0302089 .0240237
inflation_r | .0218521 .021187 1.03 0.302 -.0196737 .0633778
access_elec_r | -.0365178 .0145321 -2.51 0.012 -.0650001 -.0080354
res_rents_r | -.0474482 .0196069 -2.42 0.016 -.085877 -.0090195
gross_debt_r | -.0113401 .0078073 -1.45 0.146 -.0266422 .0039619
gdp_growth_r | .0699072 .0262637 2.66 0.008 .0184314 .121383
remit_gdp_r | .0520429 .0445919 1.17 0.243 -.0353557 .1394415
log_GDP_r | 2.392849 1.270034 1.88 0.060 -.0963719 4.88207
Inst_qlt | -.9502582 .6139648 -1.55 0.122 -2.153607 .2530908
CIT_r | .0392391 .0236642 1.66 0.097 -.007142 .0856201
BIT | .707005 .3951512 1.79 0.074 -.067477 1.481487
RTA | .5260533 .3956898 1.33 0.184 -.2494845 1.301591
InstDist | -.2619662 .4400135 -0.60 0.552 -1.124377 .6004444
MR | 3.948637 2.22782 1.77 0.076 -.4178101 8.315084
_cons | -89.65802 36.30691 -2.47 0.014 -160.8183 -18.49779
-------------------------------------------------------------------------------

Absorbed degrees of freedom:
-----------------------------------------------------+
Absorbed FE | Categories - Redundant = Num. Coefs |
-------------+---------------------------------------|
fepr | 1723 1723 0 *|
fpt | 1025 1 1024 |
-----------------------------------------------------+
* = FE nested within cluster; treated as redundant for DoF computation
Joao Santos Silva

Join Date: Apr 2014

Posts: 3021
#5

15 Aug 2025, 22:39

Dear Koko DIBLONI,

Thanks for the clarification.

1. That is a problem. A "structural" gravity equation needs both importer-year and exporter-year fixed effects. If you cannot use both, you need to be very careful in interpreting your results.
2. Ah! OK.
3. That is normal; the observations that are dropped do not contribute to the estimates (notice that the estimates are the same in both cases). However, keeping the singletons makes the standard errors unreliable.
4. You can ignore that (there is a note at the bottom explaining it) and cluster by pair, or distance (distance may be better).
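
Point 3 can be checked directly with a minimal sketch along the following lines. It reuses the variable names, globals, and options from the commands posted in this thread; the stored-estimate names `dropped` and `kept` are arbitrary labels chosen for illustration. If the dropped observations really do not contribute, the coefficient on Fisc_r should be the same in both columns, and only the first run's standard errors should be reported.

```stata
* Sketch only: variables, globals, and options are copied from the thread.
* Run 1: default behaviour, singletons and separated observations are dropped.
ppmlhdfe in_Flow_per_r Fisc_r $control_jt $control_ijt MR, ///
    absorb(fepr fpt) vce(cluster fepr) nolog
estimates store dropped

* Run 2: same model with the options used in the first post,
* kept here only to compare point estimates.
ppmlhdfe in_Flow_per_r Fisc_r $control_jt $control_ijt MR, ///
    absorb(fepr fpt) vce(cluster fepr) nolog keepsingletons sep(fe)
estimates store kept

* Side-by-side coefficient and standard error for the variable of interest.
estimates table dropped kept, keep(Fisc_r) b(%9.7f) se(%9.7f)
```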

Best wishes,

Joao
Koko DIBLONI

Join Date: Jul 2025

Posts: 12
#6

16 Aug 2025, 07:01

Dear Joao Santos Silva

Thank you very much for your helpful insights and guidance. I truly appreciate your support.

Best regards,
Koko
Koko DIBLONI

Join Date: Jul 2025

Posts: 12
#7

17 Aug 2025, 15:02

Dear Joao Santos Silva

I would be very grateful if you could kindly explain why clustering at the distance level is considered preferable.

Additionally, since I am not including all fixed effects in my model, does that mean I cannot identify the causal effect?

These clarifications would be extremely helpful to me.

Best regards,
Koko
Joao Santos Silva

Join Date: Apr 2014

Posts: 3021
#8

18 Aug 2025, 03:41

Dear Koko DIBLONI,

If you cluster by distance, pairs (A,B) and (B,A) will be in the same cluster, as they should. If you cluster by country-pair, they will be in different clusters.
If you do not include all the fixed effects, you cannot call your estimates "structural"; I would never call them causal.
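
If no geographical distance variable is available in the data, one possible way to get the same grouping is to build an identifier for the unordered pair, as in the sketch below. The names `origin_id`, `dest_id`, `id_lo`, `id_hi`, and `pair_sym` are hypothetical and assume numeric country codes. Clustering on this identifier matches clustering by distance only under the assumption that each unordered pair has its own distance value.

```stata
* Sketch only: origin_id and dest_id are hypothetical numeric country codes.
* Order the two codes so that (A,B) and (B,A) produce the same values.
generate long id_lo = min(origin_id, dest_id)
generate long id_hi = max(origin_id, dest_id)

* One identifier per unordered country pair.
egen long pair_sym = group(id_lo id_hi)

* Directed pair fixed effects (fepr) are still absorbed;
* only the clustering variable changes.
ppmlhdfe in_Flow_per_r Fisc_r $control_jt $control_ijt MR, ///
    absorb(fepr fpt) vce(cluster pair_sym) nolog
```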

Best wishes,

Joao
Koko DIBLONI

Join Date: Jul 2025

Posts: 12
#9

19 Aug 2025, 00:57

Dear Joao Santos Silva,

Thank you.

Best regards,
Koko