ppmlhdfe in panels: predicted values and the adding up problem

Daniel Prosi

Join Date: Dec 2019

Posts: 17
#1

09 Jul 2020, 08:19

Dear everyone,

I am trying to obtain predicted values for a trade gravity equation from a panel of bilateral trade relations. For my application it is important that the sum of overall trade in the predictions corresponds exactly to the overall sum of trade in the original data.
I consulted Arvis, J-F. and Shepherd, B. (2013) The Poisson quasi-maximum likelihood estimator: a solution to the ‘adding up’ problem in gravity models. Applied Economics Letters. (link below) They find that PPML does in fact preserve the overall sum of trade and is furthermore the only estimator to do so.

I am using ppmlhdfe to estimate the gravity equation. My code is:
ppmlhdfe tradevalue ln_dist_air $dist_inds, abs(i.importercode#i.year i.exportercode#i.year) d vce(robust)
where tradevalue is bilateral trade in levels, ln_dist_air is the log of great-circle distance and $dist_inds are a number of distance indicator variables. I include importer-year and exporter-year effects to account for multilateral resistance and country sizes in each period. I have several questions:
Will the predicted values obtained from running predict, xb be predicted trade in logs or in levels? From the lin-log specification of PPML generally I would expect them to be in levels.

The sums of my predicted results (both for logs and levels) overall, by year as well as by year and exporter are far from their corresponding sums in the observed data. That is although from eq(10) or eq(11) in Arvis and Shepherd (2013) (i.e. the FOC for the log-likelihood in PPML) it becomes clear that including year-importer fixed effects should render these sums equal for each exporter in each year. The total sum of all predicted to all observed trade should also be equal.

Following my estimation I run:

predict pr_grav, xb
egen check = total(pr_grav)
egen check2 = total(tradevalue)
gen check3 = check / check2
For the ratio of the sums, check3, I obtain a value of 0.039, where I would expect it to be precisely 1. That implies that overall predicted trade is smaller than observed trade by a factor of 25.

Similarly, for the sum of importer-year trade I run
egen check_y1 = total(pr_grav), by(year importer)
egen check_y2 = total(tradevalue), by(year importer)
gen check_y3 = check_y1 / check_y2
Again, I obtain seemingly arbitrary values, most of which range from 0.02 to 56 (implying that the sums of predicted values are off by up to a factor of 56).
My core question now is: Does absorbing the fixed effects in ppmlhdfe render the result by Arvis and Shepherd invalid in some way or am I making a grave mistake?

Thank you for your help

https://www.tandfonline.com

Last edited by Daniel Prosi; 09 Jul 2020, 08:25.
Tags: adding up problem in grav, Gravity, PPML, ppmlhdfe
Daniel Prosi

Join Date: Dec 2019

Posts: 17
#2

09 Jul 2020, 09:58

Additional note: Running the simple ppml command for one year of the data and then computing the above ratios returns the expected ratios of 1:
Namely running

ppml tradevalue ln_dist_air $dist_inds _IM* _EX* if year == 2010

yields the expected ratio of 1. _IM* and _EX* are importer and exporter dummies generated via the xi command.
(I noted that predicted values from ppml return tradevalue in logs, so I had to take the exponential. It seemed to me that ppmlhdfe predicts level values, however. This doesn't make much sense to me, but I accept it for the moment. I did all calculations with both the predicted values and the exponentials of the predicted values.)

Running the same analysis with ppmlhdfe generated the puzzling results that are different from 1.

ppmlhdfe tradevalue ln_dist_air $dist_inds if year == 2010, abs(importeriso exporteriso) d
Joao Santos Silva

Join Date: Apr 2014

Posts: 3015
#3

09 Jul 2020, 10:50

Dear Daniel Prosi,

Note that PPML does not return trade values in logs, but allows you to predict either the expectation of the dependent variable or the linear index (which you call trade in logs). However, I believe that by default it actually predicts the expected value of trade and the sum of that equals the sum of trade.

For ppmlhdfe you probably need to save the fixed effects and incorporate them in the predictions; please check the help file.

Best wishes,

Joao
Daniel Prosi

Join Date: Dec 2019

Posts: 17
#4

09 Jul 2020, 15:28

Thank you Joao Santos Silva , that actually makes a lot of sense. Sorry for confusing terminology about predictions. Adding the saved sum of fixed effects resolves the problem.

I hope that I am right to assume that for the ppmlhdfe model above the correct interpretation of the predicted value + the sum of fixed effects variable would be the RHS of the gravity equation in logs. So taking the exponential of that should be the best model prediction for any bilateral trade flow (of course this is an expected value, as we obtain a fully connected network of trade where the real network is sparse).
Tom Zylkin

Join Date: Nov 2016

Posts: 188
#5

09 Jul 2020, 17:41

Dear Daniel,

I can clarify that ppmlhdfe is compatible with predict. If you use predict with the "mu" option you will get the expected trade flow value. Note you need to add a "d" in your options syntax when you estimate with ppmlhdfe to make this possible. ppmlhdfe will give you a reminder about this if you try to use predict without it.

Another thing that concerns me though is that you are using factor variables to create the fixed effects. If you want exporter-time and importer-time fixed effects you need only put "abs(importercode#year exportercode#year)" not "abs(i.importercode#i.year i.exportercode#i.year)". The latter may be much slower.

"xb" is what it sounds like: the b's are your estimated coefficients and the x's are your covariates. Hence, xb = x1 * b1 + x2 * b2 + (...) Note there is no such "adding up" property involving xb.
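For reference, a sketch of why the property attaches to the fitted mean rather than to xb, assuming the usual Poisson log-likelihood. The notation is not from the posts above and is only illustrative: y_ijt is observed trade from exporter i to importer j in year t, and alpha_it and gamma_jt are the exporter-year and importer-year effects.

```latex
\[
\hat\mu_{ijt} = \exp\!\big(x_{ijt}'\hat\beta + \hat\alpha_{it} + \hat\gamma_{jt}\big)
\]
% First-order condition of the Poisson log-likelihood for each exporter-year effect:
\[
\frac{\partial \ell}{\partial \alpha_{it}}
  = \sum_{j} \big(y_{ijt} - \hat\mu_{ijt}\big) = 0
\quad\Longrightarrow\quad
\sum_{j} \hat\mu_{ijt} = \sum_{j} y_{ijt}
\]
% and likewise for each importer-year effect:
\[
\frac{\partial \ell}{\partial \gamma_{jt}}
  = \sum_{i} \big(y_{ijt} - \hat\mu_{ijt}\big) = 0
\quad\Longrightarrow\quad
\sum_{i} \hat\mu_{ijt} = \sum_{i} y_{ijt}
\]
```

Summing either condition over all groups gives equality of total predicted and total observed trade. Both conditions constrain the fitted means only; nothing in them restricts the sum of x'b.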

Yes you are correct that if you add the predicted xb and fixed effects values together and then take the exponent you should get the predicted trade value. But this is actually not necessary...
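A minimal sketch of both routes, assuming the panel from post #1 is in memory and ppmlhdfe is installed. The new variable and scalar names (sum_fe, mu_hat, xb_hat, mu_manual, tot_mu, tot_y, grp_*) are illustrative and not from the thread; the /// continuations assume the code is run from a do-file.

```stata
* Sketch only: variable names other than those in post #1 are hypothetical.
ppmlhdfe tradevalue ln_dist_air $dist_inds, ///
    absorb(importercode#year exportercode#year) d(sum_fe) vce(robust)

* Route 1: let predict return the fitted mean directly
predict mu_hat, mu

* Route 2: rebuild it by hand from the linear index and the saved fixed effects
predict xb_hat, xb
generate double mu_manual = exp(xb_hat + sum_fe)

* The two routes should agree up to rounding
summarize mu_hat mu_manual

* Adding-up check on the estimation sample: the ratio should be close to 1
summarize mu_hat if e(sample), meanonly
scalar tot_mu = r(sum)
summarize tradevalue if e(sample), meanonly
scalar tot_y = r(sum)
display "predicted / observed total trade = " tot_mu / tot_y

* Same check within importer-year groups
egen double grp_mu = total(mu_hat) if e(sample), by(importercode year)
egen double grp_y  = total(tradevalue) if e(sample), by(importercode year)
generate double grp_ratio = grp_mu / grp_y
summarize grp_ratio
```

Restricting to e(sample) keeps observations that ppmlhdfe dropped during estimation out of the comparison, since they have no prediction.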

Regards,
Tom

Last edited by Tom Zylkin; 09 Jul 2020, 17:43.
Daniel Prosi

Join Date: Dec 2019

Posts: 17
#6

10 Jul 2020, 07:23

Thank you Tom Zylkin . Again, a very helpful remark.
Farhad Russell

Join Date: Apr 2017

Posts: 6
#7

30 Jul 2020, 23:47

Hi Tom Zylkin, thanks for your help here, much appreciated. Following your suggestion, I was trying to edit my code for predicting the expected trade value. Am I writing the command correctly?
ppmlhdfe tradeflow_gdp lnpopr lnpopp border_lnpopr border_lnpopp , a(iso3r#year iso3p#year, save) standardize_data(0) d cluster(pair) nolog
predict fittedxtreg4, xb
predict stdpred_fitxtreg4, stdp
gen ptrade4=exp(fittedxtreg4)
Farhad Russell

Join Date: Apr 2017

Posts: 6
#8

30 Jul 2020, 23:49

Tom Zylkin Please disregard the above post
Farhad Russell

Join Date: Apr 2017

Posts: 6
#9

30 Jul 2020, 23:54

Hi Tom Zylkin, thanks for your help here, much appreciated. Following your suggestion, I was trying to edit my code for predicting the expected trade value. Am I writing the command correctly?

ppmlhdfe tradeflow lnpopr lnpopp border_lnpopr border_lnpopp , a(iso3r#year iso3p#year, save) standardize_data(0) d cluster(pair) nolog

predict fitppmlhdfe, mu
predict stdpred_fitppmlhdfe, stdp
gen ptrade=exp(fitppmlhdfe)
Tom Zylkin

Join Date: Nov 2016

Posts: 188
#10

31 Jul 2020, 08:50

Dear Farhad,
I think "predict fitppmlhdfe,mu" should give you predicted trade here. Is ptrade intended to give you the standard error of the prediction here? I don't think that part is right.
Regards,
Tom
Farhad Russell

Join Date: Apr 2017

Posts: 6
#11

01 Aug 2020, 20:47

Hi Tom, many thanks for your reply. In this regression I am trying to predict the trade share in GDP with "ptrade". What I understood is that I only need to use "predict fitppmlhdfe, mu" to find the predicted trade share, and can disregard the others. I hope the options I use after the regression command are right.
Thanks again and best regards,
Farhad.
Tom Zylkin

Join Date: Nov 2016

Posts: 188
#12

02 Aug 2020, 07:33

Hi Farhad,
If ptrade is supposed to be the predicted trade share in GDP, then just take predicted trade ("fitppmlhdfe") and divide by GDP. Predict, stdp is usually for obtaining the standard error of the prediction. Though for ppmlhdfe I believe it instead gives you the standard error of the linear predictor (i.e., xb) rather than of the predicted mean (which would be e^(xb+fes)).
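In Stata terms this could look as follows, run after the ppmlhdfe command from #9. Here gdp is a hypothetical variable name, since no GDP variable appears in the posted command, and which GDP to use (and in which units) is an assumption left to the application.

```stata
* Sketch only: gdp is a hypothetical variable, assumed to be measured
* in the same units as tradeflow.
predict fitppmlhdfe, mu                      // predicted trade in levels
generate double ptrade = fitppmlhdfe / gdp   // predicted trade share in GDP
```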
Regards,
Tom
Hussain Sulaimani

Join Date: Apr 2021

Posts: 14
#13

16 Apr 2021, 23:18

Good day all,
Thank you for this thread that helps me in predicting. However, I have some inquiries, please.

Following the advice that Joao Santos Silva and Tom Zylkin provided in #3 and #5, I included d in the ppmlhdfe options so that the fixed effects are included in the predictions.
I run ppmlhdfe to estimate the attraction constrained gravity model as follows:

Code:

ppmlhdfe Flow_ij lnTEU_i lnDistance_ij d_Rail_ij d_Redsea_i lnGasolinePrice_t lnBunkerRate_i lnFreightRate_t i.Province_j, vce(cluster ID) d(newvar1)

Where:
Flow_ij is freight flow between port i and province j.
lnDistance_ij is the log of distance between port i and province j.
d_Rail_ij d_Redsea_i are two dummies for Rail availability and port location.
I also included the fixed effect of i.Province_j to include the unobserved effect of province j.

These are the model estimates I got:

Code:

. ppmlhdfe Flow_ij lnTEU_i lnDistance_ij d_Rail_ij d_Redsea_i lnGasolinePrice_t lnBunkerRate_i lnFreightRate_t i.Province_j, vce(cluster ID) d(newvar1)
Iteration 1:   deviance = 1.8240e+07  eps = .         iters = 1  tol = 1.0e-04  min(eta) =  -3.62  P
Iteration 2:   deviance = 1.1142e+07  eps = 6.37e-01  iters = 1  tol = 1.0e-04  min(eta) =  -5.39
Iteration 3:   deviance = 1.0150e+07  eps = 9.78e-02  iters = 1  tol = 1.0e-04  min(eta) =  -6.67
Iteration 4:   deviance = 1.0099e+07  eps = 5.08e-03  iters = 1  tol = 1.0e-04  min(eta) =  -7.12
Iteration 5:   deviance = 1.0098e+07  eps = 3.11e-05  iters = 1  tol = 1.0e-04  min(eta) =  -7.16
Iteration 6:   deviance = 1.0098e+07  eps = 4.21e-09  iters = 1  tol = 1.0e-05  min(eta) =  -7.16  S O
------------------------------------------------------------------------------------------------------------
(legend: p: exact partial-out   s: exact solver   h: step-halving   o: epsilon below tolerance)
Converged in 6 iterations and 6 HDFE sub-iterations (tol = 1.0e-08)

PPML regression                                   No. of obs      =        507
                                                  Residual df     =         38
Statistics robust to heteroskedasticity           Wald chi2(19)   =  267609.61
Deviance             =  10098271.74               Prob > chi2     =     0.0000
Log pseudolikelihood = -5051290.332               Pseudo R2       =     0.9194

Number of clusters (ID)     =         39
                                         (Std. Err. adjusted for 39 clusters in ID)
-----------------------------------------------------------------------------------
                  |               Robust
          Flow_ij |      Coef.  Std. Err.      z     P>|z|     [95% Conf. Interval]
------------------+----------------------------------------------------------------
          lnTEU_i |   .5091603   .2050472     2.48   0.013     .1072751    .9110455
    lnDistance_ij |  -.5966364   .0645599    -9.24   0.000    -.7231714   -.4701014
        d_Rail_ij |   1.397236   .2703799     5.17   0.000     .8673009    1.927171
       d_Redsea_i |   .8889462   .4633548     1.92   0.055    -.0192124    1.797105
lnGasolinePrice_t |  -.2135134   .0688799    -3.10   0.002    -.3485155   -.0785112
   lnBunkerRate_i |   .2990565   .1514294     1.97   0.048     .0022604    .5958527
  lnFreightRate_t |  -.5772155   .1585844    -3.64   0.000    -.8880352   -.2663958
                  |
       Province_j |
              bah |  -2.607147   .2788131    -9.35   0.000    -3.153611   -2.060684
              epr |    .187406   .3250745     0.58   0.564    -.4497284    .8245404
              jaz |   -.711751   .3230449    -2.20   0.028    -1.344907   -.0785947
              jof |  -1.590804   .3332172    -4.77   0.000    -2.243898   -.9377107
              mad |   .0443517   .2926633     0.15   0.880    -.5292579    .6179613
              mkk |    -.84163   .3586469    -2.35   0.019    -1.544565   -.1386951
              nai |  -.9154942   .3373447    -2.71   0.007    -1.576678   -.2543107
              naj |  -1.791735   .3549225    -5.05   0.000     -2.48737     -1.0961
              nbr |  -1.853444    .797542    -2.32   0.020    -3.416597   -.2902903
              qas |   .2175232   .5026922     0.43   0.665    -.7677355    1.202782
              riy |   1.145969   .2279948     5.03   0.000     .6991075     1.59283
              tab |   -.910621   .3442817    -2.64   0.008    -1.585401   -.2358412
                  |
            _cons |   12.32115   1.553988     7.93   0.000      9.27539    15.36691
-----------------------------------------------------------------------------------

Thereafter, I predicted Flow_ij using predict code as follows:

Code:

predict pr_Flow_ij, mu

However,
1. The results I got in ( pr_Flow_ij ) are very different from the actual ones I have in ( Flow_ij ).
2. The deviance in the estimated model is very high (as can be seen above in the output of ppmlhdfe). Is this the reason for the big difference?
3. What I am doing wrong that yielded this large gap?
4. Please let me know if more info is needed to clarify the issue.

Thank you,
Hussain
Tom Zylkin

Join Date: Nov 2016

Posts: 188
#14

17 Apr 2021, 08:48

Hi Hussain,
The syntax you are using for predict is correct. If you want to do a quick check, input

Code:

sum Flow_ij pr_Flow_ij

Both variables should have the same mean value.

Regarding your other questions, it's not clear that you've done something "wrong" per se. In general, any model is estimated with some error, and the error is generally going to be large for at least a few observations. The only way you are going to reduce the amount of error is to improve the model fit. But you may not necessarily want to do that just for the sake of doing it. For example, you could include port (i) and time (t) fixed effects in addition to province fixed effects, which will absorb all i- and t-specific variation, but this would mean that you will not be able to identify the effects of gasoline price, freight rates, lnTEU, or bunker rate. If these estimates are important to your objective, this is not the direction you want to go in.

Regarding the deviance, there's not much to say here because typically we need a baseline for comparison. The deviance of a particular model in isolation is not that interesting to focus on. Another thing to keep aware of is that the deviance is not invariant to the scale of the dependent variable. If you divide all your flow variables by 1000, you will get a different deviance.
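To make the scale dependence concrete, here is a short sketch assuming the standard Poisson deviance and a model with a constant or fixed effects, so that rescaling the dependent variable by a factor c only shifts the intercept by ln c and rescales every fitted value by c. The symbols D, y_n, and c are notation introduced here, not from the posts above.

```latex
\[
D(y,\hat\mu) = 2\sum_{n}\Big[\,y_n \ln\!\frac{y_n}{\hat\mu_n} - \big(y_n - \hat\mu_n\big)\Big],
\qquad y_n \ln y_n \equiv 0 \ \text{when}\ y_n = 0
\]
\[
D(c\,y,\,c\,\hat\mu)
  = 2\sum_{n}\Big[\,c\,y_n \ln\!\frac{c\,y_n}{c\,\hat\mu_n} - \big(c\,y_n - c\,\hat\mu_n\big)\Big]
  = c\,D(y,\hat\mu)
\]
```

Under these assumptions, dividing the flows by 1000 divides the deviance by 1000 while leaving the slope coefficients unchanged, which is why its absolute size says little on its own.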

Finally, there's another way to include the fixed effects in ppmlhdfe. This will be faster to estimate, especially when you have a lot of fixed effects:

Code:

ppmlhdfe Flow_ij lnTEU_i lnDistance_ij d_Rail_ij d_Redsea_i lnGasolinePrice_t lnBunkerRate_i lnFreightRate_t, a(Province_j) vce(cluster ID) d(newvar1)

Hope this is helpful!

Regards,
Tom
Hussain Sulaimani

Join Date: Apr 2021

Posts: 14
#15

18 Apr 2021, 23:17

Thank you Tom Zylkin for your reply. This is valuable advice. Since these variables are of importance, I will not use any additional FE. Thus, I will just stick to the FE of the province (i.Province_j).

By comparing the mean, I found that predicted and actual dependent variables have the same mean.

I will keep going on the analysis and let you know if I encounter any issues.

Best regards,
Hussain