Predict-command after ppmlhdfe

Kathrin Me

Join Date: Sep 2021

Posts: 54
#1

05 Aug 2024, 07:53

Hi all,

In a dataset with one observation per firm, year and destination, I try two different ways of predicting trade flows.

The data set has a lot of zeros. Therefore, I started by estimating a PPML model:

Approach 1: PPML

Code:

ppmlhdfe exports log(distance) ... , vce(r) d absorb(industry firm_id)
predict pred_exps1 if year == 2005, mu

An alternative approach is to only take strictly positive trade flows into account and estimate the model in logs. After this, I predict trade flows in levels:

Approach 2: Log-model

Code:

reghdfe log_exports log(distance) ... , vce(r) absorb(industry firm_id) resid
predict pred_exps2_biased if year == 2005, xbd
gen pred_exps2 = exp(pred_exps2_biased) * exp(0.5 * e(rmse)^2) if year == 2005

(The data set runs from 2005 to 2019.)

How do I get the right standard errors in the PPML case, so that the two predictions become comparable? I would expect the same number of predicted observations and a correlation of 1 between pred_exps1 and pred_exps2.

Best.
Kathrin
Tags: postestimation, ppmlhdfe, predict, stderrors
Maxence Morlet

Join Date: Mar 2021

Posts: 653
#2

06 Aug 2024, 05:56

You may want to cluster standard errors by firm to allow for within-firm dependence. In approach 2, all observations with 0 exports are dropped. Approach 1 is better for reasons detailed by Santos Silva and Tenreyro (2006, 2011, 2022).

One of these reasons is Jensen's inequality. The exponential of the expectation is not equal to the expectation of the exponential (equivalently, the expectation of the log is not the log of the expectation). This explains a divergence in coefficient estimates, and therefore in fitted values. I would focus on approach 1 if I were you.
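The Jensen's inequality point can be written out as a short sketch. The notation is an assumption made for illustration and does not appear in the thread: y_i is exports, x_i the regressors, and ε_i the error of the log model.

```latex
\begin{align*}
\ln y_i &= x_i'\beta + \varepsilon_i, \qquad E[\varepsilon_i \mid x_i] = 0 \\
E[y_i \mid x_i] &= \exp(x_i'\beta)\, E[\exp(\varepsilon_i) \mid x_i] \\
E[\exp(\varepsilon_i) \mid x_i] &\ge \exp\!\big(E[\varepsilon_i \mid x_i]\big) = 1
  \quad \text{(Jensen's inequality)}
\end{align*}
```

If ε_i is normal with constant variance σ², the extra factor is exp(σ²/2), which is what a correction of the form exp(0.5 * e(rmse)^2) approximates. If the variance of ε_i depends on x_i, the factor varies with x_i, so the log-model coefficients and the PPML coefficients need not agree.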
Kathrin Me

Join Date: Sep 2021

Posts: 54
#3

06 Aug 2024, 09:12

Thanks for your input, Maxence Morlet. I'll cluster the standard errors.

Regarding the two approaches: Yes, I am dropping all observations with 0 exports. However, the predictions will be for all observations (also for the ones that had 0 exports).
I am currently just trying to understand why the two approaches do not give me the same results. You are right that the exponential of the expectation differs from the expectation of the exponential, but I would nevertheless expect a correlation close to 1 (if not exactly 1). At the moment I only get a correlation of 0.4.

Does the mu option in the PPML case take standard errors into account at all? How can I account for the standard errors in the PPML prediction?

Best,
Kathrin
Andrew Musau

Join Date: Oct 2014

Posts: 10180
#4

07 Aug 2024, 02:52

You should never log a variable with zero values.

"I am currently just trying to understand why the two approaches do not give me the same results."

Regarding why the predictions are not correlated: you are underestimating the role of the zeros in the outcome. As Maxence states, logging zero values turns them into missing values, so, given that the data set has a lot of zeros, the two models are estimated on different samples. If you want to compare like for like, use the same estimation sample.

Code:

ppmlhdfe exports log(distance) ... if exports>0 , cluster(firm_id) d absorb(industry firm_id)
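A like-for-like comparison along these lines can be sketched on simulated data. Everything below is an assumption for illustration: the variable names (ln_dist, firm_fe, common, pred1, pred2) are hypothetical, the data are generated rather than taken from the thread, and the community-contributed packages ftools, reghdfe and ppmlhdfe are assumed to be installed.

```stata
* Sketch on simulated data; all variable names are hypothetical.
* Assumes: ssc install ftools / reghdfe / ppmlhdfe have been run.
clear
set seed 12345
set obs 5000
gen long firm_id = ceil(_n / 10)                      // 500 firms, 10 obs each
gen double ln_dist = rnormal(8, 1)                    // stand-in for log distance
bysort firm_id: gen double firm_fe = rnormal(0, 0.5) if _n == 1
bysort firm_id (firm_fe): replace firm_fe = firm_fe[1]  // copy firm effect to all rows
gen double exports = rpoisson(exp(firm_fe - 0.3 * (ln_dist - 8)))  // many zeros

* PPML on strictly positive flows; d stores the absorbed fixed effects
ppmlhdfe exports ln_dist if exports > 0, absorb(firm_id) vce(cluster firm_id) d
gen byte common = e(sample)                           // mark the estimation sample
predict double pred1 if common, mu

* Log-linear model on exactly the same observations
gen double ln_exports = ln(exports) if common
reghdfe ln_exports ln_dist if common, absorb(firm_id) vce(cluster firm_id) resid
predict double xbd2 if e(sample), xbd
gen double pred2 = exp(xbd2) * exp(0.5 * e(rmse)^2)   // normal-error retransformation

* Compare the number of predictions and their correlation
count if !missing(pred1)
count if !missing(pred2)
corr pred1 pred2
```

Restricting both commands to the same marker variable keeps the comparison on one sample. Even then the correlation need not equal 1, because the two estimators target different conditional means.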