ppmlhdfe, keepsingleton

Sei Jeong

Join Date: Jun 2020

Posts: 8
#1

ppmlhdfe, keepsingleton

03 Nov 2021, 21:00

Dear Statalist members,

I am using ppmlhdfe to investigate bilateral trade flows and have a question about handling singletons.

Here is my code:
ppmlhdfe import lndistance lntariff contiguity common_language agree_cu, absorb(imptime exptime, savefe) cluster(impexp) nolog d

- imptime: importer-year pair fixed effect
- exptime: exporter-year pair fixed effect
- impexp: importer-exporter country pair

Specifically, my data cover 189 importer countries and 211 exporter countries from 1989 to 2019, yielding 905,138 observations in total.
However, after the estimation, only 95,827 observations remained.
Stata reports that it dropped 809,303 observations that are either singletons or separated by a fixed effect.
My data set has many zero values for the dependent variable, so I am guessing that this is what caused so many singletons.

With the option "keepsingleton", I could keep almost all of the observations, with no change in the estimated coefficients.
Stata says that "ReLU method dropped 8 separated observations in 1 iterations
Converged in 17 iterations and 246 HDFE sub-iterations (tol = 1.0e-08)"

Here are my specific questions:
1. Is there anything wrong with my code?
2. Is my PPML estimation meaningful with so many singletons?
3. Can I keep using the "keepsingleton" option for my estimation?
Joao Santos Silva

Join Date: Apr 2014

Posts: 3015
#2

04 Nov 2021, 02:21

Dear @Sei Jeong,

Tom Zylkin may want to add to this, but here is my reply.

1. Looks fine to me
2. Yes
3. Do not use that option. The observations that you are keeping do not contribute to the estimation of the parameters of interest but will affect the estimated standard errors. More generally, the observations that are dropped do not contribute to the estimation and therefore are dropped to speed up computation; there are no sample-selection issues by dropping them.

Best wishes,

Joao
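
To illustrate why the dropped observations carry no information, here is a minimal Python sketch. It is not ppmlhdfe's actual algorithm (its separation check is more general); it only mimics two simple cases on made-up data: fixed-effect groups with a single observation, and fixed-effect groups in which the dependent variable is zero for every observation. The function name `drop_uninformative`, the toy rows, and the key `imports` are assumptions for illustration, and the dependent variable is assumed to be non-negative.

```python
from collections import Counter, defaultdict

def drop_uninformative(rows, fe_keys, depvar):
    """Repeatedly drop rows in singleton groups or in all-zero groups."""
    kept = list(rows)
    while True:
        before = len(kept)
        for key in fe_keys:
            sizes = Counter(r[key] for r in kept)      # observations per group
            totals = defaultdict(float)                # sum of depvar per group
            for r in kept:
                totals[r[key]] += r[depvar]
            kept = [r for r in kept
                    if sizes[r[key]] > 1 and totals[r[key]] > 0]
        if len(kept) == before:                        # nothing dropped: done
            return kept

# Hypothetical toy data: importer-year (imptime) and exporter-year (exptime) cells.
rows = [
    {"imptime": "A1990", "exptime": "X1990", "imports": 5.0},
    {"imptime": "A1990", "exptime": "Y1990", "imports": 0.0},
    {"imptime": "B1990", "exptime": "X1990", "imports": 2.0},
    {"imptime": "B1990", "exptime": "Y1990", "imports": 3.0},
    {"imptime": "C1990", "exptime": "X1990", "imports": 0.0},  # C1990 imports nothing
    {"imptime": "C1990", "exptime": "Y1990", "imports": 0.0},
    {"imptime": "D1990", "exptime": "X1990", "imports": 4.0},
    {"imptime": "D1990", "exptime": "Z1990", "imports": 1.0},  # Z1990 appears once
]

kept = drop_uninformative(rows, ["imptime", "exptime"], "imports")
print(len(rows), "->", len(kept))
```

In this toy example the all-zero importer-year C1990 is dropped, the lone Z1990 observation is dropped, and that in turn leaves D1990 with a single observation, which is dropped on the next pass. The zero flow from Y1990 to A1990 stays in the sample, so zeros as such are not the issue; only cells with no variation to explain are removed.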
Comment
Sei Jeong

Join Date: Jun 2020

Posts: 8
#3

04 Nov 2021, 08:18

Dear Joao Santos Silva,

I really appreciate it.
If you do not mind, can I ask one more question?
I've included ln(sum of exporter's and importer's GDPs) as an additional explanatory variable, but I keep getting a negative coefficient estimate, implying that GDP has a negative relationship with import flows.
GDP should positively affect imports; could you guess why I got this result?
(Including the exporter's GDP and the importer's GDP separately caused these explanatory variables to be omitted, so I used their sum.)
Comment
Joao Santos Silva

Join Date: Apr 2014

Posts: 3015
#4

04 Nov 2021, 11:06

What fixed effects are you including?
Comment
Sei Jeong

Join Date: Jun 2020

Posts: 8
#5

04 Nov 2021, 13:52

Dear Joao Santos Silva,

I used the same model and the same fixed effects.

ppmlhdfe import lndistance lntariff contiguity common_language agree_cu ln(sum of exporter's and importer's GDPs), absorb(imptime exptime, savefe) cluster(impexp) nolog d

Since I have several time-invariant independent variables, I only included the exporter-year and importer-year fixed effects.
Is there anything wrong?
Comment
Joao Santos Silva

Join Date: Apr 2014

Posts: 3015
#6

04 Nov 2021, 14:35

With those fixed effects, the effect of the GDP for each partner cannot be identified. You are estimating the effect of an interaction, but you need to take into account the latent effect of each GDP. Your result probably means that each GDP has a positive effect but the interaction is negative.

Best wishes,

Joao
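
A small Python sketch of the identification point, on made-up numbers for a single year (so that importer-year and exporter-year effects reduce to importer and exporter effects). The GDP values and the helper `double_demean` are hypothetical. In a balanced panel, removing row and column means is what two sets of fixed effects do to a regressor: anything additive in an importer term and an exporter term, such as ln(GDP_i * GDP_j) = ln(GDP_i) + ln(GDP_j), is wiped out, whereas ln(GDP_i + GDP_j) is not additive, so only its interaction-like part survives.

```python
import math

def double_demean(x):
    """Remove row means and column means from a balanced matrix x[i][j]."""
    n, m = len(x), len(x[0])
    row = [sum(r) / m for r in x]
    col = [sum(x[i][j] for i in range(n)) / n for j in range(m)]
    grand = sum(row) / n
    return [[x[i][j] - row[i] - col[j] + grand for j in range(m)]
            for i in range(n)]

def max_abs(mat):
    """Largest absolute entry of a matrix."""
    return max(abs(v) for r in mat for v in r)

imp_gdp = [1.0, 4.0, 9.0]    # hypothetical importer GDPs
exp_gdp = [2.0, 3.0, 10.0]   # hypothetical exporter GDPs

ln_product = [[math.log(a * b) for b in exp_gdp] for a in imp_gdp]
ln_sum = [[math.log(a + b) for b in exp_gdp] for a in imp_gdp]

# Additive in importer and exporter terms: nothing left (up to rounding).
print(max_abs(double_demean(ln_product)))
# Not additive: a residual interaction remains, and that is what gets a coefficient.
print(max_abs(double_demean(ln_sum)))
```

So the coefficient on ln(sum of GDPs) is estimated only from the part of that variable that the fixed effects cannot absorb, which is why its sign need not match the sign one expects for GDP itself.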
Comment
Sei Jeong

Join Date: Jun 2020

Posts: 8
#7

04 Nov 2021, 16:59

Yes, I estimated the effect of an interaction because, as I mentioned above, each GDP was omitted when I included them separately.

I really appreciate all of your comments and help.

Best,
Sei
Comment
Sei Jeong

Join Date: Jun 2020

Posts: 8
#8

04 Nov 2021, 18:31

Dear Joao Santos Silva,

I am sorry to keep bothering you.

As with the GDP of each country, if I include any country-specific independent variable, such as each country's population or hostility level, several explanatory variables are omitted because of collinearity.
I've tried different sets of fixed effects as well as different sets of independent variables, but could not figure out how to solve this collinearity and omission problem.
That is why I used the interaction of the GDPs rather than each GDP separately, and later excluded GDP from the model.

Do you have any idea how to fix the collinearity problem so that each GDP can be included in the estimation model?
Or would it be better to use a simple model without considering GDP?
Comment
Joao Santos Silva

Join Date: Apr 2014

Posts: 3015
#9

05 Nov 2021, 03:30

You cannot include any country characteristics because they are all collinear with the fixed effects. This is not a problem, it just means that the fixed effects take care of the country characteristics (that is why you include them!) and therefore any country-specific variables are redundant.

Best wishes,

Joao
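
The collinearity can be seen in a few lines of Python on hypothetical data (the values and the helper `demean_within` are made up for illustration). A variable that is constant within each importer-year cell, such as the importer's GDP, has no variation left once the importer-year means are removed, while a bilateral variable such as log distance does.

```python
from collections import defaultdict

def demean_within(values, groups):
    """Subtract each group's mean from its members."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]

# Hypothetical data: two importer-year cells, two trading partners each.
imptime = ["A1990", "A1990", "B1990", "B1990"]
importer_gdp = [10.0, 10.0, 20.0, 20.0]   # constant within each importer-year
lndistance = [1.0, 3.0, 2.0, 5.0]         # varies across partners

print(demean_within(importer_gdp, imptime))   # all zeros: absorbed by the fixed effect
print(demean_within(lndistance, imptime))     # variation remains
```

The same holds for any exporter-year variable and the exporter-year effects, so the omission of those regressors is the expected outcome rather than a problem to fix.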
Comment
Sei Jeong

Join Date: Jun 2020

Posts: 8
#10

05 Nov 2021, 09:14

Thanks for the explanation. Now I understand better.
Comment
