How do I browse or drop singleton observations from a fixed-effects model?

Andreas Johannes

Join Date: Apr 2021

Posts: 3
#1


27 Apr 2021, 13:02

Dear all,

I am estimating an augmented version of the gravity equation with a fixed-effects model. The model is estimated with the reghdfe command (available from SSC), with standard errors clustered by country pair.

My estimation equation is written below:

Here i = exporter, j = importer and t = time, and ln X_ijt denotes log exports from exporter i to importer j at time t.

Note that if AANZFTA = 1, then RTA = 0.
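A reconstruction of the specification, assuming it matches the reghdfe command used later in the thread (exporter-year, importer-year and country-pair fixed effects). The symbols beta, pi, chi and mu are assumed notation, not necessarily the author's:

```latex
\ln X_{ijt} = \beta_1 \, RTA_{ijt} + \beta_2 \, AANZFTA_{ijt} + \pi_{it} + \chi_{jt} + \mu_{ij} + \varepsilon_{ijt}
```

where pi_it, chi_jt and mu_ij are the exporter-time, importer-time and country-pair fixed effects (the variables iso3_o_year, iso3_d_year and pair).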

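The fixed-effect identifiers were generated with -egen, group()-: one for each exporter-year (iso3_o_year), one for each importer-year (iso3_d_year) and one for each country pair (pair).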

My dataset includes:
The same 178 countries appear as exporters and importers in all years. Trade between a country and itself (e.g., AUS to AUS) is not included, so I have 178 exporters × 177 importers = 31,506 observations per year.

There are 14 periods (years), yielding 14 × 31,506 = 441,084 observations in total.
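These counts can be checked directly in Stata. The 14 periods are assumed to run from 1980 to 2019 in 3-year steps, as in the -dataex- sample posted later in the thread; the local macro names are illustrative only:

```stata
* Panel dimensions described above (macro names are illustrative)
local n_exporters = 178
local n_importers = 177   // no domestic flows, so each exporter has 177 partners
local n_periods   = 14    // assumed: 1980, 1983, ..., 2019

display "Periods from 1980 to 2019 in 3-year steps: " (2019 - 1980)/3 + 1
display "Observations per year: " `n_exporters' * `n_importers'
display "Observations in total: " `n_exporters' * `n_importers' * `n_periods'
```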

Question:
My estimation reports 2,366 singleton observations, but I don't understand why or how these arise. According to the article Stata links to after running the regression (Correia, 2015), singleton groups are "groups with only one observation"; however, I still can't figure out what this actually means in my setting.

Does anyone know what these singleton observations are, and whether I can browse through them with the browse command or drop them? I am aware that they are dropped automatically and do not affect my coefficients.

Thank you in advance
Tags: fixed effects, gravity equation
Andreas Johannes

Join Date: Apr 2021

Posts: 3
#2

29 Apr 2021, 12:26

I may have made some progress on this myself. Perhaps someone can tell me whether I am right or wrong.

In Stata I can write the following code to single out the singleton observations and then drop them (if I want to). This seems to let me find the observations that are singletons:

Code:

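ereturn list                // list the results stored by the last estimation
gen byte used = e(sample)   // 1 if the observation was used in the estimation
drop if used == 0           // drops every excluded observation, whatever the reason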

If singleton groups are "groups with only one observation", then I suppose that in my context, if a country pair (one exporter and one importer), which spans 14 cross-sections, has data for only 1 cross-section, that observation would be dropped together with the 13 other observations of the same country pair that have missing data. However, why must it be dropped? Is it because, in my situation with fixed effects "nested within clusters" as explained by Correia (2015), including singletons would overstate the significance of the regression coefficients, and also because a singleton observation has zero within-group information? I find clustering, fixed effects and singletons quite interconnected in this context, which causes confusion...
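A small illustration of the "zero within-group information" point, using official -areg-, which does not drop singletons. The data are made up (y, x, pair and year are hypothetical variables): pair 3 is observed only once. The sketch compares the slope with and without that observation and shows the singleton's residual. Note that it starts with -clear-, so run it in a fresh session:

```stata
* Hypothetical toy data: pair 3 has a single observation
clear
input int(pair year) double(y x)
1 1 1.0 0
1 2 2.5 1
1 3 2.9 1
2 1 0.7 0
2 2 1.1 0
2 3 2.4 1
3 1 5.0 1
end

* Pair fixed effects estimated WITH the singleton (areg keeps it)
areg y x, absorb(pair)
scalar b_with = _b[x]
predict double resid_with, residuals
list pair year resid_with if pair == 3   // (numerically) zero: the pair effect fits it exactly

* Same model WITHOUT the singleton
areg y x if pair != 3, absorb(pair)
scalar b_without = _b[x]

display "slope with singleton:    " b_with
display "slope without singleton: " b_without
```

The two slopes agree up to rounding. Only the observation count changes, and with it the degrees of freedom used for the standard errors.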

Thank you in advance
Rich Goldstein

Join Date: Mar 2014

Posts: 4462
#3

29 Apr 2021, 13:14

This will drop any observation not included in the estimation, regardless of the reason. Since we know virtually nothing about your data (much of what you posted in #1 is not legible to me; please read the FAQ), I have no idea whether this will drop only what you want or more than you want. Please provide a data sample using -dataex-, your exact command to Stata, and exactly what Stata gave you back, all using CODE blocks (again, read the FAQ).
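A minimal sketch of that distinction, on made-up data (y, x, pair and year are hypothetical) and assuming reghdfe and ftools are installed from SSC. Observations outside e(sample) that have complete data are the ones reghdfe dropped as singletons; the others were excluded because of missing values. This split is only clean if no if/in condition or other source of exclusion was used:

```stata
* One-time setup, if needed:  ssc install ftools   and   ssc install reghdfe
clear
input int(pair year) double(y x)
1 1 1.0 0
1 2 2.5 1
1 3 2.9 1
2 1 0.7 0
2 2 .   0
2 3 2.4 1
3 1 5.0 1
3 2 .   0
end

reghdfe y x, absorb(pair)

* e(sample) is 1 for observations used in the estimation, 0 for ALL excluded ones
gen byte used = e(sample)

* Split the excluded observations by reason
gen byte excl_missing   = !used & missing(y, x, pair)
gen byte excl_singleton = !used & !missing(y, x, pair)

tab excl_missing excl_singleton
list pair year y x if excl_singleton          // or: browse if excl_singleton

* reghdfe also reports its own count; check -ereturn list- for the stored name
display "Singletons reported by reghdfe: " e(num_singletons)
```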

Andreas Johannes

Join Date: Apr 2021
Posts: 3

30 Apr 2021, 03:13

Sorry, I should have posted the data earlier. To keep it simple, and because it has no effect on my question, the AANZFTA variable is excluded.

Here is a sample of my data.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input int year str3(iso3_o iso3_d) byte rta2 int(iso3_o_year iso3_d_year pair) double lnexports_imfdots_fob
1980 "ARM" "CRI" 0  85 533 1100        .
1983 "ARM" "CRI" 0  86 534 1100        .
1986 "ARM" "CRI" 0  87 535 1100        .
1989 "ARM" "CRI" 0  88 536 1100        .
1992 "ARM" "CRI" 0  89 537 1100        .
1995 "ARM" "CRI" 0  90 538 1100        .
1998 "ARM" "CRI" 0  91 539 1100        .
2001 "ARM" "CRI" 0  92 540 1100        .
2004 "ARM" "CRI" 0  93 541 1100 5.257495
2007 "ARM" "CRI" 0  94 542 1100        .
2010 "ARM" "CRI" 0  95 543 1100        .
2013 "ARM" "CRI" 0  96 544 1100        .
2016 "ARM" "CRI" 0  97 545 1100        .
2019 "ARM" "CRI" 0  98 546 1100        .
1980 "AUS" "ARM" 0  99  85 1246        .
1983 "AUS" "ARM" 0 100  86 1246        .
1986 "AUS" "ARM" 0 101  87 1246        .
1989 "AUS" "ARM" 0 102  88 1246        .
1992 "AUS" "ARM" 0 103  89 1246        .
1995 "AUS" "ARM" 0 104  90 1246  10.1603
1998 "AUS" "ARM" 0 105  91 1246 12.57325
2001 "AUS" "ARM" 0 106  92 1246 13.42377
2004 "AUS" "ARM" 0 107  93 1246 14.09844
2007 "AUS" "ARM" 0 108  94 1246 15.56213
2010 "AUS" "ARM" 0 109  95 1246 13.55597
2013 "AUS" "ARM" 0 110  96 1246  15.4948
2016 "AUS" "ARM" 0 111  97 1246 13.19877
2019 "AUS" "ARM" 0 112  98 1246  13.5163
1980 "AUS" "AUT" 0  99 113 1247 15.29712
1983 "AUS" "AUT" 0 100 114 1247 14.67417
1986 "AUS" "AUT" 0 101 115 1247 15.63683
1989 "AUS" "AUT" 0 102 116 1247 16.30337
1992 "AUS" "AUT" 0 103 117 1247 16.81265
1995 "AUS" "AUT" 0 104 118 1247 17.10787
1998 "AUS" "AUT" 0 105 119 1247 17.09933
2001 "AUS" "AUT" 0 106 120 1247 17.34253
2004 "AUS" "AUT" 0 107 121 1247 17.71021
2007 "AUS" "AUT" 0 108 122 1247 17.79293
2010 "AUS" "AUT" 0 109 123 1247 18.39671
2013 "AUS" "AUT" 0 110 124 1247 18.27011
2016 "AUS" "AUT" 0 111 125 1247 19.25813
2019 "AUS" "AUT" 0 112 126 1247 16.99567
1980 "AUS" "BLZ" 0  99 281 1259 12.25486
1983 "AUS" "BLZ" 0 100 282 1259        .
1986 "AUS" "BLZ" 0 101 283 1259  9.21034
1989 "AUS" "BLZ" 0 102 284 1259 11.28978
1992 "AUS" "BLZ" 0 103 285 1259 10.58597
1995 "AUS" "BLZ" 0 104 286 1259 11.19799
1998 "AUS" "BLZ" 0 105 287 1259 11.10629
2001 "AUS" "BLZ" 0 106 288 1259 12.59133
2004 "AUS" "BLZ" 0 107 289 1259 10.73748
2007 "AUS" "BLZ" 0 108 290 1259 13.59473
2010 "AUS" "BLZ" 0 109 291 1259  12.9487
2013 "AUS" "BLZ" 0 110 292 1259 12.21608
2016 "AUS" "BLZ" 0 111 293 1259  13.4347
2019 "AUS" "BLZ" 0 112 294 1259 12.40789
end

Based on this example, there is really only one singleton observation dropped from the regression, namely the trade flow from ARM to CRI in 2004, since the trade flows between that pair in all other periods are missing. The remaining non-missing trade flows are included in the regression.
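A sketch of the same check done by hand: count the usable trade flows within each pair and flag non-missing flows in pairs with only one. To keep it self-contained it loads a few rows copied from the -dataex- sample above; the new variables n_obs_pair and singleton_pair are hypothetical names. It only checks the pair dimension; with exporter-year and importer-year fixed effects absorbed as well, reghdfe applies the same test to those groups too:

```stata
* A few rows copied from the -dataex- example (pairs 1100, 1246 and 1259)
clear
input int year str3(iso3_o iso3_d) int pair double lnexports_imfdots_fob
2001 "ARM" "CRI" 1100        .
2004 "ARM" "CRI" 1100 5.257495
2007 "ARM" "CRI" 1100        .
1995 "AUS" "ARM" 1246  10.1603
1998 "AUS" "ARM" 1246 12.57325
1980 "AUS" "BLZ" 1259 12.25486
1983 "AUS" "BLZ" 1259        .
1986 "AUS" "BLZ" 1259  9.21034
end

* Number of non-missing trade flows within each country pair
bysort pair: egen int n_obs_pair = count(lnexports_imfdots_fob)

* A non-missing flow in a pair with only one usable observation is a pair singleton
gen byte singleton_pair = n_obs_pair == 1 & !missing(lnexports_imfdots_fob)
list iso3_o iso3_d year lnexports_imfdots_fob if singleton_pair
```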

I ran the following regression in Stata:

Code:

xtset pair year
reghdfe lnexports_imfdots_fob rta2, absorb(i.iso3_o_year i.iso3_d_year i.pair) vce(cluster i.pair)

The only way I found to single out the singleton observations was to drop the observations with missing trade flows and then browse or drop the observations not included in the regression:

Code:

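drop if missing(lnexports_imfdots_fob)   // remove observations with no trade flow
eret li                                  // list the stored estimation results
gen byte used = e(sample)                // 1 if used in the regression
drop if used == 0                        // remaining exclusions: singletons (unless rta2 is missing)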
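An alternative that does not rely on e(sample) is to flag singletons directly. With several absorbed fixed effects, dropping a singleton in one dimension can create a new singleton in another, so the check has to be repeated until nothing changes; the reghdfe help describes singletons as being dropped iteratively in this way. The sketch below is an independent implementation of that rule on made-up data (fe_a, fe_b, y, usable, keep_obs and singleton are hypothetical names), not reghdfe's own code:

```stata
* Hypothetical toy data with two fixed-effect dimensions, fe_a and fe_b
clear
input byte(id fe_a fe_b) double y
1 1 1 1.2
2 1 2 0.8
3 2 1 1.9
4 2 2 1.1
5 3 2 0.4
6 4 3 2.2
7 4 1 1.7
end

* Start from observations with a usable outcome (add regressors to missing() as needed)
gen byte usable   = !missing(y)
gen byte keep_obs = usable

* Drop groups containing a single kept observation, in every fixed-effect
* dimension, and repeat until a full pass removes nothing
local changed = 1
while `changed' {
    local changed = 0
    foreach fe of varlist fe_a fe_b {
        tempvar n_in_group
        bysort `fe': egen `n_in_group' = total(keep_obs)
        quietly count if keep_obs & `n_in_group' == 1
        if r(N) > 0 {
            quietly replace keep_obs = 0 if `n_in_group' == 1
            local changed = 1
        }
        drop `n_in_group'
    }
}

* Singletons: usable observations removed by the iteration
gen byte singleton = usable & !keep_obs
sort id
list id fe_a fe_b if singleton
```

Here observation 5 is a singleton in fe_a and observation 6 in fe_b; removing observation 6 leaves observation 7 alone in its fe_a group, so it is caught on the second pass. For the gravity model, fe_a fe_b would become iso3_o_year iso3_d_year pair and the usable condition !missing(lnexports_imfdots_fob, rta2); the resulting count can then be compared with the number of singletons reghdfe reports.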

While Correia (2015) does illustrate an example using Stata's sample dataset auto.dta, hard-coded data were included in that example, and I was unable to understand it.

The fixed effects are generated the following way:

Code:

egen iso3_o_year=group(iso3_o year)
egen iso3_d_year=group(iso3_d year)
egen pair=group(iso3_o iso3_d)

The sample above has too few observations to run the full regression. To see the singleton observation, simply run the regression without the exporter-time and importer-time fixed effects. The output is meaningless, but it illustrates the singleton:

Code:

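xtset pair year
reghdfe lnexports_imfdots_fob rta2, absorb(i.pair) vce(cluster i.pair)
drop if missing(lnexports_imfdots_fob)   // remove observations with no trade flow
eret li
gen byte used = e(sample)                // 1 if used in the regression
browse if used == 0                      // the ARM -> CRI 2004 singleton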

Returning to my question, I am still confused about why the singleton observation is dropped. Although only a few observations out of a very large dataset are affected, I would like to understand why this happens, as it currently seems like a "black box" to me.
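One way to see why a singleton carries no information, sketched with assumed notation: pi_it, chi_jt and mu_ij are the exporter-year, importer-year and pair fixed effects and beta_1 is the RTA coefficient (AANZFTA omitted, as in the regression above). Least squares minimizes

```latex
\min_{\beta_1,\pi,\chi,\mu} \sum_{(i,j,t)} \left( \ln X_{ijt} - \beta_1 \, RTA_{ijt} - \pi_{it} - \chi_{jt} - \mu_{ij} \right)^2
```

If pair (i,j) is observed only in period t_0, then mu_ij appears in a single term of this sum, and choosing

```latex
\hat{\mu}_{ij} = \ln X_{ijt_0} - \beta_1 \, RTA_{ijt_0} - \pi_{it_0} - \chi_{jt_0}
```

sets that term to zero for any values of beta_1, pi and chi. The observation is always fitted exactly, so it places no restriction on the estimate of beta_1 (the same argument applies to an exporter-year or importer-year group with a single observation). Dropping it leaves the coefficients unchanged; keeping it only raises the observation count, which enters degrees-of-freedom adjustments and, per Correia (2015), can make clustered standard errors look too small when the fixed effects are nested within clusters.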

Best
Andreas
