  • How do I browse or drop singleton observations from a fixed effects model?

    Dear all,

    I am running an augmented version of the gravity equation using a fixed effects model. The model was estimated with the reghdfe command, available from SSC, and standard errors were clustered by country pair.

    My estimation equation is written below:

    i = exporter, j = importer, t = time
    ln X denotes lnexports from exporter i to importer j in time t



    Note that if AANZFTA = 1, then RTA = 0


    The fixed effects terms have been generated the following way:



    My dataset includes:
    • 178 unique exporters and importers in every year. Trade of a country with itself (e.g., AUS to AUS) is not included. Therefore, I have 178 exporters * 177 importers * 1 year = 31,506 observations per year
    • 14 periods/years, yielding 441,084 observations in total
    Question:
    My estimation model generates 2,366 singleton observations, but I don't understand why or how these are generated. According to the article linked by Stata after running the regression (Correia, 2015), singleton groups are "groups with only one observation"; however, I still can't figure out what this actually means.

    Does anyone know what these singleton observations are, and whether I can browse them using the browse command or drop them? I am aware that they are dropped automatically and don't affect my coefficients.

    Thank you in advance

  • #2
    I may have made some progress on this myself. Maybe someone can tell me whether I am right or wrong.

    In Stata I can write the following code to single out singleton observations and then drop them (if I want to). This seems to let me find the observations that are singletons.
    Code:
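    ereturn list                // show stored results, including e(sample)
    gen byte used = e(sample)   // 1 if the observation was used in the estimation
    drop if used == 0           // drop everything excluded from the estimation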
    If singleton groups are "groups with only one observation", then I suppose that, in my context, a country pair (one exporter and one importer) with 14 cross-sections but data for only 1 cross-section is such a group. This observation would then be dropped together with the 13 other observations in the same country pair that have missing data. However, why must it be dropped? Is it because, in my situation with fixed effects "nested within clusters" as explained by Correia (2015), including singletons would overstate the significance of the regression coefficients, and also because singleton observations carry zero within-group information? I find clustering, fixed effects and singletons quite interconnected in this context, which causes confusion...

    Thank you in advance



    • #3
      This will drop any observation not included in the estimation, regardless of the reason. Since we know virtually nothing about your data (much of what you posted in #1 is not legible to me; please read the FAQ), I have no idea whether this will drop only what you want or more than you want. Please provide a data sample using -dataex-, your exact command, and exactly what Stata gave you back, all in CODE blocks (again, read the FAQ).



      • #4
        Sorry, I should have posted the data earlier. To keep it simple, and because it has no effect on my question, the AANZFTA variable is excluded.

        Here is a sample of my data.
        Code:
        * Example generated by -dataex-. For more info, type help dataex
        clear
        input int year str3(iso3_o iso3_d) byte rta2 int(iso3_o_year iso3_d_year pair) double lnexports_imfdots_fob
        1980 "ARM" "CRI" 0  85 533 1100        .
        1983 "ARM" "CRI" 0  86 534 1100        .
        1986 "ARM" "CRI" 0  87 535 1100        .
        1989 "ARM" "CRI" 0  88 536 1100        .
        1992 "ARM" "CRI" 0  89 537 1100        .
        1995 "ARM" "CRI" 0  90 538 1100        .
        1998 "ARM" "CRI" 0  91 539 1100        .
        2001 "ARM" "CRI" 0  92 540 1100        .
        2004 "ARM" "CRI" 0  93 541 1100 5.257495
        2007 "ARM" "CRI" 0  94 542 1100        .
        2010 "ARM" "CRI" 0  95 543 1100        .
        2013 "ARM" "CRI" 0  96 544 1100        .
        2016 "ARM" "CRI" 0  97 545 1100        .
        2019 "ARM" "CRI" 0  98 546 1100        .
        1980 "AUS" "ARM" 0  99  85 1246        .
        1983 "AUS" "ARM" 0 100  86 1246        .
        1986 "AUS" "ARM" 0 101  87 1246        .
        1989 "AUS" "ARM" 0 102  88 1246        .
        1992 "AUS" "ARM" 0 103  89 1246        .
        1995 "AUS" "ARM" 0 104  90 1246  10.1603
        1998 "AUS" "ARM" 0 105  91 1246 12.57325
        2001 "AUS" "ARM" 0 106  92 1246 13.42377
        2004 "AUS" "ARM" 0 107  93 1246 14.09844
        2007 "AUS" "ARM" 0 108  94 1246 15.56213
        2010 "AUS" "ARM" 0 109  95 1246 13.55597
        2013 "AUS" "ARM" 0 110  96 1246  15.4948
        2016 "AUS" "ARM" 0 111  97 1246 13.19877
        2019 "AUS" "ARM" 0 112  98 1246  13.5163
        1980 "AUS" "AUT" 0  99 113 1247 15.29712
        1983 "AUS" "AUT" 0 100 114 1247 14.67417
        1986 "AUS" "AUT" 0 101 115 1247 15.63683
        1989 "AUS" "AUT" 0 102 116 1247 16.30337
        1992 "AUS" "AUT" 0 103 117 1247 16.81265
        1995 "AUS" "AUT" 0 104 118 1247 17.10787
        1998 "AUS" "AUT" 0 105 119 1247 17.09933
        2001 "AUS" "AUT" 0 106 120 1247 17.34253
        2004 "AUS" "AUT" 0 107 121 1247 17.71021
        2007 "AUS" "AUT" 0 108 122 1247 17.79293
        2010 "AUS" "AUT" 0 109 123 1247 18.39671
        2013 "AUS" "AUT" 0 110 124 1247 18.27011
        2016 "AUS" "AUT" 0 111 125 1247 19.25813
        2019 "AUS" "AUT" 0 112 126 1247 16.99567
        1980 "AUS" "BLZ" 0  99 281 1259 12.25486
        1983 "AUS" "BLZ" 0 100 282 1259        .
        1986 "AUS" "BLZ" 0 101 283 1259  9.21034
        1989 "AUS" "BLZ" 0 102 284 1259 11.28978
        1992 "AUS" "BLZ" 0 103 285 1259 10.58597
        1995 "AUS" "BLZ" 0 104 286 1259 11.19799
        1998 "AUS" "BLZ" 0 105 287 1259 11.10629
        2001 "AUS" "BLZ" 0 106 288 1259 12.59133
        2004 "AUS" "BLZ" 0 107 289 1259 10.73748
        2007 "AUS" "BLZ" 0 108 290 1259 13.59473
        2010 "AUS" "BLZ" 0 109 291 1259  12.9487
        2013 "AUS" "BLZ" 0 110 292 1259 12.21608
        2016 "AUS" "BLZ" 0 111 293 1259  13.4347
        2019 "AUS" "BLZ" 0 112 294 1259 12.40789
        end
        Based on this example, only 1 singleton observation is dropped from the regression, namely the trade flow from ARM -> CRI in 2004, because the trade flows for that pair are missing in all other periods. The remaining non-missing trade flows are included in the regression.

        I ran the following regression in Stata:
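
        As a rough check of this claim, the sketch below counts the non-missing trade flows within each pair and flags pairs with exactly one. It assumes the -dataex- sample above is loaded; the variable names n_obs and pair_singleton are made up for illustration. It only looks at the pair dimension, whereas reghdfe checks every absorbed fixed effect and repeats the check until no singletons remain, so in the full dataset it may flag fewer observations than reghdfe drops.

        ```stata
        * Count non-missing trade flows within each country pair
        bysort pair: egen n_obs = count(lnexports_imfdots_fob)

        * Flag observations that are the only usable one in their pair
        gen byte pair_singleton = (n_obs == 1) & !missing(lnexports_imfdots_fob)

        * In the sample above this should list only ARM -> CRI in 2004
        list year iso3_o iso3_d lnexports_imfdots_fob if pair_singleton
        ```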

        Code:
        xtset pair year
        reghdfe lnexports_imfdots_fob rta2, absorb(i.iso3_o_year i.iso3_d_year i.pair) vce(cluster i.pair)
        The only way I found to single out the singleton observations was to drop the observations with missing trade flows and then browse/drop the observations not included in the regression:
        Code:
        drop if lnexports_imfdots_fob==.
        eret li
        gen byte used=e(sample)
        drop if used==0
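
        A possible alternative that keeps the data intact is to flag the complete observations that reghdfe left out, instead of dropping anything. This is only a sketch: it assumes the regression from above is run on the full dataset, and that missing values and singletons are the only reasons an observation is excluded. The variable name singleton is made up for illustration.

        ```stata
        * Re-run the model so that e(sample) refers to this estimation
        reghdfe lnexports_imfdots_fob rta2, absorb(iso3_o_year iso3_d_year pair) vce(cluster pair)

        * 1 if the observation was used in the estimation, 0 otherwise
        gen byte used = e(sample)

        * Complete cases (no missing outcome or regressor) that were still excluded
        gen byte singleton = !used & !missing(lnexports_imfdots_fob, rta2)

        * Inspect them without changing the dataset
        count if singleton
        browse if singleton
        ```

        If the assumption holds, the count should match the number of singleton observations that reghdfe reports in its output.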
        While Correia (2015) did illustrate an example using Stata's sample dataset auto.dta, that example included hard-coded data, and I was unable to understand it.

        The fixed effects are generated the following way:
        Code:
        egen iso3_o_year=group(iso3_o year)
        egen iso3_d_year=group(iso3_d year)
        egen pair=group(iso3_o iso3_d)
        The number of observations in the sample above is too small to run the regression. To see the singleton observation, simply run the regression without the exporter-time and importer-time fixed effects. The output is nonsense, but it illustrates the singleton:
        Code:
        xtset pair year
        reghdfe lnexports_imfdots_fob rta2, absorb(i.pair) vce(cluster i.pair)
        drop if lnexports_imfdots_fob==.
        eret li
        gen byte used=e(sample)
        browse
        Returning to my question, I'm still quite confused about why the singleton observation is dropped. Although it concerns only a few observations in a very large dataset, I'm looking for an explanation of why this happens, as it currently seems like a "black box" to me.
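
        One possible intuition, sketched for the simplified case where only the pair fixed effect is absorbed: absorbing a fixed effect amounts to subtracting the group mean, and for a group with a single observation the mean is the observation itself. The names pair_mean and y_within are hypothetical, and the sketch assumes the -dataex- sample above is loaded.

        ```stata
        * Mean of log exports within each pair (missing values are ignored)
        bysort pair: egen double pair_mean = mean(lnexports_imfdots_fob)

        * Within-pair deviation, i.e. what is left after absorbing the pair effect
        gen double y_within = lnexports_imfdots_fob - pair_mean

        * For ARM -> CRI the only non-missing deviation is exactly zero
        list year iso3_o iso3_d y_within if iso3_o == "ARM" & !missing(y_within)
        ```

        An observation whose demeaned outcome is zero by construction cannot help estimate the slope coefficient, which may be part of why it is set aside; the effect on the standard errors is the part discussed in Correia (2015).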

        Best
        Andreas
