In my project, I use a recently developed estimator ("Blow-Up and Cluster") which makes it possible to control for fixed effects when the response is an ordinal variable (life satisfaction in my case).
My problem is that I have two alternative codings of it, and they yield (sometimes very) different results and drop different numbers of observations. Moreover, I cannot work out why so many more observations are dropped than under the linear fixed effects transformation.
The idea of the estimator is to copy each observation K-1 times (where K is the number of categories of the ordinal variable), dichotomize the response at every possible cutoff, run clogit on the whole expanded sample, and then cluster the standard errors at the individual level to correct for the multiple entries in the likelihood function. Full description (page 11)
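To make the blow-up step concrete, here is a minimal sketch on made-up data. The variable names pid, year, y, cut and d are mine, not the authors', and y is assumed to be coded 1, ..., K. It only builds the expanded data set and the dichotomized outcome; the clogit step is left out.

```stata
* Toy illustration of the blow-up step only (no estimation).
* All names and values are invented for this example.
clear
input pid year y
1 1 2
1 2 4
2 1 1
2 2 3
end

* K-1 = number of cutoffs, assuming y is coded 1, ..., K
quietly summarize y
local copies = r(max) - r(min)

* Replace every person-year observation by K-1 copies
expand `copies'

* Number the copies within each original observation: cutoffs 2, ..., K
bysort pid year: gen byte cut = _n + 1

* One dichotomization per copy: is y at or above this cutoff?
gen byte d = y >= cut

list pid year y cut d, sepby(pid year)
```

Each person-year row then appears once per cutoff, which is why the standard errors have to be clustered on the individual afterwards.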
The authors provide code for the estimator (feologit_buc, reproduced below).
However, Dickerson et al. note that this code drops observations in the British Household Panel Survey (which I am using) because the new id variable is stored as a long, which has a maximum of 2,147,483,620, and the ids of the BHPS are larger than this. They provide an alternative coding (bucologit, also reproduced below).
I recoded the identifier variable in the BHPS so that it does not exceed the maximum, and ran feologit_buc and bucologit on the same specifications. I get the following results for an example specification:
Could anyone spot what exactly is causing the differences?
As for my second problem, the inconsistency with linear fixed effects, I do not understand why xtreg, fe retains so many more observations. Results below:
I know that clogit will drop all observations that do not vary in life satisfaction, whereas xtreg, fe does not. However, I find that in this sample only 20 observations meet this criterion (I did not check this directly; instead I included lfsato as a regressor in a separate xtreg, fe, and the sample shrank by 20).
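A more direct count could look like the sketch below. It assumes the person identifier is called pid (a placeholder name; only lfsato is an actual variable name from above) and it ignores any further sample restrictions from the regressors, so the numbers need not match the estimation sample exactly.

```stata
* Sketch: count observations of individuals whose lfsato never changes.
* "pid" is a placeholder for the panel identifier.
preserve
drop if missing(lfsato)

* After sorting lfsato within person, first == last means no variation
bysort pid (lfsato): gen byte novar = lfsato[1] == lfsato[_N]

count if novar                 // person-year observations without within-person variation

egen byte onepp = tag(pid)     // flags exactly one row per person
count if onepp & novar         // number of such individuals
restore
```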
Can anybody suggest why the number of observations is so different from the above results?
Code:
    * ivar is the individual identifier, yvar is the ordered dependent variable,
    * and xvars is the list of explanatory variables.
    capture program drop feologit_buc
    program feologit_buc, eclass
        version 10
        gettoken gid 0: 0
        gettoken y x: 0
        tempvar iid id cid gidcid dk
        qui sum `y'
        local lk = r(min)
        local hk = r(max)
        * number each individual's observations and build a composite id
        bys `gid': gen `iid' = _n
        gen long `id' = `gid'*100 + `iid'
        * K-1 copies of every observation, indexed by cid = 1, ..., K-1
        expand `=`hk'-`lk''
        bys `id': gen `cid' = _n
        qui gen long `gidcid' = `gid'*100 + `cid'
        * dichotomize the response at cutoff cid+1
        qui gen `dk' = `y' >= `cid' + 1
        clogit `dk' `x', group(`gidcid') cluster(`gid')
    end

    feologit_buc ivar yvar xvars
Code:
    capture program drop bucologit
    program bucologit
        version 11.2
        syntax varlist [if] [in], Id(varname)
        preserve
        * estimation sample: nonmissing on all of varlist and on the id
        marksample touse
        markout `touse' `id'
        gettoken yraw x : varlist
        tempvar y
        * recode the response to consecutive integers 1, ..., K
        qui egen int `y' = group(`yraw')
        qui keep `y' `x' `id' `touse'
        qui keep if `touse'
        qui sum `y'
        local ymax = r(max)
        * one dummy per cutoff 2, ..., K
        forvalues i = 2(1)`ymax' {
            qui gen byte `yraw'`i' = `y' >= `i'
        }
        drop `y'
        tempvar n cut newid
        qui gen long `n' = _n
        * stack the dummies: one row per observation and cutoff
        qui reshape long `yraw', i(`n') j(`cut')
        qui egen long `newid' = group(`id' `cut')
        sort `newid'
        clogit `yraw' `x', group(`newid') cluster(`id')
        restore
    end