Problem with fixed effects (time invariant dummies not dropping out)

Philip Gigliotti

Join Date: Nov 2016

Posts: 118
#1


17 Dec 2016, 20:39

I'm running a fixed effects model and for some reason my time invariant variables aren't dropping out. These variables are coded 0 or 1. I only have about 1000 observations so I've been sorting them by unit and the dummy and looking for miscoded variables but I can't find any.

The problem might be that I am using two kinds of fixed effects: one for university and one for college president. Several of the college president dummies are dropping out due to multicollinearity. Could this be the problem? And why is this happening?
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#2

18 Dec 2016, 03:49

Philip:
It's difficult to guess what's going on without seeing an example or excerpt of your dataset, which you can easily post via -dataex- (see -search dataex-). Thanks.
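For reference, a minimal sketch of what -dataex- does, using the auto dataset that ships with Stata rather than the poster's data. The variable list and observation range here are illustrative assumptions only:

```stata
* Illustration on a shipped dataset; substitute your own model variables
sysuse auto, clear
* On older Stata releases, install first with: ssc install dataex
dataex make price mpg foreign in 1/5
```

The output is a block of -input- statements that others can paste into Stata to recreate the excerpt.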

Kind regards,
Carlo
(Stata 19.0)
Sebastian Kripfganz

Join Date: May 2014

Posts: 2594
#3

18 Dec 2016, 08:14

It would be easier to tell if we could see the command line as you have typed it in Stata. That said, it seems that you have used the regress command instead of xtreg. Some variables are dropped due to collinearity and Stata does not really care which variables to drop. Your model is not identified and the estimates of your time-invariant variables are confounded with the effects of the omitted dummies. You will probably observe that your dummies are no longer omitted if you manually drop the other time-invariant variables.

https://www.kripfganz.de/stata/
Philip Gigliotti

Join Date: Nov 2016

Posts: 118
#4

18 Dec 2016, 08:40

Ok, here is a data sample.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input double unitid str80 pres_name long numpres_name byte(sex clergy)
149781 "A. Duane Litfin" 1 0 1
149781 "A. Duane Litfin" 1 0 1
149781 "A. Duane Litfin" 1 0 1
149781 "A. Duane Litfin" 1 0 1
149781 "A. Duane Litfin" 1 0 1
149781 "A. Duane Litfin" 1 0 1
149781 "A. Duane Litfin" 1 0 1
149781 "A. Duane Litfin" 1 0 1
182670 "Adam M. Keller" 3 . .
190150 "Alan Brinkley" 4 0 0
110662 "Albert Carnesale" 5 0 0
110662 "Albert Carnesale" 5 0 0
216287 "Alfred H. Bloom" 6 0 0
216287 "Alfred H. Bloom" 6 0 0
216287 "Alfred H. Bloom" 6 0 0
216287 "Alfred H. Bloom" 6 0 0
216287 "Alfred H. Bloom" 6 0 0
216287 "Alfred H. Bloom" 6 0 0
216287 "Alfred H. Bloom" 6 0 0
216287 "Alfred H. Bloom" 6 0 0
213543 "Alice P. Gast" 8 1 0
213543 "Alice P. Gast" 8 1 0
213543 "Alice P. Gast" 8 1 0
215062 "Amy Gutmann" 9 1 0
215062 "Amy Gutmann" 9 1 0
215062 "Amy Gutmann" 9 1 0
215062 "Amy Gutmann" 9 1 0
218663 "Andrew A. Sorensen" 10 0 0
218663 "Andrew A. Sorensen" 10 0 0
218663 "Andrew A. Sorensen" 10 0 0
218663 "Andrew A. Sorensen" 10 0 0
152673 "Andrew T. Ford" 12 0 0
152673 "Andrew T. Ford" 12 0 0
152673 "Andrew T. Ford" 12 0 0
152673 "Andrew T. Ford" 12 0 0
152673 "Andrew T. Ford" 12 0 0
166027 "Ann E. Berman" 14 . .
190044 "Anthony G. Collins" 15 0 0
190044 "Anthony G. Collins" 15 0 0
190044 "Anthony G. Collins" 15 0 0
190044 "Anthony G. Collins" 15 0 0
190044 "Anthony G. Collins" 15 0 0
190044 "Anthony G. Collins" 15 0 0
164465 "Anthony W. Marx" 16 0 0
164465 "Anthony W. Marx" 16 0 0
164465 "Anthony W. Marx" 16 0 0
164465 "Anthony W. Marx" 16 0 0
164465 "Anthony W. Marx" 16 0 0
164465 "Anthony W. Marx" 16 0 0
213385 "Arthur J. Rothkopf" 18 0 0
213385 "Arthur J. Rothkopf" 18 0 0
213385 "Arthur J. Rothkopf" 18 0 0
213385 "Arthur J. Rothkopf" 18 0 0
201645 "Barbara R. Snyder" 20 1 0
201645 "Barbara R. Snyder" 20 1 0
161004 "Barry Mills" 21 0 0
161004 "Barry Mills" 21 0 0
161004 "Barry Mills" 21 0 0
161004 "Barry Mills" 21 0 0
161004 "Barry Mills" 21 0 0
161004 "Barry Mills" 21 0 0
161004 "Barry Mills" 21 0 0
161004 "Barry Mills" 21 0 0
131159 "Benjamin Ladner" 22 0 0
131159 "Benjamin Ladner" 22 0 0
131159 "Benjamin Ladner" 22 0 0
131159 "Benjamin Ladner" 22 0 0
141060 "Beverly Daniel Tatum" 24 1 0
141060 "Beverly Daniel Tatum" 24 1 0
141060 "Beverly Daniel Tatum" 24 1 0
141060 "Beverly Daniel Tatum" 24 1 0
141060 "Beverly Daniel Tatum" 24 1 0
141060 "Beverly Daniel Tatum" 24 1 0
141060 "Beverly Daniel Tatum" 24 1 0
178396 "Brady J. Deaton" 25 0 0
178396 "Brady J. Deaton" 25 0 0
178396 "Brady J. Deaton" 25 0 0
178396 "Brady J. Deaton" 25 0 0
178396 "Brady J. Deaton" 25 0 0
216667 "Brian C. Mitchell" 26 0 0
216667 "Brian C. Mitchell" 26 0 0
216667 "Brian C. Mitchell" 26 0 0
211291 "Brian C. Mitchell" 26 0 0
211291 "Brian C. Mitchell" 26 0 0
211291 "Brian C. Mitchell" 26 0 0
211291 "Brian C. Mitchell" 26 0 0
211291 "Brian C. Mitchell" 26 0 0
173902 "Brian C. Rosenberg" 28 0 0
173902 "Brian C. Rosenberg" 28 0 0
173902 "Brian C. Rosenberg" 28 0 0
173902 "Brian C. Rosenberg" 28 0 0
173902 "Brian C. Rosenberg" 28 0 0
173902 "Brian C. Rosenberg" 28 0 0
163286 "C. Dan Mote Jr." 29 0 0
163286 "C. Dan Mote Jr." 29 0 0
163286 "C. Dan Mote Jr." 29 0 0
163286 "C. Dan Mote Jr." 29 0 0
167835 "Carol T. Christ" 31 1 0
167835 "Carol T. Christ" 31 1 0
167835 "Carol T. Christ" 31 1 0
end
label values numpres_name numpres_name
label def numpres_name 1 "A. Duane Litfin", modify
label def numpres_name 3 "Adam M. Keller", modify
label def numpres_name 4 "Alan Brinkley", modify
label def numpres_name 5 "Albert Carnesale", modify
label def numpres_name 6 "Alfred H. Bloom", modify
label def numpres_name 8 "Alice P. Gast", modify
label def numpres_name 9 "Amy Gutmann", modify
label def numpres_name 10 "Andrew A. Sorensen", modify
label def numpres_name 12 "Andrew T. Ford", modify
label def numpres_name 14 "Ann E. Berman", modify
label def numpres_name 15 "Anthony G. Collins", modify
label def numpres_name 16 "Anthony W. Marx", modify
label def numpres_name 18 "Arthur J. Rothkopf", modify
label def numpres_name 20 "Barbara R. Snyder", modify
label def numpres_name 21 "Barry Mills", modify
label def numpres_name 22 "Benjamin Ladner", modify
label def numpres_name 24 "Beverly Daniel Tatum", modify
label def numpres_name 25 "Brady J. Deaton", modify
label def numpres_name 26 "Brian C. Mitchell", modify
label def numpres_name 28 "Brian C. Rosenberg", modify
label def numpres_name 29 "C. Dan Mote Jr.", modify
label def numpres_name 31 "Carol T. Christ", modify

Here is the code for one version of my regression:

xtset unitid year, yearly
sort unitid year
xtreg lnprescomp L.centered_rank L.cenpublicrank age sex termlength priorpres yearsprior clergy lnFTsal lnFTE lngifts lnendxstud satavg lnresearch resuni i.year i.numpres_name, fe robust cluster (unitid)

Fixed-effects (within) regression Number of obs = 863
Group variable: unitid Number of groups = 163

R-sq: Obs per group:
within = 0.6222 min = 2
between = 0.0201 avg = 5.3
overall = 0.0044 max = 7

F(17,162) = .
corr(u_i, Xb) = -0.9844 Prob > F = .

(Std. Err. adjusted for 163 clusters in unitid)

Robust
lnprescomp Coef. Std. Err. t P>t [95% Conf. Interval]

centered_rank
L1. .0027526 .0053348 0.52 0.607 -.0077821 .0132873

cenpublicrank
L1. -.0133977 .0064985 -2.06 0.041 -.0262304 -.0005649

age .1340173 .0773559 1.73 0.085 -.0187386 .2867732
sex -.7377239 .4265682 -1.73 0.086 -1.580075 .104627
termlength .2822993 .1350666 2.09 0.038 .0155812 .5490173
priorpres .0634937 .0350573 1.81 0.072 -.0057345 .132722
yearsprior -.1406934 .0775754 -1.81 0.072 -.2938829 .012496
clergy -2.576749 1.569818 -1.64 0.103 -5.676694 .5231947
lnFTsal .4982687 .2711326 1.84 0.068 -.0371411 1.033679
lnFTE -.7239067 .3974517 -1.82 0.070 -1.508761 .0609474
lngifts -.0251121 .0359195 -0.70 0.485 -.096043 .0458188
lnendxstud -.0760739 .0838705 -0.91 0.366 -.2416943 .0895466
satavg -.0001304 .0006751 -0.19 0.847 -.0014635 .0012027
lnresearch .0167538 .0243796 0.69 0.493 -.031389 .0648966
resuni 0 (omitted)

year
2004 -.3278364 .2162441 -1.52 0.131 -.7548571 .0991843
2005 -.6668508 .4308364 -1.55 0.124 -1.51763 .1839286
2006 -1.005305 .6448846 -1.56 0.121 -2.278769 .2681585
2007 -1.289379 .8527277 -1.51 0.132 -2.973274 .3945158
2008 -1.629466 1.066205 -1.53 0.128 -3.734919 .4759865
2009 -1.948954 1.278149 -1.52 0.129 -4.472935 .5750273

numpres_name
Alan Brinkley .0036891 .3190963 0.01 0.991 -.6264355 .6338137
Albert Carnesale -2.214855 1.198038 -1.85 0.066 -4.58064 .1509296
Alfred H. Bloom 0 (omitted)
Alice P. Gast 4.363048 2.198851 1.98 0.049 .020943 8.705153
Amy Gutmann 1.679132 .4257749 3.94 0.000 .8383481 2.519917
Andrew A. Sorensen -2.924166 1.5872 -1.84 0.067 -6.058435 .2101028
Andrew T. Ford -3.672203 2.136217 -1.72 0.088 -7.890625 .5462188
Anthony G. Collins 0 (omitted)
Anthony W. Marx 3.533189 1.83793 1.92 0.056 -.0962005 7.162578
Arthur J. Rothkopf -6.234023 3.398193 -1.83 0.068 -12.94449 .4764428
Barbara R. Snyder 1.39824 .8002091 1.75 0.082 -.1819457 2.978425
Barry Mills 0 (omitted)
Benjamin Ladner 2.18662 .0519382 42.10 0.000 2.084057 2.289183
Beverly Daniel Tatum 0 (omitted)
Brady J. Deaton 0 (omitted)
Brian C. Mitchell -1.635248 .9472556 -1.73 0.086 -3.505809 .2353122
Brian C. Rosenberg 2.882257 1.568548 1.84 0.068 -.2151794 5.979693
C. Dan Mote Jr. -2.568614 1.212412 -2.12 0.036 -4.962783 -.1744455
Carol T. Christ 0 (omitted)
Carolyn (Biddy) Martin 4.01198 2.065557 1.94 0.054 -.066908 8.090869
Catharine Bond Hill 6.581502 3.463508 1.90 0.059 -.2579414 13.42095
Charles E. Phelps 1.697419 .9508153 1.79 0.076 -.1801711 3.575009
Charles J. Dougherty 0 (omitted)
Charles M. Vest -1.857652 .8433866 -2.20 0.029 -3.523101 -.192203
Charles W. Steger 0 (omitted)
Colin S. Diver 0 (omitted)
Constantine N. Papadakis 0 (omitted)
Cornelius M. Kerwin .3617723 .0794129 4.56 0.000 .2049545 .5185901

As you can see, many of the president dummies (I haven't included all of them) didn't drop out, and the time-invariant presidential variables (sex and clergy) didn't drop out either.

Now I try with a different panelvar setting:

xtset numpres_name year, yearly
sort numpres_name year
xtreg lnprescomp L.centered_rank L.cenpublicrank age sex termlength priorpres yearsprior clergy lnFTsal lnFTE lngifts lnendxstud satavg lnresearch resuni i.year i.unitid, fe robust cluster(numpres_name)

Fixed-effects (within) regression Number of obs = 730
Group variable: numpres_name Number of groups = 211

R-sq: Obs per group:
within = 0.5344 min = 1
between = 0.0510 avg = 3.5
overall = 0.0263 max = 7

F(15,210) = .
corr(u_i, Xb) = -0.9658 Prob > F = .

(Std. Err. adjusted for 211 clusters in numpres_name)
-------------------------------------------------------------------------------
| Robust
lnprescomp | Coef. Std. Err. t P>|t| [95% Conf. Interval]
--------------+----------------------------------------------------------------
centered_rank |
L1. | .0016453 .0028665 0.57 0.567 -.0040055 .007296
|
cenpublicrank |
L1. | -.0088242 .0051241 -1.72 0.087 -.0189255 .001277
|
age | .3248831 .194344 1.67 0.096 -.058232 .7079983
sex | 0 (omitted)
termlength | -.2341882 .1916804 -1.22 0.223 -.6120525 .1436761
priorpres | .0583441 .0351541 1.66 0.098 -.0109561 .1276443
yearsprior | -.0835124 .1507053 -0.55 0.580 -.3806015 .2135767
clergy | 0 (omitted)
lnFTsal | .3276799 .1651331 1.98 0.049 .0021489 .6532108
lnFTE | -.5449875 .388204 -1.40 0.162 -1.310264 .2202886
lngifts | .0004754 .0232352 0.02 0.984 -.0453287 .0462795
lnendxstud | -.0607394 .0844755 -0.72 0.473 -.2272681 .1057892
satavg | -.0002208 .000712 -0.31 0.757 -.0016243 .0011827
lnresearch | .0168454 .0244264 0.69 0.491 -.0313068 .0649977
resuni | 0 (omitted)
|
year |
2004 | -.0075971 .0227469 -0.33 0.739 -.0524386 .0372445
2005 | -.0174238 .0325207 -0.54 0.593 -.0815327 .0466851
2006 | -.0277991 .0296492 -0.94 0.350 -.0862472 .0306491
2007 | -.0061729 .0309894 -0.20 0.842 -.0672631 .0549173
2008 | -.0089648 .0293319 -0.31 0.760 -.0667875 .048858
2009 | 0 (omitted)
|
unitid |
100830 | 0 (omitted)
104179 | 0 (omitted)
106397 | 0 (omitted)
110404 | 0 (omitted)
110635 | 0 (omitted)
110644 | 0 (omitted)
110653 | 0 (omitted)
110662 | 0 (omitted)
110671 | -.0678122 .2204147 -0.31 0.759 -.5023211 .3666967
110680 | 0 (omitted)
110705 | 0 (omitted)
110714 | 0 (omitted)
112260 | 0 (omitted)
115409 | 0 (omitted)
118888 | 0 (omitted)
120254 | 0 (omitted)
120883 | 0 (omitted)
121345 | 0 (omitted)
123961 | 0 (omitted)
126678 | 0 (omitted)
129020 | 0 (omitted)
130590 | 0 (omitted)
130697 | 0 (omitted)
130794 | 0 (omitted)
130943 | 0 (omitted)
131159 | 0 (omitted)
131283 | 0 (omitted)
131496 | 0 (omitted)
131520 | 0 (omitted)

Now the time-invariant characteristics for both panel clusters drop, but a year dummy is omitted for collinearity, and it seems I've lost observations for that year (and a lot of statistical power).
Clyde Schechter

Join Date: Apr 2014

Posts: 30117
#5

18 Dec 2016, 10:31

Well, the data example you posted isn't complete enough for anyone to be able to try to reproduce your problem and then troubleshoot it. It excludes most of the variables, and to the extent that it contains more than one observation per unitid, those observations are all exact duplicates. So it isn't very helpful for this purpose.

That said, there are some hints that can be gleaned from the output and from general principles.

Stata has a routine for identifying collinear variables and removing them from the analysis, and I have never known that routine to get it wrong. So if Stata isn't dropping president name indicators when you -xtset unitid year-, it means that at least some of the unitids (which I'm guessing are colleges?) have different presidents in different years. That certainly seems to make sense, and I doubt there is much, if anything, more to it than that. Put more briefly: president is not a time-invariant attribute of a college. So it does not get omitted, and neither do fixed attributes associated with it like sex or clergy.

In the second analysis you use -xtset numpres_name year-, and this time you see that sex and clergy are time-invariant attributes of the president, so they do get omitted. But now you are troubled by the results for the unitid and year indicators. Most of the unitid indicators are omitted. But one is not: 110671. The conclusion I draw from that is that somewhere in your data, there is one president who was president at two different colleges during the timespan covered by your data (again, not surprising in the real world) and one of those colleges was 110671. To chase that down I would -tab pres_name if unitid == 110671- and then run some more tabulate commands to see what colleges each of those presidents served at. Presumably for most of them, there will be only one such college, but one of them will show up twice or more. Then you have to figure out if that is what really happened in the world or if this is an error in your data.
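One way to run that check across all presidents at once, sketched on a toy dataset. The unitid and numpres_name codes are taken from the excerpt in #4, but the years are made up for illustration, and firstpair and n_units are hypothetical helper names:

```stata
* Toy data: president 26 appears at two colleges, president 8 at one
clear
input long unitid int year long numpres_name
216667 2003 26
216667 2004 26
211291 2005 26
211291 2006 26
213543 2005 8
213543 2006 8
end

* Flag the first row of each president-by-college pair
bysort numpres_name unitid: gen byte firstpair = (_n == 1)
* Count the distinct colleges each president appears in
bysort numpres_name: egen n_units = total(firstpair)
* List presidents observed at more than one college
tabulate numpres_name if n_units > 1
```

On real data, any president listed by the final -tabulate- is either a genuine move between colleges or a coding error worth checking by hand.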

The disappearance of one of your year indicators does not have an obvious explanation, but if I understood what all of the regression variables mean, we might be able to pin that one down too. Here's the general principle that is probably at work. Suppose you have panel data and you run a panel regression including year indicators. Suppose the model also includes another variable which, in effect, divides time into two different eras. For example, you might have a variable that distinguishes years after 2008 from years up to 2008, or a variable that distinguishes election years from non-election years or something like that. When you include such a variable in the model, you introduce collinearity between the year indicators and that distinguishing variable, so Stata will either omit the distinguishing variable or will omit one of the year indicators. In your case it did the latter. Now, I cannot discern which variable in your model is the source of this, but if you think about this you can probably figure out which one it is. If you can't figure that out (perhaps because it's not supposed to happen but has arisen due to data errors), you can find the source of the collinearity by creating your own indicator variable for year 2009 and using that as the dependent variable in a regression on the other model variables.
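A minimal sketch of that diagnostic on simulated data, using the "after 2008" example from the paragraph above. Everything here (id, y, post2008, the 2003-2010 span) is a made-up illustration, not the poster's data:

```stata
* Simulated balanced panel: 5 units observed 2003-2010
clear
set seed 12345
set obs 40
gen id = ceil(_n/8)
bysort id: gen int year = 2002 + _n
gen byte post2008 = (year > 2008)   // era variable, collinear with the year dummies
gen y = rnormal()

* The fixed-effects fit has to omit one of the collinear terms
xtset id year
xtreg y post2008 i.year, fe

* Diagnostic: regress the last year's indicator on the other regressors
quietly tabulate year, generate(yr)   // yr1 = 2003, ..., yr8 = 2010
regress yr8 post2008 yr2-yr7
display "R-squared = " e(r2)
```

An R-squared of 1 in the diagnostic regression means the year indicator is an exact linear combination of the other regressors, and the nonzero coefficients show which ones are involved (here post2008 and the 2009 indicator).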

Finally, it looks to me as if you are working with data that is a bit more complex than simply panel data. With the possible exception of one president who may have served at more than one unit it looks like your data has a multi-level structure with yearly observations nested within presidents nested within unitids. If the anomaly with some president being found in more than one unitid is due to a data error, then this is clearly the case. If that anomaly represents reality then you have something very close to a nested model: it is a rather sparse multiple-membership model. Either way, it would not be appropriate to treat this as panel data with the unitid as the panel because that fails to account for the non-independence of observations within presidents. It IS okay to treat it as panel data with the president as the panel ID, but then all fixed unitid attributes' effects become unestimable in a fixed effects model. You may want to look into doing this with multi-level modeling and mixed effects using the -mixed- command and representing the full hierarchy of observations within presidents within unitids.

Philip Gigliotti

Join Date: Nov 2016
Posts: 118
#6

23 Dec 2016, 21:13

I'm still struggling with a few issues in this model. I'm using the areg command now rather than xtreg.

The command for the first regression is areg lnprescomp cenlagrank privateXrank private age yearsprior termlength lnendxstud lnresearch satavg lnFTsal lnFTE lngiftsxstud lnendxstud sex lac priorpres clergy final system year0*, absorb(unitid) robust cluster(unitid)

The command for the second regression is areg lnprescomp cenlagrank privateXrank private age yearsprior termlength lnendxstud lnresearch satavg lnFTsal lnFTE lngiftsxstud lnendxstud sex lac priorpres clergy final system year0* i.numpres_name, absorb(unitid) robust cluster(unitid)

The command for the third regression is areg lnprescomp lagrank private age yearsprior termlength lnendxstud lnresearch satavg lnFTsal lnFTE lngiftsxstud lnendxstud sex lac priorpres clergy final system year0* i.unitid, absorb(numpres_name) robust cluster(unitid).

The problem comes in when I run the second and third models. The second model is a fixed effects model with the absorb variable set to unitid, and dummies for numpres_name. The third is a fixed effects model with the absorb variable set to numpres_name and dummies for unitid. Though these models should be equivalent, and are for most variables, some of the variables diverge and I don't know why. In the third model, time-invariant characteristics for both the unitid variable and the numpres_name variable drop out, but in the second, only the unitid time-invariant variables drop out, and the numpres_name variables have strange coefficients (e.g., sex). The age and yearsprior variables also have clashing signs, coefficient sizes, or standard errors. I know the age, yearsprior, and sex variables don't have any errors in them. The coefficient in the second model is also markedly bigger than in the first and third.

here are the models.

	(1)	(2)	(3)
VARIABLES	lnprescomp	lnprescomp	lnprescomp

cenlagrank	-0.0124***	-0.0104**	-0.0104**
	(0.00467)	(0.00449)	(0.00449)
privateXrank	0.0138**	0.0134*	0.0134*
	(0.00686)	(0.00727)	(0.00727)
o.private	-	-

age	-0.000346	-0.0666***	0.0675***
	(0.00504)	(0.00913)	(0.0230)
yearsprior	-0.00434	0.392***	0.0459***
	(0.0124)	(0.105)	(0.0172)
termlength	0.0145**	0.0200	0.0200
	(0.00560)	(0.0184)	(0.0184)
lnresearch	0.0258	0.0183	0.0183
	(0.0254)	(0.0263)	(0.0263)
satavg	0.000418	-0.000167	-0.000167
	(0.000768)	(0.000738)	(0.000738)
lnFTsal	0.595**	0.506	0.506
	(0.236)	(0.310)	(0.310)
lnFTE	-0.0729	-0.605	-0.605
	(0.316)	(0.422)	(0.422)
lngiftsxstud	-0.00606	-0.0197	-0.0197
	(0.0305)	(0.0407)	(0.0407)
lnendxstud	0.00913	-0.0596	-0.0596
	(0.0769)	(0.0930)	(0.0930)
sex	0.0964	3.795***
	(0.0723)	(1.055)
o.lac	-	-	-

priorpres	0.200*	0.0607	0.0607
	(0.102)	(0.0399)	(0.0399)
clergy	-0.0650	0.623***
	(0.356)	(0.181)
final	0.0534	0.0832	0.0832
	(0.0495)	(0.0586)	(0.0586)
o.system	-	-	-

Constant	6.129*	16.35***	8.834***
	(3.466)	(4.285)	(3.285)

Observations	863	863	863
R-squared	0.815	0.881	0.881

year fe	x	x	x
institution fe	x	x	x
president fe		x	x
absorb variable	unitid	unitid	president
Robust standard errors in parentheses
*** p<0.01, ** p<0.05, * p<0.1

any ideas?

Last edited by Philip Gigliotti; 23 Dec 2016, 21:27.


Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#7

24 Dec 2016, 11:33

Philip:
as an aside to others' superb advice, I would also consider if the number of your predictors is in line with the number of the observations totalled by groups.
I would probably go for more parsimonious regression models.

Kind regards,
Carlo
(Stata 19.0)
Philip Gigliotti

Join Date: Nov 2016

Posts: 118
#8

24 Dec 2016, 15:21

Originally posted by Carlo Lazzaro

Philip:
as an aside to others' superb advice, I would also consider if the number of your predictors is in line with the number of the observations totalled by groups.
I would probably go for more parsimonious regression models.

A lot of the dummy variables drop out, since the president dummy is collinear with the university dummy every time a president holds the position for all 7 years of the panel. I have 863 observations, and I would estimate I have about 400-500 degrees of freedom remaining after using both sets of dummies.

Certainly the model suffers from some over-fitting, but I still get a significant coefficient on my variable of interest and I think controlling for both president and institution characteristics goes a long way to making an argument for causality, at the very least as a robustness check.

I'm interested in knowing why the models are different, whether something is wrong, and which model to present as my model with president and university fixed effects.

Last edited by Philip Gigliotti; 24 Dec 2016, 15:36.
William Lisowski

Join Date: Dec 2014

Posts: 10150
#9

24 Dec 2016, 16:57

From help areg we read in the description

areg is designed for datasets with many groups, but not a number of groups that increases with the sample size. See the xtreg, fe command in [XT] xtreg for an estimator that handles the case in which the number of groups increases with the sample size.

and in the Technical Note found in the full documentation in the Stata Base Reference Manual PDF (included in the Stata installation since version 11 and accessible from within Stata, for example through Stata's Help menu) we read

Although the point estimates produced by areg and xtreg, fe are the same, the estimated VCEs differ when vce(cluster clustvar) is specified because the commands make different assumptions about whether the number of groups increases with the sample size.

Both the number of institutions and the number of individuals who were identified as presidents depend on the size of your sample, so it seems to me that areg may not be an appropriate choice for your modeling, and from Clyde's advice, xtreg may not be, either.
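The quoted note can be checked on a public dataset. The sketch below uses nlswork from Stata's example data (downloaded with -webuse-, so it assumes an internet connection); the choice of regressors is arbitrary, and whether the standard errors differ may depend on the Stata release:

```stata
* Same model fit two ways on a shipped example panel
webuse nlswork, clear
xtset idcode year

xtreg ln_wage age tenure, fe vce(cluster idcode)
scalar b_xt  = _b[age]
scalar se_xt = _se[age]

areg ln_wage age tenure, absorb(idcode) vce(cluster idcode)
scalar b_ar  = _b[age]
scalar se_ar = _se[age]

* Point estimates should agree; clustered standard errors need not
display "coefficient on age: xtreg " b_xt "   areg " b_ar
display "std. error on age:  xtreg " se_xt "   areg " se_ar
```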
Philip Gigliotti

Join Date: Nov 2016

Posts: 118
#10

24 Dec 2016, 17:16

My coauthor, who is a faculty member at another school and who is an expert in public policy econometrics, and I have settled on a fixed effects model. Using multilevel modelling is not widely accepted in our literature, and is not ideal for addressing endogeneity and making causal inferences, which is of primary concern in public policy and econometrics. In econ you either use fixed effects or instrumental variables, and we don't have an instrument.

I appreciate the feedback on model choice, but my interest in this model is not in the merits of the method chosen. I would like to know why two models that should be the same, an institution FE model with president dummies and a president FE model with institution dummies, are producing different estimates for some, though not all, of the coefficients. I need to know which specification is correct, or at the very least, which people think should be presented as my result.
William Lisowski

Join Date: Dec 2014

Posts: 10150
#11

24 Dec 2016, 18:44

The two results that you feel should be the same differ perhaps because you have chosen to fit your fixed effects model using areg, and Stata's documentation tells us clearly that your data do not meet the assumptions underlying the methodology, and recommends xtreg, fe for data that does not meet the areg assumptions.

To answer the question posed in your final sentence at #10, you have not demonstrated that the results differ when you estimate the model with an appropriate methodology. Nowhere have you explained why, between posts #4 and #6, you abandoned xtreg for areg.

Or maybe there's another answer. I copied and pasted your three commands from #6 into a CODE block (as described in the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post, in sections 9-12 on how to best pose your question.) My intent was to nudge you toward following those guidelines by showing how much easier it is to review code presented this way. But - now that it's readable - it leaps out that what you label as your third command in #6 is substantively different from your second command.

Code:

areg lnprescomp cenlagrank privateXrank private age yearsprior termlength lnendxstud lnresearch satavg lnFTsal lnFTE lngiftsxstud lnendxstud sex lac priorpres clergy final system year0*, absorb(unitid) robust cluster(unitid)

areg lnprescomp cenlagrank privateXrank private age yearsprior termlength lnendxstud lnresearch satavg lnFTsal lnFTE lngiftsxstud lnendxstud sex lac priorpres clergy final system year0* i.numpres_name, absorb(unitid) robust cluster(unitid)

areg lnprescomp lagrank private age yearsprior termlength lnendxstud lnresearch satavg lnFTsal lnFTE lngiftsxstud lnendxstud sex lac priorpres clergy final system year0* i.unitid, absorb(numpres_name) robust cluster(unitid).

And, having now looked closely at your variable names, I note a surprising number of embedded "x" characters. Please don't tell us you are manually creating interactions rather than relying on Stata's factor variable notation. You will find factor variable notation a powerful tool in your work. Do read help fvvarlist and the manual chapter linked therein. Your effort will be amply repaid. (See also help tsvarlist to learn how to similarly avoid creating lagged variables.)
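A short sketch of both notations on datasets that ship with Stata (auto, and grunfeld via -webuse-); the variable weightXforeign is a hypothetical hand-built interaction created only for comparison:

```stata
* Interaction built by hand versus factor-variable notation
sysuse auto, clear
gen double weightXforeign = weight * foreign
regress price weight foreign weightXforeign
scalar b_manual = _b[weightXforeign]

regress price c.weight##i.foreign
scalar b_fv = _b[1.foreign#c.weight]
display "manual: " b_manual "   factor-variable: " b_fv

* Factor variables let -margins- recover the slope of weight in each group
margins foreign, dydx(weight)

* Time-series operators: a lag without generating a new variable
webuse grunfeld, clear
xtset company year
xtreg invest L.mvalue, fe
```

The two interaction coefficients are the same number; the gain from the second form is that postestimation commands know the terms are related.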

Last edited by William Lisowski; 24 Dec 2016, 18:55.
Philip Gigliotti

Join Date: Nov 2016

Posts: 118
#12

24 Dec 2016, 18:54

Originally posted by William Lisowski

The two results that you feel should be the same differ perhaps because you have chosen to fit your fixed effects model using areg, and Stata's documentation tells us clearly that your data do not meet the assumptions underlying the methodology, and recommends xtreg, fe for data that does not meet the areg assumptions.

To answer the question posed in your final sentence at #10, you have not demonstrated that the results differ when you estimate the model with an appropriate methodology. Nowhere have you explained why, between posts #4 and #6, you abandoned xtreg for areg.

Or maybe there's another answer. I copied and pasted your three commands from #6 into a CODE block (as described in the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post, in sections 9-12 on how to best pose your question.) My intent was to nudge you toward following those guidelines by showing how much easier it is to review code presented this way. But - now that it's readable - it leaps out that what you label as your third command in #6 is substantively different from your second command.

Code:

areg lnprescomp cenlagrank privateXrank private age yearsprior termlength lnendxstud lnresearch satavg lnFTsal lnFTE lngiftsxstud lnendxstud sex lac priorpres clergy final system year0*, absorb(unitid) robust cluster(unitid)

areg lnprescomp cenlagrank privateXrank private age yearsprior termlength lnendxstud lnresearch satavg lnFTsal lnFTE lngiftsxstud lnendxstud sex lac priorpres clergy final system year0* i.numpres_name, absorb(unitid) robust cluster(unitid)

areg lnprescomp lagrank private age yearsprior termlength lnendxstud lnresearch satavg lnFTsal lnFTE lngiftsxstud lnendxstud sex lac priorpres clergy final system year0* i.unitid, absorb(numpres_name) robust cluster(unitid).

And, having now looked closely at your variable names, I note a surprising number of embedded "x" characters. Please don't tell us you are manually creating interactions rather than relying on Stata's factor variable notation. You will find factor variable notation a powerful tool in your work. Do read help fvvarlist and the manual chapter linked therein. Your effort will be amply repaid.

I switched to areg because my coauthor prefers it, but considers them interchangeable.

The command I posted for the third model was mistyped. The only change from the second model is that the absorb() variable has changed from unitid to numpres_name, and I am now using unitid dummies instead of numpres_name dummies.

As you can see the coefficients are the same in most of the variables. I don't understand why they are not the same for all of the variables.
William Lisowski

Join Date: Dec 2014

Posts: 10150
#13

24 Dec 2016, 19:07

Perhaps it's time to confirm they're interchangeable by fitting models 2 and 3 with xtreg, fe.
Joseph Coveney

Join Date: Apr 2014

Posts: 4420
#14

24 Dec 2016, 19:26

Originally posted by Philip Gigliotti

The problem comes in when I run the second and third models. The second model is a fixed effects model with the absorb variable set to unitid, and dummies for numpres_name. The third is a fixed effects model with the absorb variable set to numpres_name and dummies for unitid. Though these models should be equivalent, and are for most variables, some of the variables diverge and I don't know why. In the third model, time-invariant characteristics for both the unitid variable and the numpres_name variable drop out, but in the second, only the unitid time-invariant variables drop out, and the numpres_name variables have strange coefficients (e.g., sex). The age and yearsprior variables also have clashing signs, coefficient sizes, or standard errors.

Isn't this because once a couple of colleges get new presidents with different sex or clerical status during the interval, then sex, clerical status become no longer time-invariant for colleges and remain in that fixed-effects model? The converse is not true, and so you will have those coefficients in the second model and not in the third. The presence of those additional explanatory variables in the second model / absence in the third naturally affect at least some of the other regression coefficients. Are you saying that there's something going on here beyond that?
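Whether a variable is time-invariant for a given panel definition can be read off the within standard deviation reported by -xtsum-. A minimal sketch on made-up data, in which college 100 changes president (and president sex) partway through and college 200 does not:

```stata
* Toy data: all values are hypothetical
clear
input long unitid int year long numpres_name byte sex
100 2003 1 0
100 2004 1 0
100 2005 2 1
100 2006 2 1
200 2003 3 0
200 2004 3 0
200 2005 3 0
200 2006 3 0
end

* With colleges as panels, sex varies within panel
xtset unitid year
xtsum sex
scalar sdw_unit = r(sd_w)

* With presidents as panels, it does not
xtset numpres_name year
xtsum sex
scalar sdw_pres = r(sd_w)

display "within SD by college: " sdw_unit "   by president: " sdw_pres
```

A within standard deviation of zero means the fixed-effects transformation wipes the variable out; anything above zero means it stays in the model.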

Philip Gigliotti

Join Date: Nov 2016
Posts: 118

#15

24 Dec 2016, 19:32

The commands I used are:

Code:

areg lnprescomp cenlagrank privateXrank private  age yearsprior termlength lnresearch satavg lnFTsal lnFTE lngiftsxstud lnendxstud sex lac priorpres clergy final system year0* , absorb(unitid) robust cluster(unitid)

areg lnprescomp cenlagrank privateXrank private  age yearsprior termlength lnresearch satavg lnFTsal lnFTE lngiftsxstud lnendxstud sex lac priorpres clergy final system year0* i.numpres_name, absorb(unitid) robust cluster(unitid)

areg lnprescomp cenlagrank privateXrank private  age yearsprior termlength lnresearch satavg lnFTsal lnFTE lngiftsxstud lnendxstud sex lac priorpres clergy final system year0* i.unitid, absorb(numpres_name) robust cluster(unitid)

They produce the following results.


	(1)	(2)	(3)
VARIABLES	lnprescomp	lnprescomp	lnprescomp

cenlagrank	-0.0124***	-0.0104**	-0.0104**
	(0.00467)	(0.00449)	(0.00449)
privateXrank	0.0138**	0.0134*	0.0134*
	(0.00686)	(0.00727)	(0.00727)
o.private	-	-

age	-0.000346	-0.0666***	0.0675***
	(0.00504)	(0.00913)	(0.0230)
yearsprior	-0.00434	0.392***	0.0459***
	(0.0124)	(0.105)	(0.0172)
termlength	0.0145**	0.0200	0.0200
	(0.00560)	(0.0184)	(0.0184)
lnresearch	0.0258	0.0183	0.0183
	(0.0254)	(0.0263)	(0.0263)
satavg	0.000418	-0.000167	-0.000167
	(0.000768)	(0.000738)	(0.000738)
lnFTsal	0.595**	0.506	0.506
	(0.236)	(0.310)	(0.310)
lnFTE	-0.0729	-0.605	-0.605
	(0.316)	(0.422)	(0.422)
lngiftsxstud	-0.00606	-0.0197	-0.0197
	(0.0305)	(0.0407)	(0.0407)
lnendxstud	0.00913	-0.0596	-0.0596
	(0.0769)	(0.0930)	(0.0930)
sex	0.0964	3.795***
	(0.0723)	(1.055)
o.lac	-	-	-

priorpres	0.200*	0.0607	0.0607
	(0.102)	(0.0399)	(0.0399)
clergy	-0.0650	0.623***
	(0.356)	(0.181)
final	0.0534	0.0832	0.0832
	(0.0495)	(0.0586)	(0.0586)
o.system	-	-	-

Constant	6.129*	16.35***	8.834***
	(3.466)	(4.285)	(3.285)

Observations	863	863	863
R-squared	0.815	0.881	0.881
Robust standard errors in parentheses
*** p<0.01, ** p<0.05, * p<0.1

Using xtreg produces the following results:

Code:

xtset unitid year
xtreg lnprescomp cenlagrank privateXrank private  age yearsprior termlength lnresearch satavg lnFTsal lnFTE lngiftsxstud lnendxstud sex lac priorpres clergy final system year0*,fe robust cluster(unitid)

xtreg lnprescomp cenlagrank privateXrank private  age yearsprior termlength lnresearch satavg lnFTsal lnFTE lngiftsxstud lnendxstud sex lac priorpres clergy final system year0* i.numpres_name,fe robust cluster(unitid)

xtset numpres_name year
xtreg lnprescomp cenlagrank privateXrank private  age yearsprior termlength lnresearch satavg lnFTsal lnFTE lngiftsxstud lnendxstud sex lac priorpres clergy final system year0* i.unitid,fe robust cluster(numpres_name)

	(1)	(2)	(3)
VARIABLES	lnprescomp	lnprescomp	lnprescomp

cenlagrank	-0.0124***	-0.0104***	-0.0104***
	(0.00419)	(0.00398)	(0.00373)
privateXrank	0.0138**	0.0134**	0.0134**
	(0.00616)	(0.00644)	(0.00610)
o.private	-	-

age	-0.000346	-0.0666***	0.0835***
	(0.00453)	(0.00810)	(0.0318)
yearsprior	-0.00434	0.392***	0.150**
	(0.0111)	(0.0934)	(0.0676)
termlength	0.0145***	0.0200	0.0200
	(0.00504)	(0.0163)	(0.0156)
lnresearch	0.0258	0.0183	0.0183
	(0.0228)	(0.0233)	(0.0220)
satavg	0.000418	-0.000167	-0.000167
	(0.000690)	(0.000654)	(0.000617)
lnFTsal	0.595***	0.506*	0.506*
	(0.212)	(0.275)	(0.260)
lnFTE	-0.0729	-0.605	-0.605*
	(0.284)	(0.374)	(0.354)
lngiftsxstud	-0.00606	-0.0197	-0.0197
	(0.0274)	(0.0361)	(0.0340)
lnendxstud	0.00913	-0.0596	-0.0596
	(0.0691)	(0.0824)	(0.0791)
sex	0.0964	3.795***
	(0.0650)	(0.935)
o.lac	-	-	-

priorpres	0.200**	0.0607*	0.0607*
	(0.0921)	(0.0353)	(0.0332)
clergy	-0.0650	0.623***
	(0.320)	(0.160)
final	0.0534	0.0832	0.0832*
	(0.0445)	(0.0519)	(0.0488)
o.system	-	-

Constant	6.129*	16.35***	7.908**
	(3.114)	(3.799)	(3.856)

Observations	863	863	863
R-squared	0.414	0.624	0.509
Number of unitid	163	163
Number of numpres_name			245
Robust standard errors in parentheses
*** p<0.01, ** p<0.05, * p<0.1

The results are similar, though the standard errors are smaller in the xtreg model. I think this is why my coauthor believes areg is more rigorous. As you can see, the same pattern persists in both. The coefficients are the same in the second and third models for most variables except age, yearsprior, sex, and clergy, with the model in which the panel variable is set to numpres_name appearing more accurate.
