
  • Problem with fixed effects (time invariant dummies not dropping out)

    I'm running a fixed effects model and for some reason my time invariant variables aren't dropping out. These variables are coded 0 or 1. I only have about 1000 observations so I've been sorting them by unit and the dummy and looking for miscoded variables but I can't find any.

    The problem might be that I am using two kinds of fixed effects: one for university and one for college president. Several of the college president dummies are being dropped for multicollinearity. Could this be the problem? And why is this happening?

  • #2
    Philip:
    it's difficult to guess what's going on without seeing an example/excerpt of your dataset, that you can easily post via -search dataex-. Thanks.
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      It would be easier to tell if we could see the command line as you have typed it in Stata. That said, it seems that you have used the regress command instead of xtreg. Some variables are dropped due to collinearity and Stata does not really care which variables to drop. Your model is not identified and the estimates of your time-invariant variables are confounded with the effects of the omitted dummies. You will probably observe that your dummies are no longer omitted if you manually drop the other time-invariant variables.
      https://www.kripfganz.de/stata/



      • #4
        Ok, here is a data sample.

        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input double unitid str80 pres_name long numpres_name byte(sex clergy)
        149781 "A. Duane Litfin"       1 0 1
        149781 "A. Duane Litfin"       1 0 1
        149781 "A. Duane Litfin"       1 0 1
        149781 "A. Duane Litfin"       1 0 1
        149781 "A. Duane Litfin"       1 0 1
        149781 "A. Duane Litfin"       1 0 1
        149781 "A. Duane Litfin"       1 0 1
        149781 "A. Duane Litfin"       1 0 1
        182670 "Adam M. Keller"        3 . .
        190150 "Alan Brinkley"         4 0 0
        110662 "Albert Carnesale"      5 0 0
        110662 "Albert Carnesale"      5 0 0
        216287 "Alfred H. Bloom"       6 0 0
        216287 "Alfred H. Bloom"       6 0 0
        216287 "Alfred H. Bloom"       6 0 0
        216287 "Alfred H. Bloom"       6 0 0
        216287 "Alfred H. Bloom"       6 0 0
        216287 "Alfred H. Bloom"       6 0 0
        216287 "Alfred H. Bloom"       6 0 0
        216287 "Alfred H. Bloom"       6 0 0
        213543 "Alice P. Gast"         8 1 0
        213543 "Alice P. Gast"         8 1 0
        213543 "Alice P. Gast"         8 1 0
        215062 "Amy Gutmann"           9 1 0
        215062 "Amy Gutmann"           9 1 0
        215062 "Amy Gutmann"           9 1 0
        215062 "Amy Gutmann"           9 1 0
        218663 "Andrew A. Sorensen"   10 0 0
        218663 "Andrew A. Sorensen"   10 0 0
        218663 "Andrew A. Sorensen"   10 0 0
        218663 "Andrew A. Sorensen"   10 0 0
        152673 "Andrew T. Ford"       12 0 0
        152673 "Andrew T. Ford"       12 0 0
        152673 "Andrew T. Ford"       12 0 0
        152673 "Andrew T. Ford"       12 0 0
        152673 "Andrew T. Ford"       12 0 0
        166027 "Ann E. Berman"        14 . .
        190044 "Anthony G. Collins"   15 0 0
        190044 "Anthony G. Collins"   15 0 0
        190044 "Anthony G. Collins"   15 0 0
        190044 "Anthony G. Collins"   15 0 0
        190044 "Anthony G. Collins"   15 0 0
        190044 "Anthony G. Collins"   15 0 0
        164465 "Anthony W. Marx"      16 0 0
        164465 "Anthony W. Marx"      16 0 0
        164465 "Anthony W. Marx"      16 0 0
        164465 "Anthony W. Marx"      16 0 0
        164465 "Anthony W. Marx"      16 0 0
        164465 "Anthony W. Marx"      16 0 0
        213385 "Arthur J. Rothkopf"   18 0 0
        213385 "Arthur J. Rothkopf"   18 0 0
        213385 "Arthur J. Rothkopf"   18 0 0
        213385 "Arthur J. Rothkopf"   18 0 0
        201645 "Barbara R. Snyder"    20 1 0
        201645 "Barbara R. Snyder"    20 1 0
        161004 "Barry Mills"          21 0 0
        161004 "Barry Mills"          21 0 0
        161004 "Barry Mills"          21 0 0
        161004 "Barry Mills"          21 0 0
        161004 "Barry Mills"          21 0 0
        161004 "Barry Mills"          21 0 0
        161004 "Barry Mills"          21 0 0
        161004 "Barry Mills"          21 0 0
        131159 "Benjamin Ladner"      22 0 0
        131159 "Benjamin Ladner"      22 0 0
        131159 "Benjamin Ladner"      22 0 0
        131159 "Benjamin Ladner"      22 0 0
        141060 "Beverly Daniel Tatum" 24 1 0
        141060 "Beverly Daniel Tatum" 24 1 0
        141060 "Beverly Daniel Tatum" 24 1 0
        141060 "Beverly Daniel Tatum" 24 1 0
        141060 "Beverly Daniel Tatum" 24 1 0
        141060 "Beverly Daniel Tatum" 24 1 0
        141060 "Beverly Daniel Tatum" 24 1 0
        178396 "Brady J. Deaton"      25 0 0
        178396 "Brady J. Deaton"      25 0 0
        178396 "Brady J. Deaton"      25 0 0
        178396 "Brady J. Deaton"      25 0 0
        178396 "Brady J. Deaton"      25 0 0
        216667 "Brian C. Mitchell"    26 0 0
        216667 "Brian C. Mitchell"    26 0 0
        216667 "Brian C. Mitchell"    26 0 0
        211291 "Brian C. Mitchell"    26 0 0
        211291 "Brian C. Mitchell"    26 0 0
        211291 "Brian C. Mitchell"    26 0 0
        211291 "Brian C. Mitchell"    26 0 0
        211291 "Brian C. Mitchell"    26 0 0
        173902 "Brian C. Rosenberg"   28 0 0
        173902 "Brian C. Rosenberg"   28 0 0
        173902 "Brian C. Rosenberg"   28 0 0
        173902 "Brian C. Rosenberg"   28 0 0
        173902 "Brian C. Rosenberg"   28 0 0
        173902 "Brian C. Rosenberg"   28 0 0
        163286 "C. Dan Mote Jr."      29 0 0
        163286 "C. Dan Mote Jr."      29 0 0
        163286 "C. Dan Mote Jr."      29 0 0
        163286 "C. Dan Mote Jr."      29 0 0
        167835 "Carol T. Christ"      31 1 0
        167835 "Carol T. Christ"      31 1 0
        167835 "Carol T. Christ"      31 1 0
        end
        label values numpres_name numpres_name
        label def numpres_name 1 "A. Duane Litfin", modify
        label def numpres_name 3 "Adam M. Keller", modify
        label def numpres_name 4 "Alan Brinkley", modify
        label def numpres_name 5 "Albert Carnesale", modify
        label def numpres_name 6 "Alfred H. Bloom", modify
        label def numpres_name 8 "Alice P. Gast", modify
        label def numpres_name 9 "Amy Gutmann", modify
        label def numpres_name 10 "Andrew A. Sorensen", modify
        label def numpres_name 12 "Andrew T. Ford", modify
        label def numpres_name 14 "Ann E. Berman", modify
        label def numpres_name 15 "Anthony G. Collins", modify
        label def numpres_name 16 "Anthony W. Marx", modify
        label def numpres_name 18 "Arthur J. Rothkopf", modify
        label def numpres_name 20 "Barbara R. Snyder", modify
        label def numpres_name 21 "Barry Mills", modify
        label def numpres_name 22 "Benjamin Ladner", modify
        label def numpres_name 24 "Beverly Daniel Tatum", modify
        label def numpres_name 25 "Brady J. Deaton", modify
        label def numpres_name 26 "Brian C. Mitchell", modify
        label def numpres_name 28 "Brian C. Rosenberg", modify
        label def numpres_name 29 "C. Dan Mote Jr.", modify
        label def numpres_name 31 "Carol T. Christ", modify


        Here is the code for one version of my regression:

        xtset unitid year, yearly
        sort unitid year
        xtreg lnprescomp L.centered_rank L.cenpublicrank age sex termlength priorpres yearsprior clergy lnFTsal lnFTE lngifts lnendxstud satavg lnresearch resuni i.year i.numpres_name, fe robust cluster (unitid)


        Fixed-effects (within) regression               Number of obs      =       863
        Group variable: unitid                          Number of groups   =       163

        R-sq:                                           Obs per group:
             within  = 0.6222                                         min =         2
             between = 0.0201                                         avg =       5.3
             overall = 0.0044                                         max =         7

                                                        F(17,162)          =         .
        corr(u_i, Xb)  = -0.9844                        Prob > F           =         .

                                     (Std. Err. adjusted for 163 clusters in unitid)

                                    Robust
        lnprescomp        Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]

        centered_rank
        L1. .0027526 .0053348 0.52 0.607 -.0077821 .0132873

        cenpublicrank
        L1. -.0133977 .0064985 -2.06 0.041 -.0262304 -.0005649

        age .1340173 .0773559 1.73 0.085 -.0187386 .2867732
        sex -.7377239 .4265682 -1.73 0.086 -1.580075 .104627
        termlength .2822993 .1350666 2.09 0.038 .0155812 .5490173
        priorpres .0634937 .0350573 1.81 0.072 -.0057345 .132722
        yearsprior -.1406934 .0775754 -1.81 0.072 -.2938829 .012496
        clergy -2.576749 1.569818 -1.64 0.103 -5.676694 .5231947
        lnFTsal .4982687 .2711326 1.84 0.068 -.0371411 1.033679
        lnFTE -.7239067 .3974517 -1.82 0.070 -1.508761 .0609474
        lngifts -.0251121 .0359195 -0.70 0.485 -.096043 .0458188
        lnendxstud -.0760739 .0838705 -0.91 0.366 -.2416943 .0895466
        satavg -.0001304 .0006751 -0.19 0.847 -.0014635 .0012027
        lnresearch .0167538 .0243796 0.69 0.493 -.031389 .0648966
        resuni 0 (omitted)

        year
        2004 -.3278364 .2162441 -1.52 0.131 -.7548571 .0991843
        2005 -.6668508 .4308364 -1.55 0.124 -1.51763 .1839286
        2006 -1.005305 .6448846 -1.56 0.121 -2.278769 .2681585
        2007 -1.289379 .8527277 -1.51 0.132 -2.973274 .3945158
        2008 -1.629466 1.066205 -1.53 0.128 -3.734919 .4759865
        2009 -1.948954 1.278149 -1.52 0.129 -4.472935 .5750273

        numpres_name
        Alan Brinkley .0036891 .3190963 0.01 0.991 -.6264355 .6338137
        Albert Carnesale -2.214855 1.198038 -1.85 0.066 -4.58064 .1509296
        Alfred H. Bloom 0 (omitted)
        Alice P. Gast 4.363048 2.198851 1.98 0.049 .020943 8.705153
        Amy Gutmann 1.679132 .4257749 3.94 0.000 .8383481 2.519917
        Andrew A. Sorensen -2.924166 1.5872 -1.84 0.067 -6.058435 .2101028
        Andrew T. Ford -3.672203 2.136217 -1.72 0.088 -7.890625 .5462188
        Anthony G. Collins 0 (omitted)
        Anthony W. Marx 3.533189 1.83793 1.92 0.056 -.0962005 7.162578
        Arthur J. Rothkopf -6.234023 3.398193 -1.83 0.068 -12.94449 .4764428
        Barbara R. Snyder 1.39824 .8002091 1.75 0.082 -.1819457 2.978425
        Barry Mills 0 (omitted)
        Benjamin Ladner 2.18662 .0519382 42.10 0.000 2.084057 2.289183
        Beverly Daniel Tatum 0 (omitted)
        Brady J. Deaton 0 (omitted)
        Brian C. Mitchell -1.635248 .9472556 -1.73 0.086 -3.505809 .2353122
        Brian C. Rosenberg 2.882257 1.568548 1.84 0.068 -.2151794 5.979693
        C. Dan Mote Jr. -2.568614 1.212412 -2.12 0.036 -4.962783 -.1744455
        Carol T. Christ 0 (omitted)
        Carolyn (Biddy) Martin 4.01198 2.065557 1.94 0.054 -.066908 8.090869
        Catharine Bond Hill 6.581502 3.463508 1.90 0.059 -.2579414 13.42095
        Charles E. Phelps 1.697419 .9508153 1.79 0.076 -.1801711 3.575009
        Charles J. Dougherty 0 (omitted)
        Charles M. Vest -1.857652 .8433866 -2.20 0.029 -3.523101 -.192203
        Charles W. Steger 0 (omitted)
        Colin S. Diver 0 (omitted)
        Constantine N. Papadakis 0 (omitted)
        Cornelius M. Kerwin .3617723 .0794129 4.56 0.000 .2049545 .5185901

        As you can see, many of the president dummies (I haven't included all of them) didn't drop out, and neither did the president-level time-invariant variables (sex and clergy).

        Now I try with a different panelvar setting:

        xtset numpres_name year, yearly
        sort numpres_name year
        xtreg lnprescomp L.centered_rank L.cenpublicrank age sex termlength priorpres yearsprior clergy lnFTsal lnFTE lngifts lnendxstud satavg lnresearch resuni i.year i.unitid, fe robust cluster(numpres_name)


        Fixed-effects (within) regression               Number of obs      =       730
        Group variable: numpres_name                    Number of groups   =       211

        R-sq:                                           Obs per group:
             within  = 0.5344                                         min =         1
             between = 0.0510                                         avg =       3.5
             overall = 0.0263                                         max =         7

                                                        F(15,210)          =         .
        corr(u_i, Xb)  = -0.9658                        Prob > F           =         .

        (Std. Err. adjusted for 211 clusters in numpres_name)
        -------------------------------------------------------------------------------
        | Robust
        lnprescomp | Coef. Std. Err. t P>|t| [95% Conf. Interval]
        --------------+----------------------------------------------------------------
        centered_rank |
        L1. | .0016453 .0028665 0.57 0.567 -.0040055 .007296
        |
        cenpublicrank |
        L1. | -.0088242 .0051241 -1.72 0.087 -.0189255 .001277
        |
        age | .3248831 .194344 1.67 0.096 -.058232 .7079983
        sex | 0 (omitted)
        termlength | -.2341882 .1916804 -1.22 0.223 -.6120525 .1436761
        priorpres | .0583441 .0351541 1.66 0.098 -.0109561 .1276443
        yearsprior | -.0835124 .1507053 -0.55 0.580 -.3806015 .2135767
        clergy | 0 (omitted)
        lnFTsal | .3276799 .1651331 1.98 0.049 .0021489 .6532108
        lnFTE | -.5449875 .388204 -1.40 0.162 -1.310264 .2202886
        lngifts | .0004754 .0232352 0.02 0.984 -.0453287 .0462795
        lnendxstud | -.0607394 .0844755 -0.72 0.473 -.2272681 .1057892
        satavg | -.0002208 .000712 -0.31 0.757 -.0016243 .0011827
        lnresearch | .0168454 .0244264 0.69 0.491 -.0313068 .0649977
        resuni | 0 (omitted)
        |
        year |
        2004 | -.0075971 .0227469 -0.33 0.739 -.0524386 .0372445
        2005 | -.0174238 .0325207 -0.54 0.593 -.0815327 .0466851
        2006 | -.0277991 .0296492 -0.94 0.350 -.0862472 .0306491
        2007 | -.0061729 .0309894 -0.20 0.842 -.0672631 .0549173
        2008 | -.0089648 .0293319 -0.31 0.760 -.0667875 .048858
        2009 | 0 (omitted)
        |
        unitid |
        100830 | 0 (omitted)
        104179 | 0 (omitted)
        106397 | 0 (omitted)
        110404 | 0 (omitted)
        110635 | 0 (omitted)
        110644 | 0 (omitted)
        110653 | 0 (omitted)
        110662 | 0 (omitted)
        110671 | -.0678122 .2204147 -0.31 0.759 -.5023211 .3666967
        110680 | 0 (omitted)
        110705 | 0 (omitted)
        110714 | 0 (omitted)
        112260 | 0 (omitted)
        115409 | 0 (omitted)
        118888 | 0 (omitted)
        120254 | 0 (omitted)
        120883 | 0 (omitted)
        121345 | 0 (omitted)
        123961 | 0 (omitted)
        126678 | 0 (omitted)
        129020 | 0 (omitted)
        130590 | 0 (omitted)
        130697 | 0 (omitted)
        130794 | 0 (omitted)
        130943 | 0 (omitted)
        131159 | 0 (omitted)
        131283 | 0 (omitted)
        131496 | 0 (omitted)
        131520 | 0 (omitted)

        Now the time-invariant characteristics for both panel clusters drop out, but a year dummy is also omitted for collinearity, and it seems I've lost observations for that year (and a lot of statistical power).



        • #5
          Well, the data example you posted isn't complete enough for anyone to be able to try to reproduce your problem and then troubleshoot it. It excludes most of the variables, and to the extent that it contains more than one observation per unitid, those observations are all exact duplicates. So it isn't very helpful for this purpose.

          That said, there are some hints that can be gleaned from the output and from general principles.

          Stata has a routine for identifying collinear variables and removing them from the analysis, and I have never known that routine to get it wrong. So if Stata isn't dropping president name indicators when you -xtset unitid year-, it means that at least some of the unitids (which I'm guessing are colleges?) have different presidents in different years. That certainly seems to make sense, and I doubt there is much, if anything, more to it than that. Put more briefly: president is not a time-invariant attribute of a college. So it does not get omitted, and neither do fixed attributes associated with it like sex or clergy.

          In the second analysis you use -xtset numpres_name year-, and this time you see that sex and clergy are time-invariant attributes of the president, so they do get omitted. But now you are troubled by the results for the unitid and year indicators. Most of the unitid indicators are omitted. But one is not: 110671. The conclusion I draw from that is that somewhere in your data, there is one president who was president at two different colleges during the timespan covered by your data (again, not surprising in the real world) and one of those colleges was 110671. To chase that down I would -tab pres_name if unitid == 110671- and then run some more tabulate commands to see what colleges each of those presidents served at. Presumably for most of them, there will be only one such college, but one of them will show up twice or more. Then you have to figure out if that is what really happened in the world or if this is an error in your data.
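
          A minimal sketch of that check, using a few rows copied from the -dataex- excerpt in #4 (with the real dataset in memory you would skip the -input- step). The helper names pair_tag and n_units are made up for illustration. The idea is to flag each distinct president-college pair once and then count the flags per president:

          ```stata
          clear
          input double unitid str20 pres_name long numpres_name
          216667 "Brian C. Mitchell" 26
          216667 "Brian C. Mitchell" 26
          211291 "Brian C. Mitchell" 26
          211291 "Brian C. Mitchell" 26
          161004 "Barry Mills"       21
          161004 "Barry Mills"       21
          end

          * Flag the first observation of every distinct president-college pair
          egen byte pair_tag = tag(numpres_name unitid)

          * Number of distinct colleges at which each president appears
          egen n_units = total(pair_tag), by(numpres_name)

          * Presidents observed at more than one college, one row per pair
          list pres_name unitid n_units if pair_tag & n_units > 1, noobs sepby(numpres_name)
          ```

          In the excerpt from #4, Brian C. Mitchell appears under both unitid 216667 and unitid 211291, so he is the kind of case this listing would surface. Whether that reflects a real move between colleges or a coding error still has to be checked by hand.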

          The disappearance of one of your year indicators does not have an obvious explanation, but if I understood what all of the regression variables mean, we might be able to pin that one down too. Here's the general principle that is probably at work. Suppose you have panel data and you run a panel regression including year indicators. Suppose the model also includes another variable which, in effect, divides time into two different eras. For example, you might have a variable that distinguishes years after 2008 from years up to 2008, or a variable that distinguishes election years from non-election years or something like that. When you include such a variable in the model, you introduce collinearity between the year indicators and that distinguishing variable, so Stata will either omit the distinguishing variable or will omit one of the year indicators. In your case it did the latter. Now, I cannot discern which variable in your model is the source of this, but if you think about this you can probably figure out which one it is. If you can't figure that out (perhaps because it's not supposed to happen but has arisen due to data errors), you can find the source of the collinearity by creating your own indicator variable for year 2009 and using that as the dependent variable in a regression on the other model variables.
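
          A toy sketch of that diagnostic regression. Everything in it is simulated: id stands in for the president identifier, and age is only an assumed example of a variable that rises by exactly one each year, not a claim about which variable is responsible in the real data. The 2009 indicator is regressed on the remaining year dummy (2007 is the base year), the candidate variable, and the panel dummies:

          ```stata
          clear
          set obs 6
          generate id = ceil(_n/3)                          // two hypothetical presidents
          bysort id: generate year = 2006 + _n              // years 2007-2009 for each
          generate age = year - cond(id == 1, 1950, 1945)   // rises by one per year
          generate byte y2009 = (year == 2009)              // indicator to diagnose

          * Regress the 2009 indicator on the other regressors: the 2008 dummy,
          * age, and one dummy per panel unit
          regress y2009 i2008.year age i.id
          ```

          By construction the fit in this toy setup should be exact, which is the signature of perfect collinearity, and the regressors with nonzero coefficients are the ones involved. In the real model the right-hand side would be the full variable list from the -xtreg- command, with the 2009 indicator left out of the year dummies.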

          Finally, it looks to me as if you are working with data that is a bit more complex than simply panel data. With the possible exception of one president who may have served at more than one unit it looks like your data has a multi-level structure with yearly observations nested within presidents nested within unitids. If the anomaly with some president being found in more than one unitid is due to a data error, then this is clearly the case. If that anomaly represents reality then you have something very close to a nested model: it is a rather sparse multiple-membership model. Either way, it would not be appropriate to treat this as panel data with the unitid as the panel because that fails to account for the non-independence of observations within presidents. It IS okay to treat it as panel data with the president as the panel ID, but then all fixed unitid attributes' effects become unestimable in a fixed effects model. You may want to look into doing this with multi-level modeling and mixed effects using the -mixed- command and representing the full hierarchy of observations within presidents within unitids.
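
          A rough sketch of what such a -mixed- specification could look like, on simulated data only. The names pres, x, y, u_unit, and u_pres are hypothetical, and the data-generating values are arbitrary; the point is just the syntax for yearly observations nested within presidents nested within unitids:

          ```stata
          clear
          set seed 12345
          set obs 30
          generate unitid = _n                     // 30 hypothetical colleges
          generate u_unit = rnormal()              // college-level random intercept
          expand 2
          bysort unitid: generate pres = _n        // two presidents per college
          generate u_pres = rnormal(0, 0.5)        // president-level random intercept
          expand 4
          bysort unitid pres: generate year = 2002 + 4*(pres - 1) + _n
          generate x = rnormal()
          generate y = 1 + 0.5*x + u_unit + u_pres + rnormal()

          * Levels are listed from the outermost (college) to the innermost (president)
          mixed y x i.year || unitid: || pres:
          ```

          This strictly nested form assumes each president belongs to exactly one unitid. If a president really did serve at two colleges, the nesting no longer holds exactly and the specification would need more thought.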





          • #6
            I'm still struggling with a few issues in this model. I'm using the areg command now rather than xtreg.

            The command for the first regression is areg lnprescomp cenlagrank privateXrank private age yearsprior termlength lnendxstud lnresearch satavg lnFTsal lnFTE lngiftsxstud lnendxstud sex lac priorpres clergy final system year0*, absorb(unitid) robust cluster(unitid)

            The command for the second regression is areg lnprescomp cenlagrank privateXrank private age yearsprior termlength lnendxstud lnresearch satavg lnFTsal lnFTE lngiftsxstud lnendxstud sex lac priorpres clergy final system year0* i.numpres_name, absorb(unitid) robust cluster(unitid)

            The command for the third regression is areg lnprescomp lagrank private age yearsprior termlength lnendxstud lnresearch satavg lnFTsal lnFTE lngiftsxstud lnendxstud sex lac priorpres clergy final system year0* i.unitid, absorb(numpres_name) robust cluster(unitid).

            The problem comes in when I run the second and third models. The second model is a fixed effects model with the absorb variable set to unitid, and dummies for numpres_name. The third is a fixed effects model with the absorb variable set to numpres_name and dummies for unitid. Though these models should be equivalent, and are for most variables, some of the variables diverge and I don't know why. In the third model, time-invariant characteristics for both the unitid variable and the numpres_name variable drop out, but in the second, only the unitid time-invariant variables drop out, and the numpres_name variables have strange coefficients (e.g., sex). The age and yearsprior variables also have clashing signs, coefficient sizes, or standard errors. I know the age, yearsprior, and sex variables don't have any errors in them. The coefficient in the second model is also markedly bigger than the first and third.

            Here are the models.
            (1) (2) (3)
            VARIABLES lnprescomp lnprescomp lnprescomp
            cenlagrank -0.0124*** -0.0104** -0.0104**
            (0.00467) (0.00449) (0.00449)
            privateXrank 0.0138** 0.0134* 0.0134*
            (0.00686) (0.00727) (0.00727)
            o.private - -
            age -0.000346 -0.0666*** 0.0675***
            (0.00504) (0.00913) (0.0230)
            yearsprior -0.00434 0.392*** 0.0459***
            (0.0124) (0.105) (0.0172)
            termlength 0.0145** 0.0200 0.0200
            (0.00560) (0.0184) (0.0184)
            lnresearch 0.0258 0.0183 0.0183
            (0.0254) (0.0263) (0.0263)
            satavg 0.000418 -0.000167 -0.000167
            (0.000768) (0.000738) (0.000738)
            lnFTsal 0.595** 0.506 0.506
            (0.236) (0.310) (0.310)
            lnFTE -0.0729 -0.605 -0.605
            (0.316) (0.422) (0.422)
            lngiftsxstud -0.00606 -0.0197 -0.0197
            (0.0305) (0.0407) (0.0407)
            lnendxstud 0.00913 -0.0596 -0.0596
            (0.0769) (0.0930) (0.0930)
            sex 0.0964 3.795***
            (0.0723) (1.055)
            o.lac - - -
            priorpres 0.200* 0.0607 0.0607
            (0.102) (0.0399) (0.0399)
            clergy -0.0650 0.623***
            (0.356) (0.181)
            final 0.0534 0.0832 0.0832
            (0.0495) (0.0586) (0.0586)
            o.system - - -
            Constant 6.129* 16.35*** 8.834***
            (3.466) (4.285) (3.285)
            Observations 863 863 863
            R-squared 0.815 0.881 0.881
            year fe x x x
            institution fe x x x
            president fe x x
            absorb variable unitid unitid president
            Robust standard errors in parentheses
            *** p<0.01, ** p<0.05, * p<0.1
            Any ideas?
            Last edited by Philip Gigliotti; 23 Dec 2016, 21:27.



            • #7
              Philip:
              as an aside to others' superb advice, I would also consider whether the number of your predictors is in line with the number of observations totalled by groups.
              I would probably go for more parsimonious regression models.
              Kind regards,
              Carlo
              (Stata 19.0)



              • #8
                A lot of the dummy variables drop out, since the president dummy is collinear with the university dummy every time a president holds the position for all 7 years of the panel. I have 863 observations, and I estimate that I have about 400-500 degrees of freedom remaining after using both sets of dummies.

                Certainly the model suffers from some over-fitting, but I still get a significant coefficient on my variable of interest and I think controlling for both president and institution characteristics goes a long way to making an argument for causality, at the very least as a robustness check.

                I'm interested in knowing why the models are different, whether something is wrong, and which model to present as my model with president and university fixed effects.
                Last edited by Philip Gigliotti; 24 Dec 2016, 15:36.



                • #9
                  From help areg we read in the description

                  areg is designed for datasets with many groups, but not a number of groups that increases with the sample size. See the xtreg, fe command in [XT] xtreg for an estimator that handles the case in which the number of groups increases with the sample size.
                  and in the Technical Note found in the full documentation in the Stata Base Reference Manual PDF (included in the Stata installation since version 11 and accessible from within Stata, for example through Stata's Help menu) we read

                  Although the point estimates produced by areg and xtreg, fe are the same, the estimated VCEs differ when vce(cluster clustvar) is specified because the commands make different assumptions about whether the number of groups increases with the sample size.
                  Both the number of institutions and the number of individuals who were identified as presidents depend on the size of your sample, so it seems to me that areg may not be an appropriate choice for your modeling, and from Clyde's advice, xtreg may not be, either.
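
                  A small sketch of the point in that Technical Note, using the auto dataset that ships with Stata rather than the data from this thread. Treating rep78 as the grouping variable is purely an assumption for illustration (it has only five groups, far too few for real clustered inference):

                  ```stata
                  sysuse auto, clear
                  drop if missing(rep78)

                  * Same fixed-effects specification fitted with both commands
                  areg price weight, absorb(rep78) vce(cluster rep78)

                  xtset rep78
                  xtreg price weight, fe vce(cluster rep78)
                  ```

                  Per the documentation quoted above, the coefficient on weight should agree across the two commands while the cluster-robust standard errors differ, so comparing the two outputs side by side shows how much of any discrepancy is down to the VCE alone.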



                  • #10
                    My coauthor, who is a faculty member at another school and who is an expert in public policy econometrics, and I have settled on a fixed effects model. Using multilevel modelling is not widely accepted in our literature, and is not ideal for addressing endogeneity and making causal inferences, which is of primary concern in public policy and econometrics. In econ you either use fixed effects or instrumental variables, and we don't have an instrument.

                    I appreciate the feedback on model choice, but my interest in this model is not on the merits of the method chosen. I would like to know why two models that should be the same, an institution FE model with president dummies and a president FE model with institution dummies, are producing different estimates for some, though not all, of the coefficients. I need to know which specification is correct, or at the very least, which people think should be presented as my result.



                    • #11
                      The two results that you feel should be the same differ perhaps because you have chosen to fit your fixed effects model using areg, and Stata's documentation tells us clearly that your data do not meet the assumptions underlying the methodology, and recommends xtreg, fe for data that does not meet the areg assumptions.

                      To answer the question posed in your final sentence at #10, you have not demonstrated that the results differ when you estimate the model with an appropriate methodology. Nowhere have you explained why, between posts #4 and #6, you abandoned xtreg for areg.

                      Or maybe there's another answer. I copied and pasted your three commands from #6 into a CODE block (as described in the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post, in sections 9-12 on how to best pose your question.) My intent was to nudge you toward following those guidelines by showing how much easier it is to review code presented this way. But - now that it's readable - it leaps out that what you label as your third command in #6 is substantively different from your second command.
                      Code:
                      areg lnprescomp cenlagrank privateXrank private age yearsprior termlength lnendxstud lnresearch satavg lnFTsal lnFTE lngiftsxstud lnendxstud sex lac priorpres clergy final system year0*, absorb(unitid) robust cluster(unitid)
                      
                      areg lnprescomp cenlagrank privateXrank private age yearsprior termlength lnendxstud lnresearch satavg lnFTsal lnFTE lngiftsxstud lnendxstud sex lac priorpres clergy final system year0* i.numpres_name, absorb(unitid) robust cluster(unitid)
                      
                      areg lnprescomp lagrank private age yearsprior termlength lnendxstud lnresearch satavg lnFTsal lnFTE lngiftsxstud lnendxstud sex lac priorpres clergy final system year0* i.unitid, absorb(numpres_name) robust cluster(unitid).
                      And, having now looked closely at your variable names, I note a surprising number of embedded "x" characters. Please don't tell us you are manually creating interactions rather than relying on Stata's factor variable notation. You will find factor variable notation a powerful tool in your work. Do read help fvvarlist and the manual chapter linked therein. Your effort will be amply repaid. (See also help tsvarlist to learn how to similarly avoid creating lagged variables.)
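
                      A short sketch of the difference, using the auto dataset that ships with Stata. The variable foreignXweight is a made-up stand-in for a hand-built interaction such as privateXrank; how the variables in this thread were actually constructed is not known:

                      ```stata
                      sysuse auto, clear

                      * Hand-built interaction, stored as an extra variable
                      generate foreignXweight = foreign * weight
                      regress price weight foreign foreignXweight

                      * Same model in factor-variable notation; no new variable is created
                      regress price c.weight##i.foreign
                      ```

                      The two regressions should give the same fit. The practical gain from the second form is that Stata knows the terms are related, so postestimation commands such as -margins- can treat the interaction correctly.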
                      Last edited by William Lisowski; 24 Dec 2016, 18:55.



                      • #12
                        I switched to areg because my coauthor prefers it, though my coauthor considers the two commands interchangeable.

                        The command line I posted for the third model was wrong. The only intended change from the second model is that the absorb() option changes from unitid to numpres_name, and I now use unitid dummies instead of numpres_name dummies.

                        As you can see, the coefficients are the same for most of the variables. I don't understand why they are not the same for all of them.



                        • #13
                          Perhaps it's time to confirm they're interchangeable by fitting models 2 and 3 with xtreg, fe.



                          • #14
                            Originally posted by Philip Gigliotti
                            The problem comes in when I run the second and third models. The second model is a fixed effects model with the absorb variable set to unitid, and dummies for numpres_name. The third is a fixed effects model with the absorb variable set to numpres_name and dummies for unitid. Though these models should be equivalent, and are for most variables, some of the variables diverge and I don't know why. In the third model, time-invariant characteristics for both the unitid variable and the numpres_name variable drop out, but in the second, only the unitid time-invariant variables drop out, and the numpres_name variables have strange coefficients (e.g., sex). The age and yearsprior variables also have clashing signs, coefficient sizes or standard errors.
                            Isn't this because, once a couple of colleges get new presidents of a different sex or clerical status during the interval, sex and clerical status are no longer time-invariant for those colleges and so remain in that fixed-effects model? The converse is not true, so you will have those coefficients in the second model and not in the third. The presence of those additional explanatory variables in the second model, and their absence in the third, naturally affects at least some of the other regression coefficients. Are you saying that there's something going on here beyond that?
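
A minimal sketch of how this could be checked, assuming the variable names from the posted commands (unitid, numpres_name, sex, clergy) and that sex and clergy have no missing values. The generated variable names (sex_varies_unit and so on) are hypothetical. It flags whether each variable changes within a university and within a president.

```stata
* Does sex or clergy change within a university? (1 = yes)
* Assumes no missing values: missings sort last and would be flagged as variation.
bysort unitid (sex): generate byte sex_varies_unit = sex[1] != sex[_N]
bysort unitid (clergy): generate byte clergy_varies_unit = clergy[1] != clergy[_N]

* The same check within a president; expected to be 0 if the coding is consistent.
bysort numpres_name (sex): generate byte sex_varies_pres = sex[1] != sex[_N]
bysort numpres_name (clergy): generate byte clergy_varies_pres = clergy[1] != clergy[_N]

* Count universities and presidents (one tagged row each) rather than observations.
egen byte one_unit = tag(unitid)
egen byte one_pres = tag(numpres_name)
tabulate sex_varies_unit clergy_varies_unit if one_unit
tabulate sex_varies_pres clergy_varies_pres if one_pres
```

If the within-university flags equal 1 for at least a few universities while the within-president flags are all 0, that would be consistent with the explanation above.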



                            • #15
                              The commands I used are:

                              Code:
                              areg lnprescomp cenlagrank privateXrank private  age yearsprior termlength lnresearch satavg lnFTsal lnFTE lngiftsxstud lnendxstud sex lac priorpres clergy final system year0* , absorb(unitid) robust cluster(unitid)
                              
                              areg lnprescomp cenlagrank privateXrank private  age yearsprior termlength lnresearch satavg lnFTsal lnFTE lngiftsxstud lnendxstud sex lac priorpres clergy final system year0* i.numpres_name, absorb(unitid) robust cluster(unitid)
                              
                              areg lnprescomp cenlagrank privateXrank private  age yearsprior termlength lnresearch satavg lnFTsal lnFTE lngiftsxstud lnendxstud sex lac priorpres clergy final system year0* i.unitid, absorb(numpres_name) robust cluster(unitid)
                              They produce the following results.
                                                      (1)          (2)          (3)
                              VARIABLES               lnprescomp   lnprescomp   lnprescomp
                              cenlagrank              -0.0124***   -0.0104**    -0.0104**
                                                      (0.00467)    (0.00449)    (0.00449)
                              privateXrank            0.0138**     0.0134*      0.0134*
                                                      (0.00686)    (0.00727)    (0.00727)
                              o.private               -            -
                              age                     -0.000346    -0.0666***   0.0675***
                                                      (0.00504)    (0.00913)    (0.0230)
                              yearsprior              -0.00434     0.392***     0.0459***
                                                      (0.0124)     (0.105)      (0.0172)
                              termlength              0.0145**     0.0200       0.0200
                                                      (0.00560)    (0.0184)     (0.0184)
                              lnresearch              0.0258       0.0183       0.0183
                                                      (0.0254)     (0.0263)     (0.0263)
                              satavg                  0.000418     -0.000167    -0.000167
                                                      (0.000768)   (0.000738)   (0.000738)
                              lnFTsal                 0.595**      0.506        0.506
                                                      (0.236)      (0.310)      (0.310)
                              lnFTE                   -0.0729      -0.605       -0.605
                                                      (0.316)      (0.422)      (0.422)
                              lngiftsxstud            -0.00606     -0.0197      -0.0197
                                                      (0.0305)     (0.0407)     (0.0407)
                              lnendxstud              0.00913      -0.0596      -0.0596
                                                      (0.0769)     (0.0930)     (0.0930)
                              sex                     0.0964       3.795***
                                                      (0.0723)     (1.055)
                              o.lac                   -            -            -
                              priorpres               0.200*       0.0607       0.0607
                                                      (0.102)      (0.0399)     (0.0399)
                              clergy                  -0.0650      0.623***
                                                      (0.356)      (0.181)
                              final                   0.0534       0.0832       0.0832
                                                      (0.0495)     (0.0586)     (0.0586)
                              o.system                -            -            -
                              Constant                6.129*       16.35***     8.834***
                                                      (3.466)      (4.285)      (3.285)
                              Observations            863          863          863
                              R-squared               0.815        0.881        0.881
                              Robust standard errors in parentheses
                              *** p<0.01, ** p<0.05, * p<0.1
                              Using xtreg produces the following results:

                              Code:
                              xtset unitid year
                              xtreg lnprescomp cenlagrank privateXrank private  age yearsprior termlength lnresearch satavg lnFTsal lnFTE lngiftsxstud lnendxstud sex lac priorpres clergy final system year0*,fe robust cluster(unitid)
                              
                              xtreg lnprescomp cenlagrank privateXrank private  age yearsprior termlength lnresearch satavg lnFTsal lnFTE lngiftsxstud lnendxstud sex lac priorpres clergy final system year0* i.numpres_name,fe robust cluster(unitid)
                              
                              xtset numpres_name year
                              xtreg lnprescomp cenlagrank privateXrank private  age yearsprior termlength lnresearch satavg lnFTsal lnFTE lngiftsxstud lnendxstud sex lac priorpres clergy final system year0* i.unitid,fe robust cluster(numpres_name)
                                                      (1)          (2)          (3)
                              VARIABLES               lnprescomp   lnprescomp   lnprescomp
                              cenlagrank              -0.0124***   -0.0104***   -0.0104***
                                                      (0.00419)    (0.00398)    (0.00373)
                              privateXrank            0.0138**     0.0134**     0.0134**
                                                      (0.00616)    (0.00644)    (0.00610)
                              o.private               -            -
                              age                     -0.000346    -0.0666***   0.0835***
                                                      (0.00453)    (0.00810)    (0.0318)
                              yearsprior              -0.00434     0.392***     0.150**
                                                      (0.0111)     (0.0934)     (0.0676)
                              termlength              0.0145***    0.0200       0.0200
                                                      (0.00504)    (0.0163)     (0.0156)
                              lnresearch              0.0258       0.0183       0.0183
                                                      (0.0228)     (0.0233)     (0.0220)
                              satavg                  0.000418     -0.000167    -0.000167
                                                      (0.000690)   (0.000654)   (0.000617)
                              lnFTsal                 0.595***     0.506*       0.506*
                                                      (0.212)      (0.275)      (0.260)
                              lnFTE                   -0.0729      -0.605       -0.605*
                                                      (0.284)      (0.374)      (0.354)
                              lngiftsxstud            -0.00606     -0.0197      -0.0197
                                                      (0.0274)     (0.0361)     (0.0340)
                              lnendxstud              0.00913      -0.0596      -0.0596
                                                      (0.0691)     (0.0824)     (0.0791)
                              sex                     0.0964       3.795***
                                                      (0.0650)     (0.935)
                              o.lac                   -            -            -
                              priorpres               0.200**      0.0607*      0.0607*
                                                      (0.0921)     (0.0353)     (0.0332)
                              clergy                  -0.0650      0.623***
                                                      (0.320)      (0.160)
                              final                   0.0534       0.0832       0.0832*
                                                      (0.0445)     (0.0519)     (0.0488)
                              o.system                -            -
                              Constant                6.129*       16.35***     7.908**
                                                      (3.114)      (3.799)      (3.856)
                              Observations            863          863          863
                              R-squared               0.414        0.624        0.509
                              Number of unitid        163          163
                              Number of numpres_name                            245
                              Robust standard errors in parentheses
                              *** p<0.01, ** p<0.05, * p<0.1
                              The results are similar, though the standard errors are smaller with xtreg. I think this is why my coauthor believes areg is more rigorous. As you can see, the same pattern persists in both: the coefficients are the same in the second and third models for most variables, the exceptions being age, yearsprior, sex and clergy, and the model in which the panel variable is set to numpres_name appears more accurate.

