  • xtlogit,re works but not xtlogit,fe

    Hi,

    I have an unbalanced panel dataset (N=2976, T=13), using survey responses.
    My dependent variable is the household's ability to save (saving=1 if able to save, 0 otherwise).
    hhid is the household's unique identifier, and the data are yearly.

    When I run my model with -xtlogit, re-, there is no issue. But with -xtlogit, fe-, I get an error message.
    Please could you help me to understand why this has occurred?

    Code:
    . xtlogit saving $xlist $controllist i.year, re nolog
    
    Random-effects logistic regression              Number of obs     =      5,241
    Group variable: hhid                            Number of groups  =      1,718
    
    Random effects u_i ~ Gaussian                   Obs per group:
                                                                  min =          1
                                                                  avg =        3.1
                                                                  max =         13
    
    Integration method: mvaghermite                 Integration pts.  =         12
    
                                                    Wald chi2(31)     =     436.22
    Log likelihood  = -2756.1286                    Prob > chi2       =     0.0000
    
    ------------------------------------------------------------------------------
          saving |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
      precaution |   .0287749   .0184366     1.56   0.119    -.0073602    .0649101
        purchase |  -.0110621   .0180577    -0.61   0.540    -.0464545    .0243303
      retirement |   .0040648   .0145647     0.28   0.780    -.0244815    .0326111
         bequest |   .0079644   .0120747     0.66   0.510    -.0157015    .0316303
         mediumh |   .5228101   .0969268     5.39   0.000     .3328372    .7127831
           longh |   .3424441   .2626378     1.30   0.192    -.1723164    .8572047
             age |  -.0588166   .0318057    -1.85   0.064    -.1211546    .0035214
                 |
     c.age#c.age |   .0003984   .0003009     1.32   0.186    -.0001914    .0009882
                 |
            male |   .6713149   .1930465     3.48   0.001     .2929507    1.049679
         partner |  -.2353085   .1627531    -1.45   0.148    -.5542988    .0836818
        children |  -.4159175   .0725285    -5.73   0.000    -.5580707   -.2737643
           house |   .6101744   .1524424     4.00   0.000     .3113929     .908956
             uni |   .5110307   .1732816     2.95   0.003     .1714051    .8506564
        employed |   .5710195   .2131565     2.68   0.007     .1532405    .9887986
         retired |   .3504025    .223539     1.57   0.117    -.0877259     .788531
          health |   .2294954   .0820737     2.80   0.005     .0686339     .390357
        incomesc |   .0122304    .002279     5.37   0.000     .0077637    .0166971
            risk |  -.0075499   .0089772    -0.84   0.400    -.0251449     .010045
     selfcontrol |   .6432303   .0407502    15.78   0.000     .5633615    .7230991
                 |
            year |
           2005  |  -.4750785   .1850209    -2.57   0.010    -.8377129   -.1124441
           2006  |  -.5867784   .1904732    -3.08   0.002     -.960099   -.2134578
           2007  |  -.4759589   .1932319    -2.46   0.014    -.8546864   -.0972314
           2008  |  -.4051887   .1852716    -2.19   0.029    -.7683144   -.0420631
           2009  |   -.318116   .1857973    -1.71   0.087    -.6822721    .0460401
           2010  |  -.4950002   .1872916    -2.64   0.008     -.862085   -.1279154
           2011  |  -.5553593    .233515    -2.38   0.017     -1.01304   -.0976782
           2012  |  -.3058978   .2329488    -1.31   0.189    -.7624691    .1506735
           2013  |  -.5059553   .2397291    -2.11   0.035    -.9758156    -.036095
           2014  |  -.3837295    .227862    -1.68   0.092    -.8303308    .0628719
           2015  |  -.1927174   .2363871    -0.82   0.415    -.6560277    .2705929
           2016  |  -.4629798   .2347038    -1.97   0.049    -.9229908   -.0029687
                 |
           _cons |  -4.710044   1.000298    -4.71   0.000    -6.670592   -2.749497
    -------------+----------------------------------------------------------------
        /lnsig2u |   1.148462   .1045686                      .9435112    1.353413
    -------------+----------------------------------------------------------------
         sigma_u |   1.775764   .0928446                      1.602806    1.967387
             rho |   .4894052   .0261304                      .4384793    .5405521
    ------------------------------------------------------------------------------
    LR test of rho=0: chibar2(01) = 613.58                 Prob >= chibar2 = 0.000
    Code:
    . xtlogit saving $xlist $controllist i.year, fe nolog
    note: multiple positive outcomes within groups encountered.
    note: 1,254 groups (2,734 obs) dropped because of all positive or
          all negative outcomes.
    note: male omitted because of no within-group variance.
    Hessian is not negative semidefinite
    r(430);
    Error message:
    [P] error . . . . . . . . . . . . . . . . . . . . . . . . Return code 430
    convergence not achieved;
    You have estimated a maximum likelihood model, and Stata's
    maximization procedure failed to converge to a solution;
    see [R] maximize. Check if the model is identified.

    (end of search)
    Thanks in advance

  • #2
    note: multiple positive outcomes within groups encountered -- Sometimes there is only supposed to be one positive outcome for a case. If that isn't the case in your data this need not concern you.

    note: 1,254 groups (2,734 obs) dropped because of all positive or all negative outcomes -- again, probably not a matter of concern. Cases will drop out of a fe model if the outcome is the same across all records for the case.

    note: male omitted because of no within-group variance. -- yes, variables whose values do not change for a case will get dropped in a fe model.


    Hessian is not negative semidefinite -- OK, this is the hard one. I would suggest greatly simplifying your model and then gradually adding more variables. Then maybe you can identify the variable or variables that are causing you grief.

    -------------------------------------------
    Richard Williams, Notre Dame Dept of Sociology
    EMAIL: rwilliam@ND.Edu
    WWW: http://www3.nd.edu/~rwilliam
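    A minimal sketch of this incremental approach, assuming the control names shown in the output in this thread. The order in which variables are added and the 50-iteration cap are arbitrary choices for illustration, not something recommended in the thread.

    ```stata
    * Sketch only: $xlist and i.year come from the original post.
    * The order of the candidate controls is an arbitrary assumption.
    local added
    foreach v in partner children house uni employed retired health incomesc risk selfcontrol age {
        local added `added' `v'
        display as text _n "Adding `v'"
        * iterate(50) caps the optimizer so a badly identified model stops early
        capture noisily xtlogit saving $xlist `added' i.year, fe nolog iterate(50)
        if _rc {
            display as error "xtlogit stopped with return code " _rc " after adding `v'"
        }
        else if e(converged) == 0 {
            display as error "no convergence within 50 iterations after adding `v'"
        }
    }
    ```

    The first variable that triggers a failure or a non-converged fit is a candidate for closer inspection, for example a coefficient with an enormous standard error.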

    Comment


    • #3
      Thank you for your reply Richard Williams

      Originally posted by Richard Williams View Post
      note: multiple positive outcomes within groups encountered -- Sometimes there is only supposed to be one positive outcome for a case. If that isn't the case in your data this need not concern you.
      What is meant by a positive outcome for a case?
      For example I have a categorical variable for health as follows. Is this what is meant by multiple outcomes?
      Code:
      -------------------------------------------------------------------------------------
      health                                                       general health condition
      -------------------------------------------------------------------------------------
      
                        type:  numeric (double)
                       label:  health
      
                       range:  [1,5]                        units:  1
               unique values:  5                        missing .:  0/13,217
      
                  tabulation:  Freq.   Numeric  Label
                                 136         1  poor
                                 600         2  not so good
                               2,402         3  fair
                               8,535         4  good
                               1,544         5  excellent
      note: 1,254 groups (2,734 obs) dropped because of all positive or all negative outcomes -- again, probably not a matter of concern. Cases will drop out of a fe model if the outcome is the same across all records for the case.
      For example, if my y-variable saving==1 in all periods, will these households drop out? Would this introduce sample bias, since the estimation sample would no longer be representative of the population?

      note: male omitted because of no within-group variance. -- yes, variables whose values do not change for a case will get dropped in a fe model.
      If I would like to include -male- in my regression, as it is a control variable used widely in the literature, would this be sufficient justification for choosing RE over FE?

      Hessian is not negative semidefinite -- OK, this is the hard one. I would suggest greatly simplifying your model and then gradually adding more variables. Then maybe you can identify the variable or variables that are causing you grief.
      Code:
      . xtlogit saving $xlist male partner children house uni employed retired health incomesc risk selfcont
      > rol i.year, fe nolog
      note: multiple positive outcomes within groups encountered.
      note: 1,254 groups (2,734 obs) dropped because of all positive or
            all negative outcomes.
      note: male omitted because of no within-group variance.
      
      Conditional fixed-effects logistic regression   Number of obs     =      2,507
      Group variable: hhid                            Number of groups  =        464
      
                                                      Obs per group:
                                                                    min =          2
                                                                    avg =        5.4
                                                                    max =         13
      
                                                      LR chi2(28)       =      88.24
      Log likelihood  = -917.07717                    Prob > chi2       =     0.0000
      
      ------------------------------------------------------------------------------
            saving |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
        precaution |   .0124372   .0241284     0.52   0.606    -.0348535    .0597279
          purchase |  -.0549006   .0241896    -2.27   0.023    -.1023115   -.0074898
        retirement |   .0009473   .0203117     0.05   0.963     -.038863    .0407575
           bequest |   .0293337   .0196059     1.50   0.135    -.0090931    .0677606
           mediumh |    .152717   .1190459     1.28   0.200    -.0806086    .3860425
             longh |   .0515805   .3315808     0.16   0.876     -.598306     .701467
              male |          0  (omitted)
           partner |   -.689097   .4125599    -1.67   0.095      -1.4977    .1195056
          children |  -.3290673   .1405265    -2.34   0.019    -.6044942   -.0536404
             house |  -.5727232   .3660001    -1.56   0.118     -1.29007    .1446239
               uni |    14.9564   604.1436     0.02   0.980    -1169.143    1199.056
          employed |   .3341242   .3392271     0.98   0.325    -.3307487    .9989972
           retired |   .0532334   .3627569     0.15   0.883     -.657757    .7642239
            health |   .1730658   .1272716     1.36   0.174     -.076382    .4225136
          incomesc |   .0073066   .0029235     2.50   0.012     .0015767    .0130365
              risk |  -.0142658   .0139086    -1.03   0.305    -.0415262    .0129947
       selfcontrol |   .3281811   .0566886     5.79   0.000     .2170735    .4392886
                   |
              year |
             2005  |  -.4122874   .2069111    -1.99   0.046    -.8178256   -.0067492
             2006  |  -.4943309    .213468    -2.32   0.021    -.9127206   -.0759413
             2007  |  -.3421271   .2184738    -1.57   0.117     -.770328    .0860737
             2008  |  -.2987216   .2090865    -1.43   0.153    -.7085236    .1110804
             2009  |  -.3973442   .2120328    -1.87   0.061    -.8129208    .0182325
             2010  |  -.5398134   .2158234    -2.50   0.012    -.9628195   -.1168073
             2011  |  -.5154127   .2630706    -1.96   0.050    -1.031022    .0001963
             2012  |  -.2894888   .2738484    -1.06   0.290    -.8262218    .2472443
             2013  |     -.4511   .2791162    -1.62   0.106    -.9981577    .0959577
             2014  |  -.5859991    .280372    -2.09   0.037    -1.135518     -.03648
             2015  |  -.4781813   .2977073    -1.61   0.108    -1.061677    .1053143
             2016  |  -.4311959   .3058927    -1.41   0.159    -1.030735    .1683427
      ------------------------------------------------------------------------------
      
      . xtlogit saving $xlist c.age##c.age male partner children house uni employed retired health incomesc 
      > risk selfcontrol i.year, fe nolog
      note: multiple positive outcomes within groups encountered.
      note: 1,254 groups (2,734 obs) dropped because of all positive or
            all negative outcomes.
      note: male omitted because of no within-group variance.
      It seems as though the regression runs fine with all other variables if I don't include c.age##c.age.

      Thanks

      Comment


      • #4
        note: male omitted because of no within-group variance. -- yes, variables whose values do not change for a case will get dropped in a fe model.
        If I would like to include -male- in my regression, as it is a control variable used widely in the literature, then would this be sufficient justification for me choosing RE over FE?
        Not so fast! According to the model output, the group variable is hhid, which I take to be a household identifier, am I right? So Stata is telling you that, at least after all the ineligible observations are omitted, in what is left of your sample, there are 464 households and all of those households are either all-male or all-female. That sounds very suspicious to me. Certainly some households will be all-male or all-female, but you have no households with a mix of genders? My money is on a data error here.

        multiple positive outcomes within groups encountered
        Here "outcome" refers to the outcome variable of your model, namely saving. A "positive" outcome means any non-missing value other than zero.
        That means that for some of your households, there is more than one observation for which the outcome variable, saving is non-zero. This is probably not a problem. Stata gives you that warning because in some applications of -xtlogit, fe-, the expected data structure is that in any group there is exactly one non-zero value of the outcome variable. But it doesn't sound like that is the case for your setting. I think you can safely ignore this one, as Richard suggested earlier.

        It seems as though the regression runs fine with all other variables if I don't include c.age##c.age
        OK, but then you have no representation of age in your model at all. Is that reasonable? I don't pretend to have any substantive expertise in saving habits, but I have the layperson's impression that savings behavior varies considerably over the life cycle. What happens if you replace c.age##c.age with just c.age? Then at least you have some role for age in the model, though it is a linear one, which might be inappropriate. Have you looked graphically at the relationship between saving and age in your data (-lowess saving age, logit-)? What kind of shape is it? If there's an inverted-U relationship and you are unable to model it quadratically, you might have better luck with a linear spline, or perhaps chunking age into four or five narrow age groups. (I normally recommend against making categories out of continuous variables, but if all other options fail and the relationship is clearly non-linear...)

        Comment


        • #5
          Thank you for your reply Clyde Schechter

          Originally posted by Clyde Schechter View Post
          Not so fast! According to the model output, the group variable is hhid, which I take to be a household identifier, am I right? So Stata is telling you that, at least after all the ineligible observations are omitted, in what is left of your sample, there are 464 households and all of those households are either all-male or all-female. That sounds very suspicious to me. Certainly some households will be all-male or all-female, but you have no households with a mix of genders? My money is on a data error here.
          Indeed, hhid is a household identifier. However, I only have one respondent from each household (as I have restricted my sample to only include household heads). Therefore, hhid is in a way an identifier for individuals. Now I am puzzled as I would expect all hhid's to be either all-male or all-female. Is there a way for me to check each hhid and whether their gender was constant over all time periods please? I tried the following but it is not possible:
          Code:
          . tab hhid male
          too many values
          r(134);
          Here "outcome" refers to the outcome variable of your model, namely saving. A "positive" outcome means any non-missing value other than zero.
          That means that for some of your households, there is more than one observation for which the outcome variable, saving is non-zero. This is probably not a problem. Stata gives you that warning because in some applications of -xtlogit, fe-, the expected data structure is that in any group there is exactly one non-zero value of the outcome variable. But it doesn't sound like that is the case for your setting. I think you can safely ignore this one, as Richard suggested earlier.
          Thank you for the clear explanation, I will go by what you and Richard have suggested.

          OK, but then you have no representation of age in your model at all. Is that reasonable? I don't pretend to have any substantive expertise in saving habits, but I have the layperson's impression that savings behavior varies considerably over the life cycle. What happens if you replace c.age##c.age with just c.age? Then at least you have some role for age in the model, though it is a linear one, which might be inappropriate. Have you looked graphically at the relationship between saving and age in your data (-lowess saving age, logit-)? What kind of shape is it? If there's an inverted-U relationship and you are unable to model it quadratically, you might have better luck with a linear spline, or perhaps chunking age into four or five narrow age groups. (I normally recommend against making categories out of continuous variables, but if all other options fail and the relationship is clearly non-linear...)
          I would definitely like to keep -age- in the model, as it is an important control variable, and yes as you say it plays an important role over the life cycle.
          I tried running with c.age instead of c.age##c.age, but Stata has been running this for the past half an hour or so and not returned an outcome yet - I will keep you posted on this.
          The results from lowess appear non-linear, close to an inverted-U shape. As you suggested, I may have to split age into categories and see if this works better.
          [Figure: lowess.png, lowess plot of saving against age]

          The purpose of #1 was eventually to conduct a Hausman Test to check whether to use RE or FE. So far, I have been unable to proceed with this, as the FE model is not running. Perhaps a solution to the above would be to stick with RE?

          Thanks
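          For reference, a sketch of the Hausman comparison described above. It can only be run once the fixed-effects model converges, and it compares only coefficients estimated in both models, so a time-invariant variable such as male plays no part in it. The stored-estimate names fe and re are arbitrary labels.

          ```stata
          * Sketch: fit both models on the same specification and store them.
          xtlogit saving $xlist $controllist i.year, fe nolog
          estimates store fe
          xtlogit saving $xlist $controllist i.year, re nolog
          estimates store re
          * The always-consistent estimator goes first, the efficient one second.
          hausman fe re
          ```

          Note that the two fits use different estimation samples here, because the fixed-effects model drops households whose outcome never changes.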

          Comment


          • #6
            Indeed, hhid is a household identifier. However, I only have one respondent from each household (as I have restricted my sample to only include household heads). Therefore, hhid is in a way an identifier for individuals. Now I am puzzled as I would expect all hhid's to be either all-male or all-female. Is there a way for me to check each hhid and whether their gender was constant over all time periods please? I tried the following but it is not possible:
            OK, I didn't realize that you had restricted the data to one person per household (the head). In that case, you needn't check any further: Stata has already told you that every household head is either always male or always female, as you expect. But that is precisely the reason why the male variable is being omitted from the model. You simply cannot estimate the effect of an attribute that does not vary within group (household) using a fixed-effects model. If including male as a predictor is critical to your research goals, then you have to consider using a random-effects model (or perhaps revisit your decision to include only household heads in the analysis).
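            For readers who still want to verify this directly, here is one possible check that avoids the -tab hhid male- limit. The variable names male_min, male_max, male_varies and one_per_hh are made up for illustration.

            ```stata
            * Smallest and largest value of male within each household
            * (egen's min() and max() ignore missing values).
            egen byte male_min = min(male), by(hhid)
            egen byte male_max = max(male), by(hhid)
            generate byte male_varies = male_min != male_max
            * Keep one row per household so each household is counted once.
            egen byte one_per_hh = tag(hhid)
            tabulate male_varies if one_per_hh
            ```

            If every household shows male_varies equal to 0, the variable has no within-group variation, which is what the omission note reports for the estimation sample.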

            I would definitely like to keep -age- in the model, as it is an important control variable, and yes as you say it plays an important role over the life cycle.
            I tried running with c.age instead of c.age##c.age, but Stata has been running this for the past half an hour or so and not returned an outcome yet - I will keep you posted on this.
            The results from lowess appears non-linear, close to an inverted-U shaped relationship it seems. As you suggested, I may have to split age into categories and see if this works better.
            That would be one way to go. Another thought is that it is not really very U-shaped. To my eye this looks more like it is roughly flat until about age 75 and then declines linearly thereafter. In fact, this graph, to me, explains why you are getting that message about the Hessian not being negative semidefinite. I think with that very wide flat expanse in the graph, the turning point of the inverted-U that you are trying to fit using a quadratic term is not well identified at all! It could be anywhere in that 0-75 range. I think that's the cause of that problem. I think it also explains why you are having a very long run time using just linear age: what is the slope of the best fit straight line in the graph you show? It's some negative value, no doubt. But pinning it down could be hard because, I'm going to hazard a guess, you don't have a lot of people over age 75 in your data. Because of the near flatness below age 75, the people under age 75 don't give much information about the age slope. The age slope is determined almost entirely by the people over age 75, and, if, as I suspect, they aren't very numerous in your data, there isn't much information for Stata to go on, so it iterates for a very long time trying to figure it out. I wouldn't even be surprised if, in the end, convergence fails.

            If this were my project, I would probably use a linear spline here:

            Code:
            mkspline upto75 75 over75 = age
            and substitute -upto75 over75- in your -xtlogit- where you currently have -c.age##c.age-. I think you will have better luck with that. I would expect that the model would converge without an excessive number of iterations. The confidence interval around the slope of the over75 variable will probably be very wide. But I think that will be a realistic reflection of the information in your data set.


            Comment


            • #7
              Originally posted by Clyde Schechter View Post
              OK, I didn't realize that you had restricted the data to one person per household (the head). In that case, you needn't check any farther: Stata has already told you that every household head is either all-male or all-female as you expect. But that is precisely the reason why the male variable is being omitted from the model. You simply cannot estimate the effect of an attribute that does not vary within group (household) using a fixed-effects model. If including male as a predictor is critical to your research goals, then you have to consider using a random effects model (or perhaps revisit your decision to just include household heads in the analysis.)
              Apologies, I should have explained that I dropped non-household heads. Thank you for the clarification - I now understand: as the gender of each hhid is either always male or always female, the FE estimator has dropped the variable entirely. Including male is common in the literature, although it is not the main x-variable that I am looking at. Nevertheless, I think it is an important predictor of saving behaviour (and it returned a significant coefficient for -xtlogit, re- in #1), so I will consider using RE, albeit without conducting a Hausman test.

              That would be one way to go. Another thought is that it is not really very U-shaped. To my eye this looks more like it is roughly flat until about age 75 and then declines linearly thereafter. In fact, this graph, to me, explains why you are getting that message about the Hessian not being negative semidefinite. I think with that very wide flat expanse in the graph, the turning point of the inverted-U that you are trying to fit using a quadratic term is not well identified at all! It could be anywhere in that 0-75 range. I think that's the cause of that problem. I think it also explains why you are having a very long run time using just linear age: what is the slope of the best fit straight line in the graph you show? It's some negative value, no doubt. But pinning it down could be hard because, I'm going to hazard a guess, you don't have a lot of people over age 75 in your data. Because of the near flatness below age 75, the people under age 75 don't give much information about the age slope. The age slope is determined almost entirely by the people over age 75, and, if, as I suspect, they aren't very numerous in your data, there isn't much information for Stata to go on, so it iterates for a very long time trying to figure it out. I wouldn't even be surprised if, in the end, convergence fails.
              Sorry - how do I find the slope of the line of best fit? I looked in the Stata manual under -lowess- but could not find it.
              There seem to be fewer observations for age>75:
              Code:
              . sum hhid
              
                  Variable |        Obs        Mean    Std. Dev.       Min        Max
              -------------+---------------------------------------------------------
                      hhid |     13,217    33163.58    26819.34          6      89972
              
              . sum hhid if age>70
              
                  Variable |        Obs        Mean    Std. Dev.       Min        Max
              -------------+---------------------------------------------------------
                      hhid |      2,431    28395.49    27084.93         21      89820
              
              . sum hhid if age>75
              
                  Variable |        Obs        Mean    Std. Dev.       Min        Max
              -------------+---------------------------------------------------------
                      hhid |      1,281    27056.16    27535.76         38      89450
              Thank you for the suggestion, I will read up on -mkspline- to understand this more

              Comment


              • #8
                Sorry - how do I find the slope of the line of best fit? I looked in the Stata manual under -lowess- but could not find it.
                My apologies. When I asked "what is the slope of the line of best fit" I meant it as a rhetorical question, an invitation to a thought experiment. I didn't intend for you to try to calculate it. (And you won't find it in -lowess- anyway.) My point was that, gazing at the graph, it's pretty unclear what the best fit line is.

                Well, certainly the over 75 crowd are a small part of the data, though, I have to say, there are more observations like that than I would have predicted. Well, that's even better than I thought: I expect that the model with the linear spline will go rather well, and you might even get a reasonably narrow confidence interval around the over75 coefficient.
                Last edited by Clyde Schechter; 19 Apr 2017, 17:51. Reason: Correct typo.
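                One caveat on the counts in #7: -sum hhid if age>75- counts observations, not households, and in Stata a missing age also satisfies age>75. A sketch of how distinct households over 75 could be counted instead, where hh_over75 is a made-up variable name:

                ```stata
                * Tag one observation per household among those observed above age 75.
                * Missing age is excluded explicitly, because missing compares as larger than any number.
                egen byte hh_over75 = tag(hhid) if age > 75 & !missing(age)
                count if hh_over75 == 1
                ```

                The result is the number of distinct households with at least one observation above age 75, which is closer to the quantity that matters for a within-household estimator.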

                Comment


                • #9
                  Clyde Schechter thank you very much for your reply and suggestion

                  Originally posted by Clyde Schechter View Post
                  Another thought is that it is not really very U-shaped. To my eye this looks more like it is roughly flat until about age 75 and then declines linearly thereafter. In fact, this graph, to me, explains why you are getting that message about the Hessian not being negative semidefinite. I think with that very wide flat expanse in the graph, the turning point of the inverted-U that you are trying to fit using a quadratic term is not well identified at all! It could be anywhere in that 0-75 range. I think that's the cause of that problem. I think it also explains why you are having a very long run time using just linear age: what is the slope of the best fit straight line in the graph you show? It's some negative value, no doubt. But pinning it down could be hard because, I'm going to hazard a guess, you don't have a lot of people over age 75 in your data. Because of the near flatness below age 75, the people under age 75 don't give much information about the age slope. The age slope is determined almost entirely by the people over age 75, and, if, as I suspect, they aren't very numerous in your data, there isn't much information for Stata to go on, so it iterates for a very long time trying to figure it out. I wouldn't even be surprised if, in the end, convergence fails.

                  If this were my project, I would probably use a linear spline here:

                  Code:
                  mkspline upto75 75 over75 = age
                  and substitute -upto75 over75- in your -xtlogit- where you currently have -c.age##c.age-. I think you will have better luck with that. I would expect that the model would converge without an excessive number of iterations. The confidence interval around the slope of the over75 variable will probably be very wide. But I think that will be a realistic reflection of the information in your data set.
                  I read about -mkspline- and my understanding is that it allows the slope on age to differ below and above the knot at age=75. Hence I have tried the suggested code:

                  Code:
                  . mkspline upto75 75 over75 = age
                  
                  . xtlogit saving $xlist upto75 over75 male partner children house uni employed retired h
                  > ealth incomesc risk selfcontrol i.year, fe nolog
                  note: multiple positive outcomes within groups encountered.
                  note: 1,254 groups (2,734 obs) dropped because of all positive or
                        all negative outcomes.
                  note: male omitted because of no within-group variance.
                  Unfortunately the regression is still taking a while to run (it has been running for around half an hour without returning an outcome).
                  It does the same when just c.age is used instead of c.age##c.age.

                  I am unsure of why the -age- is causing issues in -xtlogit, fe-, but believe it should still be included in my regression, so I think I should opt for a RE estimator instead?

                  Many thanks

                  Comment


                  • #10
                    Well, I suppose the other problem you have with age in the -fe- estimator is that, because you are also using year indicators, and the -fe- estimator is a within-panel estimator, age is pretty much redundant with year. So Stata may be having trouble trying to allocate the effects on outcome between age and the years. And a linear spline would not help that, as it has the same limitation of being relatively uninformative.

                    You need to think clearly about your research objectives here. Age and sex are well-recognized predictors of saving behavior. The question is whether you need to estimate those effects yourself as part of your study, or whether you just need to make sure that your analysis appropriately adjusts for them. You cannot estimate the effect of sex at all in a fixed-effects model, and you can estimate the effect of age only with great difficulty, at best, if you are also including year effects in your model. If you do not need to estimate these effects for your research goals, but just need to be sure that they are appropriately adjusted ("controlled") for in your analysis, then you can use a fixed-effects model that includes year indicators and simply omit both age and sex from the model. If, on the other hand, actually estimating these effects is a goal of your research, then you really have no choice but to go to a random-effects model.



                    • #11
                      Originally posted by Clyde Schechter
                      Well, I suppose the other problem you have with age in the -fe- estimator is that, because you are also using year indicators, and the -fe- estimator is a within-panel estimator, age is pretty much redundant with year: within a household, age goes up by one each year, so once the household effect is removed, almost all of its remaining variation is already captured by the year indicators. So Stata may be having trouble allocating the effects on the outcome between age and the years. And a linear spline would not help with that, as it has the same limitation of being relatively uninformative.
                      I hadn't thought of that before - yes that's very true.

                      You need to think clearly about your research objectives here. Age and sex are well-recognized predictors of saving behavior. The question is whether you need to estimate those effects yourself as part of your study, or whether you just need to make sure that your analysis appropriately adjusts for them. You cannot estimate the effect of sex at all in a fixed-effects model, and you can estimate the effect of age only with great difficulty, at best, if you are also including year effects in your model. If you do not need to estimate these effects for your research goals, but just need to be sure that they are appropriately adjusted ("controlled") for in your analysis, then you can use a fixed-effects model that includes year indicators and simply omit both age and sex from the model. If, on the other hand, actually estimating these effects is a goal of your research, then you really have no choice but to go to a random-effects model.
                      As age and sex are not the key explanatory variables I am researching, and their effects can still be controlled for by using an FE model with year dummies, it should be fine to present the results of the logit FE model.

                      Code:
                      . xtlogit saving $xlist partner children house uni employed retired health incomesc risk
                      > selfcontrol i.year, fe nolog
                      note: multiple positive outcomes within groups encountered.
                      note: 1,254 groups (2,734 obs) dropped because of all positive or
                            all negative outcomes.
                      
                      . est store fe
                      
                      . xtlogit saving $xlist partner children house uni employed retired health incomesc risk
                      > selfcontrol i.year, re nolog
                      
                      . est store re
                      
                      . hausman fe re
                      
                      Note: the rank of the differenced variance matrix (10) does not equal the number of
                              coefficients being tested (28); be sure this is what you expect, or there may be
                              problems computing the test.  Examine the output of your estimators for anything
                              unexpected and possibly consider scaling your variables so that the coefficients
                              are on a similar scale.
                      
                                       ---- Coefficients ----
                                   |      (b)          (B)            (b-B)     sqrt(diag(V_b-V_B))
                                   |       fe           re         Difference          S.E.
                      -------------+----------------------------------------------------------------
                        precaution |    .0124372     .0227559       -.0103187        .0157095
                          purchase |   -.0549006     .0028901       -.0577907        .0169368
                        retirement |    .0009473     .0005326        .0004147        .0144466
                           bequest |    .0293337      .007123        .0222107        .0155594
                           mediumh |     .152717     .5008082       -.3480912        .0698139
                             longh |    .0515805     .3162737       -.2646932        .2030103
                           partner |    -.689097    -.0037154       -.6853816        .3853863
                          children |   -.3290673    -.3854612        .0563938        .1219048
                             house |   -.5727232     .5909621       -1.163685        .3333268
                               uni |     14.9564     .5303961          14.426        604.1436
                          employed |    .3341242     .7554477       -.4213234          .26969
                           retired |    .0532334     .3553522       -.3021187        .2945091
                            health |   -.1730658    -.2425256        .0694598        .0975184
                          incomesc |    .0073066     .0121043       -.0047977        .0018246
                              risk |   -.0142658    -.0017768       -.0124889          .01073
                       selfcontrol |    .3281811     .6429633       -.3147822        .0395799
                              year |
                             2005  |   -.4122874    -.4795959        .0673086        .0934596
                             2006  |   -.4943309    -.6019538        .1076229        .0969813
                             2007  |   -.3421271    -.5100411         .167914        .1028776
                             2008  |   -.2987216    -.4409031        .1421815        .0985048
                             2009  |   -.3973442    -.3591294       -.0382148        .1041315
                             2010  |   -.5398134    -.5589375        .0191241        .1094976
                             2011  |   -.5154127    -.6051488        .0897361         .123119
                             2012  |   -.2894888     -.367262        .0777733        .1463652
                             2013  |      -.4511    -.5407178        .0896178        .1447417
                             2014  |   -.5859991    -.4232325       -.1627666        .1652321
                             2015  |   -.4781813      -.25591       -.2222713        .1831915
                             2016  |   -.4311959    -.4974499         .066254        .1984495
                      ------------------------------------------------------------------------------
                                               b = consistent under Ho and Ha; obtained from xtlogit
                                B = inconsistent under Ha, efficient under Ho; obtained from xtlogit
                      
                          Test:  Ho:  difference in coefficients not systematic
                      
                                       chi2(10) = (b-B)'[(V_b-V_B)^(-1)](b-B)
                                                =       27.72
                                      Prob>chi2 =      0.0020
                      
                      .
                      I have conducted a Hausman test of the logit FE model against the RE model (with age and sex excluded from both).
                      The significant p-value suggests I should proceed with FE. Thank you very much for your help, Clyde Schechter.

