
  • Logit regression with panel data, low Wald statistic

    Hey all,

    I am fitting a logit model on panel data (the -xtlogit- command) with random effects and clustered standard errors. I found a solid specification with some significant estimates and a high Wald chi2 statistic (<2000). But if I change certain variables slightly (e.g. the log of a variable, or a dummy instead of a categorical) or include other variables, suddenly my model has only highly insignificant estimates (z values lower than 0.1 and a really low Wald chi2 statistic with Prob > chi2 = 1.0000). I cannot figure out why it changes so much, but it happens with quite a lot of specifications.
    Some background information which might be relevant: Number of obs = 33,956; Number of groups = 6,752; Obs per group: min = 1, avg = 5.0, max = 16; Integration pts. = 12.

    Thanks for your help!

    Best, Fabian

  • #2
    Fabian:
    welcome to the list.
    A quick scan of the FAQ conveys the idea that posting what you typed and what Stata gave you back (or an example/excerpt of your dataset via -dataex-, which you can install by typing -search dataex- from within Stata) is worth much more than describing what's going on.
    For instance, we cannot say whether logging a given variable with a lot of zeros in its raw metric has plagued your dataset with missing values which, in turn, led Stata to rule out all the observations with at least one missing value in any of your variables.
    Unfortunately, the background information you provided is not enough for a useful reply.
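    A minimal sketch of that pitfall (the variable name -myvar- is hypothetical, not from the poster's model):

    ```stata
    * ln(0) is missing in Stata, so logging a variable with zeros
    * silently drops those observations from any later estimation
    count if myvar == 0                            // how many zeros in the raw metric
    generate ln_myvar = ln(myvar)                  // zeros (and negatives) become missing
    count if missing(ln_myvar) & !missing(myvar)   // observations the log would lose
    ```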
    Kind regards,
    Carlo
    (StataNow 18.5)

    • #3
      Thank you Carlo for your answer. One example is this: the only thing I change is to use another dummy for education. Educcollx is a dummy equal to 1 if at least one household member has a college degree; educhheeqx equals 1 if at least one person in the household graduated from Gymnasium (similar to A-levels).
      I included both regression outputs, the codebook output for both education dummies, and the dataex output for all variables included (attached).

      • #4
        Fabian:
        thanks for providing further clarifications.
        However, for the future, please post what you typed and what Stata gave you back directly within CODE delimiters in the body of your message. Opening attachments coming from unknown sources is always risky.
        That said, looking at your Stata session, I would be interested to see whether those poorly informative results are replicated once the -cluster()- option has been removed.
        Kind regards,
        Carlo
        (StataNow 18.5)

        • #5
          No, they are not replicated: without the -cluster()- option the results look reasonable again. Maybe it is because of the time-invariant variables (the year in which the house was built and the type of house), since they have a within variation of 0? But is there an alternative to clustering? Clustering inflates the standard errors, but without it the inference is likely to be wrong. Right now I cluster at the household level (which is also the group level).

          PS: sorry for the attachment!

          • #6
            Fabian:
            if you go -xtlogit-, clustering the standard errors is not mandatory (whereas it would be with -logit- applied to a panel dataset, although I would not endorse this approach).
            As a general opinion, I was also thinking about a multicollinearity issue in your first model (Chi2 statistically significant but most regressors far from being even barely significant).
            Finally, I would forget about your second model (in its current specification, at least), as the Chi2 value gives no evidence against a joint lack of statistical significance of all the coefficients (but the constant).
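            A rough screen for that kind of multicollinearity (a sketch; the variable names -y-, -x1-, -x2-, -x3- are placeholders, not from the poster's model):

            ```stata
            * pairwise correlations among candidate regressors
            correlate x1 x2 x3

            * after fitting the model, correlations of the coefficient estimates;
            * entries near 1 in absolute value hint at collinear regressors
            quietly xtlogit y x1 x2 x3, re
            estat vce, corr
            ```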
            Kind regards,
            Carlo
            (StataNow 18.5)

            • #7
              Thank you Carlo, that was very helpful! Could you explain to me (or direct me to a paper on) why clustering standard errors is not mandatory with -xtlogit-? Would you just not do the clustering, or is there a certain test I should conduct?

              Best, Fabian

              • #8
                Fabian:
                As you can see in the following example, clustering standard errors in -xtlogit- makes no remarkable difference:
                Code:
                . use http://www.stata-press.com/data/r14/union
                (NLS Women 14-24 in 1968)
                
                . xtlogit union age grade not_smsa south##c.year
                
                Fitting comparison model:
                
                Iteration 0:   log likelihood =  -13864.23
                Iteration 1:   log likelihood = -13547.326
                Iteration 2:   log likelihood = -13542.493
                Iteration 3:   log likelihood =  -13542.49
                Iteration 4:   log likelihood =  -13542.49
                
                Fitting full model:
                
                tau =  0.0     log likelihood =  -13542.49
                tau =  0.1     log likelihood = -12923.751
                tau =  0.2     log likelihood = -12417.651
                tau =  0.3     log likelihood = -12001.665
                tau =  0.4     log likelihood = -11655.586
                tau =  0.5     log likelihood = -11366.441
                tau =  0.6     log likelihood = -11128.749
                tau =  0.7     log likelihood = -10946.399
                tau =  0.8     log likelihood = -10844.833
                
                Iteration 0:   log likelihood = -10946.488
                Iteration 1:   log likelihood =  -10557.39
                Iteration 2:   log likelihood = -10540.493
                Iteration 3:   log likelihood = -10540.274
                Iteration 4:   log likelihood = -10540.274  (backed up)
                Iteration 5:   log likelihood = -10540.274
                
                Random-effects logistic regression              Number of obs     =     26,200
                Group variable: idcode                          Number of groups  =      4,434
                
                Random effects u_i ~ Gaussian                   Obs per group:
                                                                              min =          1
                                                                              avg =        5.9
                                                                              max =         12
                
                Integration method: mvaghermite                 Integration pts.  =         12
                
                                                                Wald chi2(6)      =     227.46
                Log likelihood  = -10540.274                    Prob > chi2       =     0.0000
                
                ------------------------------------------------------------------------------
                       union |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                -------------+----------------------------------------------------------------
                         age |   .0156732   .0149895     1.05   0.296    -.0137056     .045052
                       grade |   .0870851   .0176476     4.93   0.000     .0524965    .1216738
                    not_smsa |  -.2511884   .0823508    -3.05   0.002    -.4125929   -.0897839
                     1.south |  -2.839112   .6413116    -4.43   0.000    -4.096059   -1.582164
                        year |  -.0068604   .0156575    -0.44   0.661    -.0375486    .0238277
                             |
                south#c.year |
                          1  |   .0238506   .0079732     2.99   0.003     .0082235    .0394777
                             |
                       _cons |  -3.009365   .8414963    -3.58   0.000    -4.658667   -1.360062
                -------------+----------------------------------------------------------------
                    /lnsig2u |   1.749366   .0470017                      1.657245    1.841488
                -------------+----------------------------------------------------------------
                     sigma_u |   2.398116   .0563577                      2.290162    2.511158
                         rho |   .6361098   .0108797                      .6145307    .6571548
                ------------------------------------------------------------------------------
                LR test of rho=0: chibar2(01) = 6004.43                Prob >= chibar2 = 0.000
                
                . xtlogit union age grade not_smsa south##c.year, vce(cluster clusterid)
                variable clusterid not found
                r(111);
                
                . xtlogit union age grade not_smsa south##c.year, vce(cluster idcode)
                
                Fitting comparison model:
                
                Iteration 0:   log pseudolikelihood =  -13864.23
                Iteration 1:   log pseudolikelihood = -13547.326
                Iteration 2:   log pseudolikelihood = -13542.493
                Iteration 3:   log pseudolikelihood =  -13542.49
                Iteration 4:   log pseudolikelihood =  -13542.49
                
                Fitting full model:
                
                tau =  0.0     log pseudolikelihood =  -13542.49
                tau =  0.1     log pseudolikelihood = -12923.751
                tau =  0.2     log pseudolikelihood = -12417.651
                tau =  0.3     log pseudolikelihood = -12001.665
                tau =  0.4     log pseudolikelihood = -11655.586
                tau =  0.5     log pseudolikelihood = -11366.441
                tau =  0.6     log pseudolikelihood = -11128.749
                tau =  0.7     log pseudolikelihood = -10946.399
                tau =  0.8     log pseudolikelihood = -10844.833
                
                Iteration 0:   log pseudolikelihood = -10946.488
                Iteration 1:   log pseudolikelihood =  -10557.39
                Iteration 2:   log pseudolikelihood = -10540.493
                Iteration 3:   log pseudolikelihood = -10540.274
                Iteration 4:   log pseudolikelihood = -10540.274  (backed up)
                Iteration 5:   log pseudolikelihood = -10540.274
                
                Calculating robust standard errors:
                
                Random-effects logistic regression              Number of obs     =     26,200
                Group variable: idcode                          Number of groups  =      4,434
                
                Random effects u_i ~ Gaussian                   Obs per group:
                                                                              min =          1
                                                                              avg =        5.9
                                                                              max =         12
                
                Integration method: mvaghermite                 Integration pts.  =         12
                
                                                                Wald chi2(6)      =     158.35
                Log pseudolikelihood  = -10540.274              Prob > chi2       =     0.0000
                
                                             (Std. Err. adjusted for 4,434 clusters in idcode)
                ------------------------------------------------------------------------------
                             |               Robust
                       union |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                -------------+----------------------------------------------------------------
                         age |   .0156732   .0147256     1.06   0.287    -.0131884    .0445347
                       grade |   .0870851   .0205508     4.24   0.000     .0468064    .1273639
                    not_smsa |  -.2511884   .1010817    -2.49   0.013    -.4493048    -.053072
                     1.south |  -2.839112   .8820419    -3.22   0.001    -4.567882   -1.110342
                        year |  -.0068604   .0160632    -0.43   0.669    -.0383437    .0246229
                             |
                south#c.year |
                          1  |   .0238506   .0110167     2.16   0.030     .0022582    .0454429
                             |
                       _cons |  -3.009365   .9315743    -3.23   0.001    -4.835217   -1.183513
                -------------+----------------------------------------------------------------
                    /lnsig2u |   1.749366   .0495393                      1.652271    1.846462
                -------------+----------------------------------------------------------------
                     sigma_u |   2.398116   .0594005                      2.284474    2.517411
                         rho |   .6361098   .0114671                      .6133519    .6582745
                ------------------------------------------------------------------------------
                
                .
                Clustering or not with -xt- commands is more an art than a science.
                In general, if you suspect heteroskedasticity and/or serial correlation, it is worth clustering (provided that you have a fairly high number of clusters; otherwise clustering standard errors can hamper more than help your estimates).
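                Using the -union- example above, the two sets of standard errors can be put side by side in one table:

                ```stata
                * compare default vs clustered standard errors for the same model
                quietly xtlogit union age grade not_smsa south##c.year
                estimates store re_default
                quietly xtlogit union age grade not_smsa south##c.year, vce(cluster idcode)
                estimates store re_cluster
                estimates table re_default re_cluster, se b(%9.4f)
                ```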
                Last edited by Carlo Lazzaro; 26 Jun 2017, 11:37.
                Kind regards,
                Carlo
                (StataNow 18.5)

                • #9
                  But isn't it an empirical question whether it makes a difference? In your dataset it looks like it makes almost no difference, but in mine it does.

                  Code:
                  xtlogit $dep flatsize i.buildyearaltalt i.housetyp energyprice meantemp $sozecon $behav $cond, re vce(cluster hid)
                  
                  Fitting comparison model:
                  
                  Iteration 0:   log pseudolikelihood = -18453.755  
                  Iteration 1:   log pseudolikelihood =  -17496.43  
                  Iteration 2:   log pseudolikelihood = -17407.109  
                  Iteration 3:   log pseudolikelihood = -17405.787  
                  Iteration 4:   log pseudolikelihood = -17405.785  
                  
                  Fitting full model:
                  
                  tau =  0.0     log pseudolikelihood = -17405.785
                  tau =  0.1     log pseudolikelihood = -15738.527
                  tau =  0.2     log pseudolikelihood = -14463.903
                  tau =  0.3     log pseudolikelihood = -13449.263
                  tau =  0.4     log pseudolikelihood = -12603.288
                  tau =  0.5     log pseudolikelihood = -11868.753
                  tau =  0.6     log pseudolikelihood = -11209.249
                  tau =  0.7     log pseudolikelihood = -10601.628
                  tau =  0.8     log pseudolikelihood = -10031.562
                  
                  Iteration 0:   log pseudolikelihood = -10599.486  
                  Iteration 1:   log pseudolikelihood = -8052.6275  (not concave)
                  Iteration 2:   log pseudolikelihood = -7887.4828  (not concave)
                  Iteration 3:   log pseudolikelihood = -7494.4938  
                  Iteration 4:   log pseudolikelihood = -6816.9398  (not concave)
                  Iteration 5:   log pseudolikelihood = -6655.7767  (not concave)
                  Iteration 6:   log pseudolikelihood = -6582.7672  (not concave)
                  Iteration 7:   log pseudolikelihood = -6557.2111  
                  Iteration 8:   log pseudolikelihood = -6394.7257  
                  Iteration 9:   log pseudolikelihood = -6390.3282  
                  Iteration 10:  log pseudolikelihood = -6252.0829  
                  Iteration 11:  log pseudolikelihood = -6218.6418  
                  Iteration 12:  log pseudolikelihood = -6213.8948  
                  Iteration 13:  log pseudolikelihood = -6211.4145  (backed up)
                  Iteration 14:  log pseudolikelihood = -6210.8629  (backed up)
                  Iteration 15:  log pseudolikelihood = -6210.8629  (backed up)
                  Iteration 16:  log pseudolikelihood = -6198.0868  
                  Iteration 17:  log pseudolikelihood =  -6196.793  (not concave)
                  Iteration 18:  log pseudolikelihood =  -6195.073  (not concave)
                  Iteration 19:  log pseudolikelihood = -6194.5725  
                  Iteration 20:  log pseudolikelihood = -6193.7757  
                  Iteration 21:  log pseudolikelihood = -6193.5418  
                  Iteration 22:  log pseudolikelihood = -6193.5409  
                  Iteration 23:  log pseudolikelihood = -6193.5409  
                  
                  Calculating robust standard errors:
                  
                  Random-effects logistic regression              Number of obs      =     37391
                  Group variable: hid                             Number of groups   =      5166
                  
                  Random effects u_i ~ Gaussian                   Obs per group: min =         1
                                                                                 avg =       7.2
                                                                                 max =        16
                  
                  Integration method: mvaghermite                 Integration points =        12
                  
                                                                  Wald chi2(16)      =   3983.22
                  Log pseudolikelihood  = -6193.5409              Prob > chi2        =    0.0000
                  
                                                    (Std. Err. adjusted for 5166 clusters in hid)
                  -------------------------------------------------------------------------------
                                |               Robust
                    treatwindow |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                  --------------+----------------------------------------------------------------
                       flatsize |   .0030393   .3173313     0.01   0.992    -.6189187    .6249973
                                |
                  buildyea~talt |
                             2  |  -1.170051   45.73516    -0.03   0.980    -90.80931    88.46921
                             3  |  -1.261983   34.59401    -0.04   0.971      -69.065    66.54103
                             4  |  -1.622334   41.17418    -0.04   0.969    -82.32224    79.07757
                             5  |  -2.237648   51.86878    -0.04   0.966    -103.8986    99.42329
                             6  |  -5.061566   57.82798    -0.09   0.930    -118.4023    108.2792
                                |
                       housetyp |
                  (semi-)det..  |  -3.379166   72.75161    -0.05   0.963    -145.9697    139.2114
                      terraced  |  -2.555837    57.0309    -0.04   0.964    -114.3343    109.2227
                      Wohnhaus  |  -4.453272   79.76402    -0.06   0.955    -160.7879    151.8813
                                |
                    energyprice |   1.667876   10.00735     0.17   0.868    -17.94617    21.28193
                       meantemp |  -.0860204   1.547433    -0.06   0.956    -3.118934    2.946893
                        lnhhinc |   1.093012   20.67388     0.05   0.958    -39.42705    41.61308
                           agex |   .3498934   2.989446     0.12   0.907    -5.509313      6.2091
                      riskwillb |   .0655714   3.147688     0.02   0.983    -6.103784    6.234927
                      educcollx |   2.420058   46.53325     0.05   0.959    -88.78345    93.62356
                         envirb |    .057846   14.23615     0.00   0.997    -27.84449    27.96018
                          _cons |  -64.41145   483.6968    -0.13   0.894     -1012.44    883.6169
                  --------------+----------------------------------------------------------------
                       /lnsig2u |   6.650552          .                             .           .
                  --------------+----------------------------------------------------------------
                        sigma_u |   27.80668          .                             .           .
                            rho |   .9957632          .                             .           .
                  -------------------------------------------------------------------------------
                  
                  
                  . xtlogit  $dep flatsize  i.buildyearaltalt i.housetyp   energyprice meantemp $sozecon  $behav  $cond, re
                  
                  Fitting comparison model:
                  
                  Iteration 0:   log likelihood = -18453.755  
                  Iteration 1:   log likelihood =  -17496.43  
                  Iteration 2:   log likelihood = -17407.109  
                  Iteration 3:   log likelihood = -17405.787  
                  Iteration 4:   log likelihood = -17405.785  
                  
                  Fitting full model:
                  
                  tau =  0.0     log likelihood = -17405.785
                  tau =  0.1     log likelihood = -15738.527
                  tau =  0.2     log likelihood = -14463.903
                  tau =  0.3     log likelihood = -13449.263
                  tau =  0.4     log likelihood = -12603.288
                  tau =  0.5     log likelihood = -11868.753
                  tau =  0.6     log likelihood = -11209.249
                  tau =  0.7     log likelihood = -10601.628
                  tau =  0.8     log likelihood = -10031.562
                  
                  Iteration 0:   log likelihood = -10599.486  
                  Iteration 1:   log likelihood = -8052.6275  (not concave)
                  Iteration 2:   log likelihood = -7887.4828  (not concave)
                  Iteration 3:   log likelihood = -7494.4938  
                  Iteration 4:   log likelihood = -6816.9398  (not concave)
                  Iteration 5:   log likelihood = -6655.7767  (not concave)
                  Iteration 6:   log likelihood = -6582.7672  (not concave)
                  Iteration 7:   log likelihood = -6557.2111  
                  Iteration 8:   log likelihood = -6394.7257  
                  Iteration 9:   log likelihood = -6390.3282  
                  Iteration 10:  log likelihood = -6252.0829  
                  Iteration 11:  log likelihood = -6218.6418  
                  Iteration 12:  log likelihood = -6213.8948  
                  Iteration 13:  log likelihood = -6211.4145  (backed up)
                  Iteration 14:  log likelihood = -6210.8629  (backed up)
                  Iteration 15:  log likelihood = -6210.8629  (backed up)
                  Iteration 16:  log likelihood = -6198.0868  
                  Iteration 17:  log likelihood =  -6196.793  (not concave)
                  Iteration 18:  log likelihood =  -6195.073  (not concave)
                  Iteration 19:  log likelihood = -6194.5725  
                  Iteration 20:  log likelihood = -6193.7757  
                  Iteration 21:  log likelihood = -6193.5418  
                  Iteration 22:  log likelihood = -6193.5409  
                  Iteration 23:  log likelihood = -6193.5409  
                  
                  Random-effects logistic regression              Number of obs      =     37391
                  Group variable: hid                             Number of groups   =      5166
                  
                  Random effects u_i ~ Gaussian                   Obs per group: min =         1
                                                                                 avg =       7.2
                                                                                 max =        16
                  
                  Integration method: mvaghermite                 Integration points =        12
                  
                                                                  Wald chi2(16)      =   3983.22
                  Log likelihood  = -6193.5409                    Prob > chi2        =    0.0000
                  
                  -----------------------------------------------------------------------------------
                        treatwindow |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                  ------------------+----------------------------------------------------------------
                           flatsize |   .0030393   .0021226     1.43   0.152    -.0011209    .0071994
                                    |
                    buildyearaltalt |
                                 2  |  -1.170051   .4714359    -2.48   0.013    -2.094048   -.2460533
                                 3  |  -1.261983   .4252535    -2.97   0.003    -2.095465   -.4285018
                                 4  |  -1.622334   .4570081    -3.55   0.000    -2.518054   -.7266147
                                 5  |  -2.237648   .5255695    -4.26   0.000    -3.267745    -1.20755
                                 6  |  -5.061566   .8260435    -6.13   0.000    -6.680582    -3.44255
                                    |
                           housetyp |
                  (semi-)detached   |  -3.379166   .7275102    -4.64   0.000     -4.80506   -1.953272
                          terraced  |  -2.555837   .7709801    -3.32   0.001     -4.06693   -1.044744
                          Wohnhaus  |  -4.453272   .7940447    -5.61   0.000    -6.009571   -2.896973
                                    |
                        energyprice |   1.667876   .0317378    52.55   0.000     1.605671    1.730081
                           meantemp |  -.0860204   .0173085    -4.97   0.000    -.1199444   -.0520965
                            lnhhinc |   1.093012    .164989     6.62   0.000     .7696395    1.416384
                               agex |   .3498934   .0115931    30.18   0.000     .3271714    .3726155
                          riskwillb |   .0655714   .0323419     2.03   0.043     .0021824    .1289603
                          educcollx |   2.420058   .2762425     8.76   0.000     1.878633    2.961483
                             envirb |    .057846   .1017769     0.57   0.570     -.141633    .2573249
                              _cons |  -64.41145   1.683365   -38.26   0.000    -67.71079   -61.11212
                  ------------------+----------------------------------------------------------------
                           /lnsig2u |   6.650552   .0361109                      6.579776    6.721328
                  ------------------+----------------------------------------------------------------
                            sigma_u |   27.80668   .5020618                      26.83986    28.80832
                                rho |   .9957632   .0001523                      .9954539    .9960516
                  -----------------------------------------------------------------------------------
                  Likelihood-ratio test of rho=0: chibar2(01) =  2.2e+04 Prob >= chibar2 = 0.000
                  I also tested for autocorrelation:

                  Code:
                   xtserial $dep flatsize  buildyearaltalt housetyp   energyprice meantemp $sozecon  $behav  $cond
                  Wooldridge test for autocorrelation in panel data
                  H0: no first-order autocorrelation
                      F(  1,    3985) = 489099.666
                             Prob > F =      0.0000

                  I feel like there is something structurally wrong with either my model or the way the standard errors are calculated. It is not about one particular specification; it happens almost every time I change a variable or even drop a few observations. Any clues?
                  Last edited by Fabian Knodler-Thoma; 27 Jun 2017, 01:40.

                  • #10
                    Fabian:
                    I would stay with -vce(robust)-, but I would first investigate whether the individual effects have any variance in your dataset.
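                    A quick way to look at that (a sketch; -y-, -x1-, -x2- are placeholder names):

                    ```stata
                    * inspect the variance of the random effects and the stability
                    * of the quadrature approximation after a random-effects fit
                    quietly xtlogit y x1 x2, re
                    display "sigma_u = " e(sigma_u) "   rho = " e(rho)
                    quadchk   // are results sensitive to the number of integration points?
                    ```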
                    Kind regards,
                    Carlo
                    (StataNow 18.5)
