  • Unusually large Wald Chi Square Values

    Hello everyone,

    I am running a feasible GLS -xtgls- model with panel heteroskedasticity and AR(1) correlation. Additionally, in order to include fixed effects, I add a dummy variable for each company I am examining.
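
    The command is essentially the following, after -xtset- (the other controls are trimmed here, just as the output below is truncated):

    Code:
    xtgls loginvscaled loggh logta logcogs i.compid, panels(heteroskedastic) corr(ar1)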

    My problem is that after I run the model, I get extremely large Wald Chi-Square values (on the order of 50,000) and a log likelihood of around -700. Is this even possible? These are some of the highest Chi-Square values I have ever seen.

    Initially I thought this was a sample-size problem (too many variables, not enough observations), but running a reduced equation gives me more or less the same results. (On a side note, does FGLS impose any assumptions or requirements regarding sample size? I have 79 degrees of freedom and around 450 observations...)

    Any help with these questions would be really useful (reduced output is included below; I'm not sure why, but this last model didn't report a log-likelihood value).

    Code:
    Cross-sectional time-series FGLS regression
    
    Coefficients:  generalized least squares
    Panels:        heteroskedastic
    Correlation:   common AR(1) coefficient for all panels  (0.4707)
    
    Estimated covariances      =        71          Number of obs      =      1116
    Estimated autocorrelations =         1          Number of groups   =        71
    Estimated coefficients     =        86          Obs per group: min =         2
                                                                   avg =  15.71831
                                                                   max =        36
                                                    Wald chi2(85)      =  65657.28
                                                    Prob > chi2        =    0.0000
    
    ------------------------------------------------------------------------------
    loginvscaled |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           loggh |  -.2849315   .0356504    -7.99   0.000    -.3548049    -.215058
           logta |   .5194881   .0240318    21.62   0.000     .4723866    .5665896
         logcogs |   .0753178   .0153522     4.91   0.000     .0452281    .1054076
                 |
          compid |
              2  |   1.951157   .1397242    13.96   0.000     1.677302    2.225011
              3  |  -.2511948    .199553    -1.26   0.208    -.6423115    .1399219
              4  |    .713452    .141207     5.05   0.000     .4366914    .9902125
              5  |   .3644441   .1644973     2.22   0.027     .0420353    .6868529
              6  |   .5342415   .1845942     2.89   0.004     .1724436    .8960395
    .......
    Thanks in advance everyone.

    Panos

  • #2
    A couple of observations.

    1. Although you say you have only 450 observations, your output says that the analysis was performed on 1116 observations. Are you sure you are running this on the right data set?

    2. Your variable logta is a very powerful predictor, with z = 21.62. That one alone will contribute a lot to the overall model chi square. (If it were the only predictor in the model, chi square would equal the square of z, already > 400.)

    3. But also, you have 70 some-odd variables for your company fixed effects in the model. The chi square statistic tests the omnibus null hypothesis that all of the coefficients of all of the model's predictor variables, including those fixed effects, are zero. If the fixed effects account for much of the variation in your outcome variable, that null hypothesis will be massively rejected.

    4. Why are you including the dummies for compid? -xtgls- will automatically absorb the fixed effects for the panel variable you specified when you -xtset- your data. Was your panel variable in -xtset- some other variable at a different level of aggregation? If not, you are, in effect, double-dipping on your fixed effects and inflating the chi square statistic. See the following simple example:

    Code:
    . webuse invest2, clear
    . xtset company time
           panel variable:  company (strongly balanced)
            time variable:  time, 1 to 20
                delta:  1 unit
    . xtgls invest market stock, panels(correlated) corr(ar1)
      Cross-sectional time-series FGLS regression
      Coefficients:  generalized least squares
    Panels:        heteroskedastic with cross-sectional correlation
    Correlation:   common AR(1) coefficient for all panels  (0.8651)
      Estimated covariances      =        15          Number of obs      =       100
    Estimated autocorrelations =         1          Number of groups   =         5
    Estimated coefficients     =         3          Time periods       =        20
                                                    Wald chi2(2)       =    153.66
                                                    Prob > chi2        =    0.0000
      ------------------------------------------------------------------------------
          invest |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          market |   .0745101   .0091391     8.15   0.000     .0565978    .0924225
           stock |   .3150971   .0447361     7.04   0.000     .2274158    .4027783
           _cons |  -2.770019   13.78308    -0.20   0.841    -29.78435    24.24431
    ------------------------------------------------------------------------------
    
    . xtgls invest market stock i.company, panels(correlated) corr(ar1)
      Cross-sectional time-series FGLS regression
      Coefficients:  generalized least squares
    Panels:        heteroskedastic with cross-sectional correlation
    Correlation:   common AR(1) coefficient for all panels  (0.5622)
      Estimated covariances      =        15          Number of obs      =       100
    Estimated autocorrelations =         1          Number of groups   =         5
    Estimated coefficients     =         7          Time periods       =        20
                                                    Wald chi2(6)       =    772.91
                                                    Prob > chi2        =    0.0000
      ------------------------------------------------------------------------------
          invest |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          market |   .0812871   .0083048     9.79   0.000       .06501    .0975642
           stock |   .3483189    .029174    11.94   0.000     .2911389    .4054989
                 |
         company |
              2  |  -57.39275   48.66811    -1.18   0.238    -152.7805      37.995
              3  |  -242.6273   45.60671    -5.32   0.000    -332.0148   -153.2398
              4  |  -87.61307   49.60251    -1.77   0.077    -184.8322    9.606071
              5  |   87.52339   62.62926     1.40   0.162    -35.22771    210.2745
                 |
           _cons |   44.84548   51.68261     0.87   0.386    -56.45058    146.1415
    ------------------------------------------------------------------------------

    • #3
      Clyde, thank you for the quick reply. You clarified a great deal. I'll respond in turn.

      1) You are right; the output I pasted was for the reduced model I ran. When I included the other controls, my sample size fell to around 450. The results are more or less the same: my Chi-Square statistic is still very high.

      2) & 3) Ok I think I understand now. So basically it makes sense in this case that the Chi-Square would be so large. With so many variables, there is virtually no chance that all coefficients are equal to zero.

      4) I included the indicators for company because I did not explicitly see anywhere that -xtgls- accounts for fixed effects (and I did not -xtset- at a different level of aggregation), so it made sense to me to include them. But let me pose two related questions:

      a) If I run similar models with -xtreg- and option -fe-, including the indicators, Stata automatically omits the indicators because the data are -xtset- on that variable. Why would this not be the case with -xtgls-? This is another reason I kept the indicators: I figured that if -xtgls- were indeed fitting fixed effects, it would have omitted the company indicators.

      b) If -xtgls- automatically runs a fixed-effects specification, how does one then specify a random-effects one?

      Thanks,
      Panos

      • #4
        I was speaking loosely when I said earlier that -xtgls- implicitly absorbs the fixed effects. It does not really do that--which is why it also doesn't drop your explicit dummy variables. But it does account for the panel structure by estimating a panel-specific residual variance structure. This way of dealing with the panel structure is not the same thing, but it is satisfactory for most purposes, without throwing in panel-level indicators.

        If you want to specify a random-effects model, use the -mixed- command. In particular, to capture most of what you are doing with your -xtgls- model, use the -residuals()- option with an AR(1) structure to get the autoregressive pattern in your within-panel residuals, and its -by()- suboption to express the heteroskedasticity.
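
        With the invest2 data from #2, something along these lines might do it (a sketch only: combining the -ar- residual structure with the -by()- suboption to get a separate residual variance, and here also a separate AR parameter, per panel is my reading of -help mixed-, so check the exact syntax in your version of Stata):

        Code:
        * Random-effects analogue of the earlier -xtgls- model (sketch only).
        webuse invest2, clear
        xtset company time
        mixed invest market stock || company:, residuals(ar 1, t(time) by(company))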

        • #5
          I have the same issue with very large Chi-Square values when I use FGLS (-xtgls- in Stata). I have unbalanced panels, and the DV is a logit transformation of a ratio measure. I would appreciate your advice.
          Regards,



          [Screenshot of -xtgls- output attached: Image.png]

          • #6
            That chi square statistic is a test of the omnibus null hypothesis that all of the coefficients in the model (except the constant term) are zero. If you look at the z-tests in the coefficient table you will see that nearly all of them are large and some of them are truly enormous. So it is no surprise that the overall model chi square is very large.

            The real question is whether that chi square statistic is of any interest. That depends on your research question. In particular, are you really interested in testing the year-by-year variation, or is it in the model as a source of nuisance variation that you want to control for but not study? If the latter, you want to use -test- (or -testparm-) to look at the joint significance of the other coefficients. (That chi square statistic will probably be very large, too, given the very large t-statistics of your c_* variables.)
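
            For instance (a sketch only; the variable names below are placeholders, not taken from your output):

            Code:
            * Jointly test only the substantive predictors, treating the year
            * indicators as nuisance controls. All names are placeholders.
            xtgls y x1 x2 i.year, panels(heteroskedastic) corr(ar1)
            testparm x1 x2     // joint chi-square for the predictors of interest
            testparm i.year    // joint chi-square for the year effects alone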

            • #7
              Thank you, Clyde, for the quick response and for all your contributions to this forum. I am working with 6 DVs, and some of them give me these huge Chi-Square values. They may not be that important for me; however, I will need to report them in my tables since they are reported in journal articles that use FGLS.

              I have other questions, and it would be great to get your expert advice on those as well. I know that the -xtgls- command with the -igls- option can provide the log likelihood along with the Wald Chi-Square. Here are two more questions:

              1- For some of my DVs, -igls- never converges. What does that imply?

              2- I want to compare this model with a base model (with controls only) in terms of fit. What would be your general suggestion? In some publications I see that authors report the incremental change in Chi-Square and log likelihood at the end of the FGLS table, but I have no idea how to implement that (if it is even legitimate).
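
              In case it helps, this is the kind of comparison I have in mind (a sketch with placeholder variable names; as I understand it, a likelihood-ratio test is only clearly justified when -igls- iterates all the way to the MLE, which the manual describes for corr(independent)):

              Code:
              * Sketch of a nested-model fit comparison; names are placeholders.
              xtgls y c1 c2, igls panels(heteroskedastic)
              estimates store base
              xtgls y c1 c2 x1 x2, igls panels(heteroskedastic)
              estimates store full
              lrtest full base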
              Last edited by Pouya Tabesh; 27 Apr 2015, 16:58.

              • #8
                Dear Clyde,

                I am analysing the effect of technology on trade flows. The panel dataset is strongly balanced, and I am using a gravity model framework, so year and country fixed effects must be added in order to capture country-specific factors and unobserved heterogeneity. The dataset has T=16, N=6 (countries), and 5 variables (technology, country GDPs, geographic distance, and institutional distance). However, I have the same issues as reported here, and when I remove the year and country fixed effects those z-statistics become enormous. I was trying pooled OLS before, but following Carlo's recommendation I tried an -xtgls- model; I understand that pooled OLS is less efficient than panel GLS, but in the context of a gravity framework I am not sure whether it is compatible or not.

                Any suggestions will be very welcome; thanks in advance.

                • #9
                  I'm not an economist and I have only cursory knowledge about the gravity model. What I have seen of it, which is exclusively here on Statalist, has been estimated with -xtreg, fe-. I can't think of any obvious reason why -xtgls- wouldn't be suitable with a small number of panels, but I can't tell you authoritatively that it's OK. There may be a reason that isn't obvious (to me). There are several Forum members who have much greater expertise in this area than I do, and I hope they will join the discussion here.

                  As for your large z-statistics, I don't know why you think of them as a problem. You have some very powerful predictors in your model. If you have reason to believe that these are not realistic effect sizes (look at the coefficients, not the z-statistics), then you might want to check your data for errors that are distorting the results. But, in general, I wouldn't be worried about z-statistics like this.

                  What does worry me about your output is that 3 of your importer indicator variables have been omitted instead of the usual 1. Why is that? They shouldn't exhibit collinearity with each other, nor with any of the other variables, should they? Similarly, it seems wrong that two of your year indicators have been omitted instead of the usual one. Again, it isn't clear why any of your year indicators should be collinear with anything else in this model. Perhaps your data have a lot of "holes" where certain exporters' data aren't available for certain years, and there is collinearity among the importer and year indicators as a result? That might account for a pattern of omissions like this.
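
                  A quick way to look for such holes (a sketch with placeholder variable names):

                  Code:
                  * Cross-tabulate the panel identifier against year over the
                  * estimation sample; empty or sparse cells would point to the
                  * source of the collinearity. Names are placeholders.
                  tab importer year if e(sample)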

                  • #10
                    Dear Clyde,

                    Thank you very much for your comments; they help a lot. You are right that, under a gravity framework, estimation via -xtreg, fe- is widely used; however, -regress- estimates are used as a benchmark, which is why I used pooled OLS. Since pooled OLS is less efficient, I am now trying -xtgls-.

                    So, first: if the z-statistics are not a problem in this case, why do the results have such an enormous Wald Chi-Square value? Second, I also do not understand why these dummies are omitted, because the dataset does not have any holes in it, which is the strange part; perhaps it is because the dataset has only one exporter and 6 importers.

                    • #11
                      The enormous Wald chi square value and the high z-statistics go together. Indeed, if the predictor variables were all completely independent of each other, the Wald chi square would be the sum of the squares of the z-statistics. In real life there are correlations among predictors, so this relationship does not hold exactly. But it gives you a basis for understanding that large z-statistics and a large Wald chi square go together. Again, there is no reason for concern about the Wald chi square statistic for the model based on its size. You should evaluate the credibility of the model based on its coefficients, and you should check the fit of the model to the data. But there is no "acceptable" range of model chi square statistics. It is whatever it is.
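
                      As a toy illustration with the invest2 data from #2 (a sketch; with correlated predictors the sum of squared z-statistics only roughly tracks the model chi square):

                      Code:
                      * Compare the sum of squared z-statistics with the model Wald chi2.
                      webuse invest2, clear
                      xtset company time
                      xtgls invest market stock, panels(correlated) corr(ar1)
                      scalar z1 = _b[market]/_se[market]
                      scalar z2 = _b[stock]/_se[stock]
                      scalar sumz2 = z1^2 + z2^2
                      display "sum of squared z's = " sumz2 "   Wald chi2 = " e(chi2)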

                      As for the omission of the indicator variables, it does suggest to me that something is wrong. Can you show a briefer example of the data that exhibits the same phenomenon (or a similar pattern of unexpected omissions even if not all of the omissions we see here)? Please use -dataex- to do that.

                      If you are running version 15.1 or a fully updated version 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work with your data.


                      • #12
                        Dear Clyde,

                        Thank you so much! The coefficients are not far from what I found in the literature related to this research question. I am running version 15.1. Here is a sample:

                        Code:
                        * Example generated by -dataex-. To install: ssc install dataex
                        clear
                        input float lnexp double lnintus float(lngdp lngdp_c)
                         20.22944  2.525735855102539  26.37296 27.817717
                        20.168425  3.251068353652954  26.31685 27.917883
                        19.037994  3.912243366241455  25.34863 28.010767
                         19.91865 4.3022379875183105 25.587696  28.13175
                        20.563097  4.762753009796143 25.934366  28.29461
                        end

                        • #13
                          Yes, those results do seem pretty concordant. Good!

                          • #14
                            Greetings everyone,

                            I am running a feasible GLS -xtgls- model with panel heteroskedasticity and AR(1) correlation.
                            I have already run two models (a direct model and a moderation model).

                            I get a Wald Chi-Square value of 2011 in model one and 315 in model two. Is this good?

                            How do I interpret those values?

                            Regards.


                            • #15
                              Dear Clyde,
                              I read through this post. Thank you for this great support. I am just wondering if there is any rule-of-thumb range for the Wald chi2 and Prob > chi2 values to tell whether a model is working. I got Wald chi2(22) = 884.12 and Prob > chi2 = 0.0000 for a random-effects model on my -xtset- data. I don't know if this is an expected value. Thank you.
                              Regards,
                              Bishal Bharadwaj
