  • Lrtest - observations differ

    Hi,

    I am trying to conduct an lrtest to assess the importance of incorporating additional variables. However, I am running into an error message that states "observations differ: 540456 vs. 542765". I have tried cleaning the data to avoid this message, but none of the changes I have made has been sufficient. Is there a way to restrict the observations (I am willing to sacrifice some observations) so that I can go forward with conducting the lrtest?

    Kind regards,
    Anabella

  • #2
    Evidently the estimation samples for your two models disagree. Probably when you add the additional variables, you lose observations that have missing values on one or more of those added variables. So the thing to do is to fit the model with the additional variables first, and then restrict the second model to the estimation sample of the first. The scheme is like this:

    Code:
    regression command including the additional variables
    estimates store with_additional_vars
    
    regression command without the additional variables if e(sample)
    estimates store without_additional_vars
    
    lrtest with_additional_vars without_additional_vars



    • #3
      Dear Clyde,
      I have much the same problem as she does. I am using the LR test to find the final model, but I get "observations differ: 2402 vs. 2205" because one of my variables has no cases for that dependent variable. Could you help me solve this?



      • #4
        Originally posted by Stephan sepson:
        Could you help me to solve that?
        Why wouldn't it be the same way as Clyde already showed, just reversing the order of the two models?

        Or, given that you haven't shown anything specific, the following might be as near-foolproof as you could expect as a general preliminary data-management step.
        Code:
        ds
        foreach var of varlist `r(varlist)' {
           drop if missing(`var')   // drop any observation missing on this variable
        }
        Then proceed with your specification search.
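        A narrower variant of the same idea (a sketch only; y, x1, x2, and x3 are placeholder names, not variables from this thread) restricts the complete-case drop to the variables the models will actually use, so that observations are not lost to missingness in variables irrelevant to the comparison:
        Code:
        * placeholder variable names -- substitute your own outcome and regressors
        local modelvars y x1 x2 x3
        foreach var of local modelvars {
           drop if missing(`var')
        }
        After this, the restricted and unrestricted models are fit to the same observations, and lrtest should no longer complain that the samples differ.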



        • #5
          Alternatively, just use the force option of lrtest. Like this:

          Code:
          . sysuse auto, clear
          (1978 Automobile Data)
          
          . reg price mpg headroom
          
                Source |       SS           df       MS      Number of obs   =        74
          -------------+----------------------------------   F(2, 71)        =     10.44
                 Model |   144280501         2  72140250.4   Prob > F        =    0.0001
              Residual |   490784895        71  6912463.32   R-squared       =    0.2272
          -------------+----------------------------------   Adj R-squared   =    0.2054
                 Total |   635065396        73  8699525.97   Root MSE        =    2629.2
          
          ------------------------------------------------------------------------------
                 price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
                   mpg |  -259.1057   58.42485    -4.43   0.000    -375.6015   -142.6098
              headroom |  -334.0215   399.5499    -0.84   0.406    -1130.701    462.6585
                 _cons |   12683.31   2074.497     6.11   0.000     8546.885    16819.74
          ------------------------------------------------------------------------------
          
          . est sto small
          
          . reg price mpg headroom i.rep
          
                Source |       SS           df       MS      Number of obs   =        69
          -------------+----------------------------------   F(6, 62)        =      3.74
                 Model |   153228440         6  25538073.3   Prob > F        =    0.0031
              Residual |   423568519        62  6831750.31   R-squared       =    0.2657
          -------------+----------------------------------   Adj R-squared   =    0.1946
                 Total |   576796959        68  8482308.22   Root MSE        =    2613.8
          
          ------------------------------------------------------------------------------
                 price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
                   mpg |  -297.1542   65.40836    -4.54   0.000    -427.9037   -166.4048
              headroom |  -334.6746   426.4413    -0.78   0.436    -1187.119    517.7695
                       |
                 rep78 |
                    2  |   1389.807   2170.058     0.64   0.524    -2948.078    5727.692
                    3  |   1873.314   1994.574     0.94   0.351    -2113.782     5860.41
                    4  |   2114.149   2020.874     1.05   0.300     -1925.52    6153.819
                    5  |     3505.7   2101.949     1.67   0.100    -696.0364    7707.436
                       |
                 _cons |   11390.42   2556.201     4.46   0.000     6280.646    16500.19
          ------------------------------------------------------------------------------
          
          . est sto big
          
          . lrtest small big
          observations differ: 69 vs. 74
          r(498);
          
          . lrtest small big, force
          
          Likelihood-ratio test                                 LR chi2(4)  =     98.06
          (Assumption: small nested in big)                     Prob > chi2 =    0.0000



          • #6
            Thanks Joro



            • #7
              Originally posted by Stephan sepson:
              Thanks Joro
              Really?

              Code:
              . version 16.1

              . clear *

              . sysuse auto
              (1978 Automobile Data)

              . regress price mpg headroom i.rep

                    Source |       SS           df       MS      Number of obs   =        69
              -------------+----------------------------------   F(6, 62)        =      3.74
                     Model |   153228440         6  25538073.3   Prob > F        =    0.0031
                  Residual |   423568519        62  6831750.31   R-squared       =    0.2657
              -------------+----------------------------------   Adj R-squared   =    0.1946
                     Total |   576796959        68  8482308.22   Root MSE        =    2613.8

              ------------------------------------------------------------------------------
                     price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
              -------------+----------------------------------------------------------------
                       mpg |  -297.1542   65.40836    -4.54   0.000    -427.9037   -166.4048
                  headroom |  -334.6746   426.4413    -0.78   0.436    -1187.119    517.7695
                           |
                     rep78 |
                        2  |   1389.807   2170.058     0.64   0.524    -2948.078    5727.692
                        3  |   1873.314   1994.574     0.94   0.351    -2113.782     5860.41
                        4  |   2114.149   2020.874     1.05   0.300     -1925.52    6153.819
                        5  |     3505.7   2101.949     1.67   0.100    -696.0364    7707.436
                           |
                     _cons |   11390.42   2556.201     4.46   0.000     6280.646    16500.19
              ------------------------------------------------------------------------------

              . estimates store Big

              . regress price mpg headroom if e(sample)

                    Source |       SS           df       MS      Number of obs   =        69
              -------------+----------------------------------   F(2, 66)        =      8.98
                     Model |   123364948         2  61682473.8   Prob > F        =    0.0004
                  Residual |   453432011        66  6870181.99   R-squared       =    0.2139
              -------------+----------------------------------   Adj R-squared   =    0.1901
                     Total |   576796959        68  8482308.22   Root MSE        =    2621.1

              ------------------------------------------------------------------------------
                     price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
              -------------+----------------------------------------------------------------
                       mpg |  -243.1092   59.10606    -4.11   0.000    -361.1183   -125.1002
                  headroom |  -288.1991   406.4023    -0.71   0.481    -1099.608    523.2093
                     _cons |    12186.4   2096.566     5.81   0.000     8000.472    16372.33
              ------------------------------------------------------------------------------

              . estimates store Small

              . lrtest Big Small

              Likelihood-ratio test                                 LR chi2(4)  =      4.70
              (Assumption: Small nested in Big)                     Prob > chi2 =    0.3194

              . quietly estimates restore Big

              . testparm i.rep78

               ( 1)  2.rep78 = 0
               ( 2)  3.rep78 = 0
               ( 3)  4.rep78 = 0
               ( 4)  5.rep78 = 0

                     F(  4,    62) =    1.09
                          Prob > F =    0.3680

              . exit

              end of do-file


              Given the Wald test results, which would you rely upon:
              chi-square (df = 4) of 98, P < 0.00005 with -force-?
              or chi-square (df = 4) of 4.7, P = 0.3 with the same estimation sample?



              • #8
                Joseph Coveney, you might be right that one should not use the force option, but for reasons unrelated to your argument in #7. I would be interested to hear of any econometrics/statistics literature that shows what goes wrong when you do likelihood-ratio testing and the samples under the restricted and unrestricted models differ. I am not aware of such literature.

                As for your argument in #7: you document a huge unexplained difference between using the force option and restricting the sample of the smaller model to that of the larger model. We certainly have something to think about here; something big is going on. But regarding the conclusions you draw from the dramatic situation you document:

                1. If you trust your Wald test so much, why would you do a likelihood-ratio test at all? Wald tests are always easier. In other words, using the Wald test result here as the appropriate benchmark is not justified.

                2. Of course the Wald test and the likelihood-ratio test on the restricted sample give you similar results: they are both carried out on the same restricted sample. The denominator degrees of freedom of the Wald test (an F-test, in fact) are 62; they come from the large model. So in a way, by construction, you make the likelihood-ratio test similar to the Wald test by artificially restricting the sample. But making the LR test similar to the Wald test is not the same as making the LR test better.





                • #9
                  The properties of the likelihood-ratio test rest on the likelihood function estimated first without restrictions and then with restrictions. The likelihood function is defined for a given sample; otherwise, the two likelihood functions (with and without restrictions) are not comparable.
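                  In symbols (a standard statement of the test, not specific to this thread, with q the number of restrictions):
                  Code:
                  LR = 2 * [ lnL(unrestricted) - lnL(restricted) ]  ~  chi2(q)
                  Both log likelihoods must be evaluated on the same estimation sample. If the samples differ, the difference of log likelihoods reflects the change in data as well as the restrictions, and the chi2(q) reference distribution no longer applies.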

