  • Lrtest - observations differ

    Hi,

    I am trying to conduct an lrtest to assess the importance of incorporating additional variables. However, I am running into an error message that states "observations differ: 540456 vs. 542765". I have tried cleaning the data to avoid this message, but none of the changes I have made has been sufficient. Is there a way to restrict the observations (I am willing to sacrifice some observations) so that I can go forward with conducting the lrtest?

    Kind regards,
    Anabella

  • #2
    Evidently the estimation samples for your two models disagree. Probably when you add the additional variables, you lose observations that have missing values on one or more of those added variables. So the thing to do is to fit the model with the additional variables first, and then restrict the second model to the estimation sample of the first. The scheme is like this:

    Code:
    regression command including the additional variables
    estimates store with_additional_vars
    
    regression command without the additional variables if e(sample)
    estimates store without_additional_vars
    
    lrtest with_additional_vars without_additional_vars



    • #3
      Dear Clyde,
      I have much the same problem as she does. I am using the LR test to find the final model, but I get "observations differ: 2402 vs. 2205" because one of my variables has no cases for that dependent variable. Could you help me solve this?



      • #4
        Originally posted by Stephan sepson:
        Could you help me to solve that?
        Why wouldn't it be the same way as Clyde already showed, just reversing the order of the two models?

        Or, given that you haven't shown anything specific, the following might be as near-foolproof as you could expect as a general preliminary data-management step.
        Code:
        ds
        foreach var of varlist `r(varlist)' {
           drop if missing(`var')   // drop any observation missing on this variable
        }
        Then proceed with your specification search.
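        A narrower variant of the same idea (a sketch only; y, x1, x2, and x3 are placeholder names, not variables from this thread) restricts the complete-case drop to the variables the models will actually use, so that observations are not lost to missingness in variables irrelevant to the comparison:
        Code:
        * placeholder variable names -- substitute your own outcome and regressors
        local modelvars y x1 x2 x3
        foreach var of local modelvars {
           drop if missing(`var')
        }
        After this, the restricted and unrestricted models are fit to the same observations, and lrtest should no longer complain that the samples differ.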



        • #5
          Alternatively, just use the force option of lrtest. Like this:

          Code:
          . sysuse auto, clear
          (1978 Automobile Data)
          
          . reg price mpg headroom
          
                Source |       SS           df       MS      Number of obs   =        74
          -------------+----------------------------------   F(2, 71)        =     10.44
                 Model |   144280501         2  72140250.4   Prob > F        =    0.0001
              Residual |   490784895        71  6912463.32   R-squared       =    0.2272
          -------------+----------------------------------   Adj R-squared   =    0.2054
                 Total |   635065396        73  8699525.97   Root MSE        =    2629.2
          
          ------------------------------------------------------------------------------
                 price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
                   mpg |  -259.1057   58.42485    -4.43   0.000    -375.6015   -142.6098
              headroom |  -334.0215   399.5499    -0.84   0.406    -1130.701    462.6585
                 _cons |   12683.31   2074.497     6.11   0.000     8546.885    16819.74
          ------------------------------------------------------------------------------
          
          . est sto small
          
          . reg price mpg headroom i.rep
          
                Source |       SS           df       MS      Number of obs   =        69
          -------------+----------------------------------   F(6, 62)        =      3.74
                 Model |   153228440         6  25538073.3   Prob > F        =    0.0031
              Residual |   423568519        62  6831750.31   R-squared       =    0.2657
          -------------+----------------------------------   Adj R-squared   =    0.1946
                 Total |   576796959        68  8482308.22   Root MSE        =    2613.8
          
          ------------------------------------------------------------------------------
                 price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
                   mpg |  -297.1542   65.40836    -4.54   0.000    -427.9037   -166.4048
              headroom |  -334.6746   426.4413    -0.78   0.436    -1187.119    517.7695
                       |
                 rep78 |
                    2  |   1389.807   2170.058     0.64   0.524    -2948.078    5727.692
                    3  |   1873.314   1994.574     0.94   0.351    -2113.782     5860.41
                    4  |   2114.149   2020.874     1.05   0.300     -1925.52    6153.819
                    5  |     3505.7   2101.949     1.67   0.100    -696.0364    7707.436
                       |
                 _cons |   11390.42   2556.201     4.46   0.000     6280.646    16500.19
          ------------------------------------------------------------------------------
          
          . est sto big
          
          . lrtest small big
          observations differ: 69 vs. 74
          r(498);
          
          . lrtest small big, force
          
          Likelihood-ratio test                                 LR chi2(4)  =     98.06
          (Assumption: small nested in big)                     Prob > chi2 =    0.0000



          • #6
            Thanks Joro



            • #7
              Originally posted by Stephan sepson:
              Thanks Joro
              Really?

              Code:
              . version 16.1

              . clear *

              . sysuse auto
              (1978 Automobile Data)

              . regress price mpg headroom i.rep

                    Source |       SS           df       MS      Number of obs   =        69
              -------------+----------------------------------   F(6, 62)        =      3.74
                     Model |   153228440         6  25538073.3   Prob > F        =    0.0031
                  Residual |   423568519        62  6831750.31   R-squared       =    0.2657
              -------------+----------------------------------   Adj R-squared   =    0.1946
                     Total |   576796959        68  8482308.22   Root MSE        =    2613.8

              ------------------------------------------------------------------------------
                     price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
              -------------+----------------------------------------------------------------
                       mpg |  -297.1542   65.40836    -4.54   0.000    -427.9037   -166.4048
                  headroom |  -334.6746   426.4413    -0.78   0.436    -1187.119    517.7695
                           |
                     rep78 |
                        2  |   1389.807   2170.058     0.64   0.524    -2948.078    5727.692
                        3  |   1873.314   1994.574     0.94   0.351    -2113.782     5860.41
                        4  |   2114.149   2020.874     1.05   0.300     -1925.52    6153.819
                        5  |     3505.7   2101.949     1.67   0.100    -696.0364    7707.436
                           |
                     _cons |   11390.42   2556.201     4.46   0.000     6280.646    16500.19
              ------------------------------------------------------------------------------

              . estimates store Big

              . regress price mpg headroom if e(sample)

                    Source |       SS           df       MS      Number of obs   =        69
              -------------+----------------------------------   F(2, 66)        =      8.98
                     Model |   123364948         2  61682473.8   Prob > F        =    0.0004
                  Residual |   453432011        66  6870181.99   R-squared       =    0.2139
              -------------+----------------------------------   Adj R-squared   =    0.1901
                     Total |   576796959        68  8482308.22   Root MSE        =    2621.1

              ------------------------------------------------------------------------------
                     price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
              -------------+----------------------------------------------------------------
                       mpg |  -243.1092   59.10606    -4.11   0.000    -361.1183   -125.1002
                  headroom |  -288.1991   406.4023    -0.71   0.481    -1099.608    523.2093
                     _cons |    12186.4   2096.566     5.81   0.000     8000.472    16372.33
              ------------------------------------------------------------------------------

              . estimates store Small

              . lrtest Big Small

              Likelihood-ratio test                                 LR chi2(4)  =      4.70
              (Assumption: Small nested in Big)                     Prob > chi2 =    0.3194

              . quietly estimates restore Big

              . testparm i.rep78

               ( 1)  2.rep78 = 0
               ( 2)  3.rep78 = 0
               ( 3)  4.rep78 = 0
               ( 4)  5.rep78 = 0

                     F(  4,    62) =    1.09
                          Prob > F =    0.3680

              . exit

              end of do-file


              Given the Wald test results, which would you rely upon:
              chi-square (df = 4) of 98, P < 0.00005 with -force-?
              or chi-square (df = 4) of 4.7, P = 0.3 with the same estimation sample?



              • #8
                Joseph Coveney, you might be right that one should not use the force option, but for reasons unrelated to your argument in #7. I would be interested to hear of any econometrics/statistics literature that shows what goes wrong when you do likelihood-ratio testing and the samples under the restricted and unrestricted models differ. I am not aware of such literature.

                As for your argument in #7: you document a huge unexplained difference between using the force option and restricting the sample of the smaller model to that of the larger model. We certainly have something to think about here; something big is going on. But regarding the conclusions you draw from the dramatic situation you document:

                1. If you trust your Wald test so much, why would you do a likelihood-ratio test at all? Wald tests are always easier. In other words, using the Wald test result here as the appropriate benchmark is not justified.

                2. Of course the Wald test and the likelihood-ratio test on the restricted sample give you similar results: they are both carried out on the same restricted sample. The denominator degrees of freedom of the Wald test (an F-test, in fact) are 62; they come from the large model. So in a way, by construction, you make the likelihood-ratio test similar to the Wald test by artificially restricting the sample. But making the LR test similar to the Wald test is not the same as making the LR test better.





                • #9
                  The properties of the likelihood-ratio test rest on the likelihood function estimated first without restrictions and then with restrictions. The likelihood function is defined for a given sample; otherwise, the two likelihood functions (with and without restrictions) are not comparable.
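                  In symbols (a standard statement of the test, not specific to this thread, with q the number of restrictions):
                  Code:
                  LR = 2 * [ lnL(unrestricted) - lnL(restricted) ]  ~  chi2(q)
                  Both log likelihoods must be evaluated on the same estimation sample. If the samples differ, the difference of log likelihoods reflects the change in data as well as the restrictions, and the chi2(q) reference distribution no longer applies.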

