
  • Maximum Likelihood for Model Selection - Simulation Tests return non-zero Likelihood for true Model...

    Dear Statalist-Members,

    I intended to use maximum likelihood to select among several models by plugging the maximized likelihood into the Akaike Information Criterion, but I am experiencing some problems, probably due to some sort of defect in my likelihood function. I already have some ideas about what went wrong, but I would like to discuss the matter with someone more competent than me.

    In detail, I tried to fit the parameters of several competing binary choice models of consumer behavior using the ml model command, for which I wrote the respective evaluator programs following the descriptions in Gould, Pitblado, and Poi, "Maximum Likelihood Estimation with Stata", 3rd ed. To test whether my code was correct, I wrote a Stata do-file that simulates a customer's behavior for each model under consideration by generating observations, which I then used in my ml file to estimate the parameters of the respective models. The additional purpose of this simulation was to check whether I would be able to identify the true underlying model and recover its parameters.

    According to theory, the true model should return a log likelihood of exactly zero at parameters identical to the ones used to generate the observations. Unfortunately, neither can the true parameters used for the simulation be retrieved, nor is the likelihood anywhere near zero; in fact, it is close to the likelihood I get when the observations are generated purely at random. If I plug the true parameters into the ml code, I get a likelihood pretty close to zero, but once I run ml maximize, things fall apart.

    My question is this: the poor estimates of my parameters I think I can blame on multicollinearity, but the maximization should still yield a log likelihood of zero (unless the search algorithm stops because the slope of the score vector is not sufficiently small, but I do not think that is the case, since the maximized likelihood is so negative). Do you have an idea of what I can do and what might be driving my results?

    Please let me know if you need parts of the code or more information about the problem.

    Many thanks for considering!!!
    Thomas

  • #2
    According to theory, the true model should return a log likelihood of exactly zero at parameters identical to the ones used to generate the observations.
    I don't think that's true, at least not in general. Let's take a very simple example. Suppose the true model is y ~ N(0, 1), and we fit a model y ~ N(mu, sigma) estimating mu and sigma by maximum likelihood. And let's suppose that the data are actually generated by sampling from a standard normal distribution. Even assuming the optimizer does land on the 0, 1 parameter estimates (which it should at least come very close to doing), the likelihood for each observation is not going to be 1, it's going to be the standard normal density evaluated at y. So the total log likelihood will not, in general, be zero. Even if you remove the randomness from the test data y, and use as test data an all-zero vector, the likelihood for each of those identical observations will be the normal density evaluated at zero, which is roughly 0.4. So the overall log likelihood will be N*ln(0.4), not zero.
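
    If it helps, here is a minimal Stata sketch of that example (sample size and seed are arbitrary): simulate standard normal data and evaluate the log likelihood at the true parameters mu = 0 and sigma = 1; the total is far below zero.

    Code:
    clear
    set obs 1000
    set seed 12345
    generate double y = rnormal()

    * per-observation log likelihood of the true model y ~ N(0,1)
    generate double ll_i = ln(normalden(y, 0, 1))

    * the sum is roughly N*(-1.4); even an all-zero y would give N*ln(0.4),
    * so nothing close to zero
    quietly summarize ll_i
    display "log likelihood at the true parameters: " r(sum)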

    Perhaps you're dealing with some special class of models where what you said is true, but it's hard for me to figure out what that might be.

    • #3
      Dear Clyde,

      thank you very much for your post. I also received some feedback from an economist who argued that any result that does not yield a zero log likelihood should be considered erroneous.

      How can I provide more details?

      My structure looks like this:

      I think it is important to say that (in your notation) y is a binary variable, and the model that generates it is built as the difference between two functions G (whose parameters, say a and b, I am trying to recover) evaluated at the consumption of goods 1 and 2, denoted c1 and c2, which are themselves functions of past consumption c_o:

      G(c1(c_o); a, b) - G(c2(c_o); a, b) + some iid error term.

      Whenever this expression is > 0, I assigned y = 1, else 0.

      Then, in my ml file, I used

      F(+/- [G(c1(c_o); a#, b#) - G(c2(c_o); a#, b#)]),

      with the sign depending on whether y is 1 or 0, where F is the cumulative normal distribution and a#, b# are the temporary variables that ml fills in with the estimates of a and b.

      Additionally, I also try to estimate the parameters of the error term to make sure it does not interfere with my estimates of a and b.
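
      To make the structure more concrete, here is a stripped-down sketch of the kind of lf evaluator I have in mind; the functional form G(c; a, b) = a*c^b and the variable names y, c1, and c2 are only placeholders, not my actual specification, and the error standard deviation is normalized to one as in a standard probit.

      Code:
      * hypothetical lf-method evaluator; G(c; a, b) = a*c^b is a placeholder
      program define mychoice_lf
          version 10.1
          args lnf a b
          tempvar diff
          quietly {
              * G(c1; a, b) - G(c2; a, b)
              generate double `diff' = `a'*(c1^`b') - `a'*(c2^`b')
              * probit-type likelihood: P(y=1) = F(diff), P(y=0) = F(-diff)
              replace `lnf' = ln(normal(`diff'))  if $ML_y1 == 1
              replace `lnf' = ln(normal(-`diff')) if $ML_y1 == 0
          }
      end

      * constant-only equations for the structural parameters a and b
      * . ml model lf mychoice_lf (a: y = ) (b: )
      * . ml maximize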

      Sorry for the lengthy and somewhat messy writing, but I think that is the highest-level description I can give you. For details, please write to me and I will send you further information.

      Best Regards
      Thomas

      PS: I think I forgot to mention (as in other posts) that I am using Stata 10.1, in case that matters (perhaps the search algorithm differs from later versions).
      Last edited by Thomas Schmidt; 07 Aug 2014, 08:54.

      • #4
        Well, not knowing the particular economist who advised you, I'm in a poor position to say what he/she might find persuasive. You might illustrate the principle with the same example I gave you and ask how that fits with his/her understanding, or ask the economist for a reference to support his/her claim.

        • #5
          Thanks again for your reply and for the example. Your example presupposes that, in the deterministic case, all observations are equal to their means, which is not the case in my simulations (see my edited post above).

          • #6
            The deterministic example, with a vector of all zeroes, was just intended to show that even if the data happened to provide the maximum likelihood in each observation, you still don't get a log likelihood of zero. Data with actual variation will, of course, produce an even lower likelihood.

            • #7
              Dear Clyde,
              Thank you for your help in this matter. I see that. Yet I am still puzzled as to why my estimates for the binary choice model described above are unable to match their true counterparts while returning a negative likelihood, even though Stata reports successful convergence. I was wondering whether the reason might be that whenever y = 1, my consumption-difference function G(c1(c_o); a, b) - G(c2(c_o); a, b) contains the same argument c_o.

              • #8
                Thomas: Clyde is correct. You should not get a zero log likelihood. I think you are misunderstanding what it means to have randomness. Clyde gave you a simple example but the general point holds. Go back to a probit model or logit model and look at the log likelihood for any particular observation. The value is either log[Fi(b)] or log[1 - Fi(b)], depending on whether yi is one or zero. So for each i, the log likelihood is negative -- no matter what parameters you put in there, the true ones or not. Further, except by fluke, a particular data set will not deliver the true parameters as the MLEs. That's why it is an estimation problem! Even though you generate the data using the true parameters there will still be sampling error. By your argument, if I generate data from a normal distribution with a mean of zero then the sample average I compute should always be zero. This obviously is not true. JW

                • #9
                  Dear Jeff,

                  thank you for your message. I see. OK, just to check whether I got this right: the non-zero log likelihoods are caused by the finite sample size, since the ML properties rest on the assumption of an infinitely large sample, in which case my estimates would converge to the true ones?

                  Sorry for asking this kind of question; I am new to the field of econometrics.
                  Sven

                  • #10
                    Take again Jeff's example of the Probit model. Under independence of your observations, the log likelihood for the full sample is the sum of the log likelihoods for the individual observations:
                    ll = sum [yi * log Fi(xb) + (1 - yi) * log (1 - Fi(xb))]

                    The dependent variable yi is either 0 or 1, and Fi(xb) is always strictly between 0 and 1, no matter what your data look like and even if you know the true coefficients. Therefore, the log of Fi(xb) or of (1 - Fi(xb)) is always negative. Consequently, you are summing up only negative values, which gives you a negative overall log likelihood. This has nothing to do with your sample size: if you increase the sample size, you just add more and more negative values, which does not eventually bring you back to zero.
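
                    To see this in a toy example (the coefficients 0.5 and 1 and the sample size are made up): generate probit data with known coefficients and evaluate the sum above at exactly those true values; every term is negative, and the total only grows in magnitude with N.

                    Code:
                    clear
                    set obs 500
                    set seed 2014
                    generate double x  = rnormal()
                    generate double xb = 0.5 + 1*x          // true index with known coefficients
                    generate byte   y  = (xb + rnormal() > 0)

                    * per-observation log likelihood evaluated at the TRUE parameters
                    generate double ll_i = y*ln(normal(xb)) + (1 - y)*ln(1 - normal(xb))

                    quietly summarize ll_i
                    display "log likelihood at the true parameters: " r(sum)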

                    Side note: while it is true for the above probit model that the log likelihood is always negative, that does not hold in general. For a simple linear regression model, y = xb + u, the log likelihood can easily become positive as well (and in principle, just by chance, even exactly zero). However, I do not see any reason why a zero log likelihood should deserve special attention.
                    https://www.kripfganz.de/stata/

                    • #11
                      Dear Sebastian, dear Jeff,

                      thanks for your clarifications. I really appreciate the help I receive here. OK, I see that. But this raises another question: in your notation, Fi(xb) is the cumulative normal distribution, given that the error term is normally distributed? In more formal terms, for Fi(xb) one integrates the normal density from -infinity to (xb - 0)/sigma, where sigma is the standard deviation of the error. If I have an error term with positive sigma, as is the case under randomness, then Fi(xb) is below one, which is the point you are making, at least as I understood it. If I use a deterministic model, say I generate (non)random observations using an exact function y = xb, then sigma is zero and the value of Fi(xb) depends only on whether xb is positive or negative, so Fi(xb) approaches (or is exactly?) 0 or 1. I think this could also explain why I get strange parameter estimates for b, as it barely matters which b I plug into Fi(xb). Whether it is clever to use probit given this deterministic process is, I think, another question.
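
                      To illustrate what I mean, here is a small made-up example of the deterministic case: y is an exact function of x, and scaling the coefficient upward pushes the probit log likelihood ever closer to zero, so the scale of b is not pinned down.

                      Code:
                      clear
                      set obs 500
                      set seed 99
                      generate double x = rnormal()
                      generate byte   y = (x > 0)        // deterministic: no error term at all

                      * evaluate the probit log likelihood at b = 1, 5, 25, 125
                      foreach k of numlist 1 5 25 125 {
                          generate double ll = y*ln(normal(`k'*x)) + (1 - y)*ln(1 - normal(`k'*x))
                          quietly summarize ll
                          display "b = " `k' "   log likelihood = " r(sum)
                          drop ll
                      }
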
                      Thanks for your patience with me
                      Thomas
