What stats are appropriate to assess model fit for nested logit models if Wald is not possible for clustered data?

Guest
#1

30 Nov 2018, 20:42

I used vce(cluster) to account for clustering within 9 groups in a set of nested logistic regression models. Stata won't give me Wald chi2 statistics because I have too many variables in the model relative to the number of clusters, which used up my degrees of freedom. Stata also said that both Wald and lrtest results would be misleading. So what *wouldn't* be misleading to report, both to describe fit and to compare fit among the nested models? Are pseudo-R-squared, AIC, BIC, and log-likelihood values still meaningful to interpret? Or are there other statistics I don't know about?

Thanks in advance!
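For reference, the statistics asked about above are standard functions of the maximized log likelihood. Let ℓ be a fitted model's log likelihood (Stata labels it "log pseudolikelihood" under -vce(cluster)-), ℓ₀ that of the intercept-only model, k the number of estimated parameters, and N the number of observations. -vce(cluster)- changes only the standard errors, not the coefficients or ℓ, so on the same estimation sample these numbers are identical with or without it. Clustering undermines something else: the assumption of independent observations. That assumption is what justifies treating the LR statistic as chi-squared, and, less formally, it also underlies AIC and BIC. With clustered data they are therefore best read descriptively. The following is a sketch of the textbook definitions (McFadden's pseudo-R2 is the one -logistic- reports), not a full account of Stata's internals.

```latex
\begin{aligned}
R^2_{\text{McF}} &= 1 - \frac{\ell}{\ell_0} \\
\text{AIC} &= -2\ell + 2k \\
\text{BIC} &= -2\ell + k \ln N \\
\text{LR} &= 2\left(\ell_{\text{full}} - \ell_{\text{reduced}}\right)
  \;\overset{a}{\sim}\; \chi^2_{\,k_{\text{full}} - k_{\text{reduced}}}
  \quad \text{(independent observations, reduced model true)}
\end{aligned}
```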
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#2

30 Nov 2018, 21:14

To get a more helpful reply, the best approach is to show the command, the output, and some information about the data.

That said, "too many variables" is a problem in itself: in the best case it will lead to overfitting, and it is best addressed by applying the principle of parsimony.

Best regards,

Marcos
Clyde Schechter

Join Date: Apr 2014

Posts: 30117
#3

30 Nov 2018, 21:26

To Marcos' excellent advice, I would add that using a cluster-robust VCE with only 9 clusters is probably inappropriate in the first place. It is universally agreed that the cluster-robust VCE is valid only with a "large" number of clusters. While there is no universal agreement about just how many clusters suffice, I think nearly everyone would agree that 9 is not enough. If you drop -vce(cluster)- and use the ordinary VCE, this problem will solve itself, and you will likely get more valid variance estimates than the clustered version gives you.

Guest
#4

30 Nov 2018, 21:28

Sure, here you go:

Code:

logistic h_housingcostneed c_intakeyear i.under65 i.c_kids i.pooreng e_PREincome100s i.nosavings i.h_anyresp i.preSalary i.insurance4, vce(cluster c_interviewlocation)

Logistic regression                               Number of obs   =       1113
                                                  Wald chi2(7)    =          .
                                                  Prob > chi2     =          .
Log pseudolikelihood = -665.32757                 Pseudo R2       =     0.0802

                                      (Std. Err. adjusted for 9 clusters in c_interviewlocation)

Thanks again!
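As an arithmetic check on the output above: Stata's pseudo R2 after -logistic- is McFadden's, 1 − ℓ/ℓ₀. The reported log pseudolikelihood and pseudo R2 therefore imply an intercept-only log likelihood of about −723.3. The last digit is uncertain because the pseudo R2 is rounded to four decimals. A refit without -vce(cluster)- on the same sample should reproduce both reported numbers, since only the standard errors change.

```latex
\ell_0 = \frac{\ell}{1 - R^2_{\text{McF}}}
       = \frac{-665.32757}{1 - 0.0802}
       \approx -723.3
```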


Guest
#5

30 Nov 2018, 21:33

Hi Clyde, that is good to know! By "ordinary," do you mean that the vce(robust) (Huber/White/sandwich) estimator would be a more appropriate way to handle the 9 clusters? Thank you both for your help!
Clyde Schechter

Join Date: Apr 2014

Posts: 30117
#6

30 Nov 2018, 22:43

No. -vce(robust)- is not valid with clustered data. I mean don't specify -vce()- at all; go with the default variance estimator that is based on the information matrix from the maximum likelihood estimation.
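A minimal sketch of this advice uses Stata's shipped auto dataset as a stand-in for the poster's data. This is an assumption: foreign plays the role of the binary outcome, and the predictors in the full and reduced models are purely illustrative. Both models are fit with the default VCE (no -vce()- option). -lrtest- can then compare them, and -estimates stats- lists their log likelihoods, AIC, and BIC side by side.

```stata
* Stand-in data shipped with Stata; foreign is a 0/1 outcome
sysuse auto, clear

* Full model: no vce() option, so Stata uses the default
* information-matrix-based VCE from maximum likelihood
logistic foreign mpg weight price
estimates store full

* Reduced (nested) model, restricted to the full model's estimation sample
logistic foreign mpg if e(sample)
estimates store reduced

* Likelihood-ratio test of the two nested models
lrtest full reduced

* Log likelihoods, AIC, and BIC for both models side by side
estimates stats full reduced
```

The same pattern applies to the poster's command above. Drop the -vce(cluster c_interviewlocation)- option, store each nested model with -estimates store-, and compare the models with -lrtest- and -estimates stats-.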
Guest
#7

01 Dec 2018, 10:22

Oh! OK, thanks for clarifying. I was thinking of multilevel modeling initially (which I'm not very familiar with) but didn't have a very high ICC. You've been so helpful, Clyde! Appreciate your advice!
