Multi-level ordinal logit with imputed data?

Steven Saxonberg

Join Date: Jun 2016

Posts: 11
#1

11 Dec 2017, 10:47

Hello, I used the chained command (mi impute chained) to create multiply imputed data. I have a multilevel model. Previously in Stata 13 I could run "mi estimate: meologit" without adding the "cmdok" option, but in Stata 14 and 15 it only works with "cmdok". Now my results are different from those I got in Stata 13. When I went back to Stata 13, the results there were also similar to those in Stata 14 and 15 with the cmdok option, which means that they have somehow changed in Stata 13 from before. I know that using cmdok is not reliable, but I do not know any way around the problem other than deleting cases with missing values instead of using imputed data, and that does not seem like a good alternative. I would be thankful for advice here.
Clyde Schechter

Join Date: Apr 2014

Posts: 30111
#2

11 Dec 2017, 11:00

Well, it is hard to know what has happened here. Remember that multiple imputation involves random sampling, and if you did not specify a random number seed, you can get different results when you try to replicate what you did.
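
A minimal sketch of how one might check whether the imputation itself is reproducible, assuming the data are already -mi set- and using the variable names from the imputation command quoted elsewhere in this thread. The file names ess_miset and run1 are hypothetical.

```stata
* Sketch only. File names are hypothetical; variable names come from the
* imputation command quoted elsewhere in this thread.
* Assumes ess_miset.dta is already -mi set- with the variables registered as imputed.

* First run
use ess_miset, clear
mi impute chained (logit) gender (ologit) EQUAL subjinc edlevel lrscale, add(5) rseed(4409)
save run1, replace

* Second run, same seed
use ess_miset, clear
mi impute chained (logit) gender (ologit) EQUAL subjinc edlevel lrscale, add(5) rseed(4409)

* Compare every variable with the first run; -cf- exits with an error
* if any values differ
cf _all using run1
```

If -cf- finishes without complaint, the two runs produced identical imputed values on that installation, so any difference in results would have to come from somewhere else (the data, the model, or the Stata build).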

-meologit- is not among the models that -mi estimate- supports. Generally speaking when Stata supports something for certain situations and not for others, it is usually because it is not actually applicable to the unsupported situations, or the methodology for doing it correctly has not been worked out. This isn't always the case, which is why Stata allows you to use the -cmdok- option, but they are basically warning you that you are wading into uncharted territory and they don't want to take responsibility for the results. I can't think of a good reason why -mi estimate- would not be valid as applied to -meologit-, but my knowledge of multiple imputation is rather superficial, so you shouldn't take my opinion on this too seriously. There are others on the forum who are more knowledgeable about MI and, hopefully, one of them will wade in here.
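
For reference, the syntax under discussion looks roughly like the following. This is only a sketch: the outcome and covariates are borrowed from the imputation command quoted elsewhere in this thread, the actual analysis model was never posted, and country_id is a hypothetical numeric country identifier standing in for the grouping variable.

```stata
* Sketch only. The real model specification is not shown in the thread.
* country_id is a hypothetical numeric identifier for the level-2 units.

* According to the thread, in Stata 14 and 15 this form is refused
* because -meologit- is not on the supported list:
* mi estimate: meologit EQUAL i.gender subjinc edlevel lrscale || country_id:

* With -cmdok-, -mi estimate- runs the command anyway and pools the results
mi estimate, cmdok: meologit EQUAL i.gender subjinc edlevel lrscale || country_id:
```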

All of that said, if you want advice on potential alternatives to multiple imputation you will need to say much more about your project. The relevant issue is the missing data generation process: how does the missing data arise in your data set? Is missingness related to the actual unobserved values of the missing observations? If so, can you nevertheless condition out that association with observed information? If the answer to the first question is no, then you don't need to do anything in particular: a complete cases analysis will be unbiased. If the answer to the first question is yes and the answer to the second is also yes, that is the situation multiple imputation is designed for. If the answer to the first question is yes and the answer to the second is no, then a complete cases analysis is biased, but a multiple imputation analysis may well be just as bad. In that situation, you probably need to either develop a model of the missing data for imputation that accounts for the irreducible dependency of missingness on the missing values themselves (which is a daunting task!) or find ways to actually observe some or all of the currently missing data (also a daunting task) or design some robustness analyses to assess just how severe missing data bias might be.
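
A rough way to start looking at the second question (whether missingness is associated with observed information) is sketched below. It assumes the original, un-imputed data are in memory and uses the variable names from the imputation command quoted elsewhere in this thread; miss_subjinc is a hypothetical helper variable. Note that no check of this kind can show whether missingness depends on the unobserved values themselves.

```stata
* Sketch only: run on the original data, before imputation.
* Variable names are taken from the thread; miss_subjinc is hypothetical.

* How much is missing, and in which combinations?
misstable summarize gender EQUAL subjinc edlevel lrscale
misstable patterns gender EQUAL subjinc edlevel lrscale

* Is non-response on subjinc predicted by other observed variables?
* (-logit- drops cases that are missing on the predictors themselves.)
generate byte miss_subjinc = missing(subjinc)
logit miss_subjinc i.gender edlevel lrscale
```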
Steven Saxonberg

Join Date: Jun 2016

Posts: 11
#3

11 Dec 2017, 11:14

Thanks for the quick reply. I am using ESS data (European Social Survey data), so the missing values mean that some respondents did not answer all the questions. Since there were many cases of missing data, I thought it would be best to impute the missing values. Yes, I used the rseed() option when imputing, and last year I consistently got the same results. However, this year when returning to the data set, I get a different result than last year using the meologit command, but this new result is consistent. It is the same on 3 different computers, using 3 different versions of Stata (13, 14 and 15). So it is not changing often; it changed once from last year, and now I keep getting the exact same new result regardless of which computer or version of Stata I am using. If it were a problem with the seed, then I would expect the outcome to be different each time.
Richard Williams

Join Date: Apr 2014

Posts: 5006
#4

11 Dec 2017, 11:15

To further complicate things, the random number generator has changed across versions, so it too has to be set if you want perfect replicability. See -help set rng-.

Are these little tiny changes or really big ones? If tiny, my guess is that it is due to differences in the random generation process.

I do think it is interesting that meologit and xtologit are not listed as commands you can use mi estimate with. Makes me suspect there is a good reason for that, but I don't know what it is.

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 19.5 MP (2 processor)
WWW: https://www3.nd.edu/~rwilliam
Steven Saxonberg

Join Date: Jun 2016

Posts: 11
#5

11 Dec 2017, 11:25

Yes, the change is extremely important, because one of the core arguments of the article was that those living in post-communist countries became more supportive of income equality after the economic crisis, but that effect went from being significant at the .001 level to not being significant at all. The change in the coefficients, though, is minor, and the other variables did not change; only the most important variable for the article changed. The command I used was:

mi impute chained (logit) gender (ologit) EQUAL subjinc edlevel lrscale, add(5) rseed(4409)
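
With only five imputations, it may be worth checking how much of the uncertainty in a p-value comes from the imputation step itself. One possible sketch uses the mcerror option of -mi estimate-, which reports Monte Carlo error estimates alongside the pooled results. The model below is an assumption, since the analysis command was not posted; country_id is a hypothetical numeric country identifier.

```stata
* Sketch only. The model specification and country_id are assumptions;
* only the variable names come from the imputation command above.
mi estimate, cmdok mcerror: meologit EQUAL i.gender subjinc edlevel lrscale || country_id:

* If the Monte Carlo errors are large relative to the estimates, adding
* more imputations may stabilise the results, for example:
* mi impute chained (logit) gender (ologit) EQUAL subjinc edlevel lrscale, add(45) rseed(4409)
```

The add(45) value is purely illustrative, not a recommendation for this data set.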
Clyde Schechter

Join Date: Apr 2014

Posts: 30111
#6

11 Dec 2017, 11:28

Richard makes a good point. Reviewing the -help whatsnew13- file, the only thing I see that may be relevant here is that during the lifetime of version 13, the negative binomial random number generation function was changed. If some of your imputations involve that, that would explain why versions 13 through 15 all give you the same result, but a different result from what you previously got with version 13.

In terms of the missing data generation model, just knowing that this is non-response to survey questions is not sufficient to decide whether multiple imputation is necessary or even valid. It depends on the content of the survey questions. If the questions ask for sensitive information (illegal or stigmatized behavior, income, etc.) then missingness is likely to be fairly strongly associated with the unobserved response. For innocuous questions that is less likely. When dealing with survey non-response, it also matters whether the question is a standalone or part of a group of related items. When it is part of a group of related items, even if it asks about stigmatized behavior, it is plausible that conditioning on the responses to the other items in the group will mitigate the association between non-response and unobserved value.
Steven Saxonberg

Join Date: Jun 2016

Posts: 11
#7

11 Dec 2017, 11:54

Thanks for the replies. I too assume it must have to do with how Stata 13 was updated. Unfortunately, I have that program on a computer in a different country, so I will not be able to check it until the weekend. However, there were no negative binomial random numbers.
Richard Williams

Join Date: Apr 2014

Posts: 5006
#8

11 Dec 2017, 12:41

Assorted things.

I just fired up Stata 13.1. meologit is NOT listed as a command that works with mi estimate. Maybe it worked (are you sure it did?) but it doesn't look like it was supposed to.

The March 10, 2016 update for Stata 13 says "when used with datasets that had many observations per group, [meologit] could indicate that the model converged and report results that did not include the large groups in the computations." There are other messages about bug fixes with meologit. Maybe you got the "good" results you like with an early version of Stata 13 that was buggy. But also make sure you really are running the exact same thing with all three versions of Stata.

The whatsnew files may also have info in them about fixes or changes in the mi commands.
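
A short sketch of how one might record exactly which build each machine is running before comparing results. These are standard commands, but the output will differ by installation, and -update query- needs an internet connection; it only reports what is available and does not install anything.

```stata
* Sketch only: record the installed version and revision date on each machine
about
display c(stata_version)
display c(born_date)

* Report whether updates are available, without installing them
update query

* Read the change log for fixes to -meologit- and the mi commands
help whatsnew
```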

If you want to take one last crack at replicating those "good" results, in Stata 15 or 14 try

set rng kiss32
set seed whateveritwaswhenyoufirstranit
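
Put together with the imputation command quoted elsewhere in this thread, the attempt might look like the sketch below. It assumes Stata 14 or later and that the data are already -mi set-. The version prefix asks Stata to interpret the command as Stata 13 would, but it is not expected to bring back behaviour that was later fixed as a bug.

```stata
* Sketch only, for Stata 14 or later. Variable names and the seed come from
* the imputation command quoted in this thread.
set rng kiss32
version 13: mi impute chained (logit) gender (ologit) EQUAL subjinc edlevel lrscale, add(5) rseed(4409)

* Confirm which random-number generator is in effect
display c(rng_current)
```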

My own guess is that the results you are getting now (which you don't like) are the correct ones. Or, that there is some difference from your earlier analyses (changes in the data or model) that you are overlooking.

When you do get back to that old Stata 13 machine, don't update it! Instead see if it gives you the "good" results again, and if so see if you can figure out why.

Steven Saxonberg

Join Date: Jun 2016

Posts: 11
#9

12 Dec 2017, 11:32

Dear Richard, thanks for your comments, but my Stata 13 has long since been updated, and when I used the updated version of Stata 13 over the weekend, it gave me the same results as Stata 14 and 15.
Richard Williams

Join Date: Apr 2014

Posts: 5006
#10

12 Dec 2017, 11:43

Do you have the output and commands from when you got the different results? I strongly suspect something is different now and you don't realize it. For example, I'd be surprised if Stata 13 ever supported mi estimate with meologit without using the cmdok option.

Either that, or Stata fixed some bug. I don't think bugs are restored under version control, so even if you did get those results before you probably can't replicate them. Unless maybe you restore Stata 13 from scratch and don't install updates.

In any event, even if you did get different results before, my guess is that the results you are getting now are the correct ones.

Richard Williams

Join Date: Apr 2014

Posts: 5006
#11

12 Dec 2017, 11:48

Also, the whatsnew file for Stata 13 indicates various bug fixes were made to mi impute chained. So that might account for discrepancies with your earlier results.
