Multiple imputation - MICE - panel data

Guest
#1

10 Apr 2018, 03:27

Hi all
I have panel data, except that instead of repeated observations over time, I have repeated observations of attributes across three different products. I have missing values on three attributes for each product (9 variables with missing values in total). I am using multiple imputation with -mi impute chained- (MICE), and the imputation model is logit, since all of my variables are binary. All the models converge, the imputation completes successfully, and I do not see any problem in the means and SDs of the imputed variables. However, the mi estimate of the model of interest for my thesis, -xtintreg- (I use -mi estimate, cmdok: xtintreg-), gives extremely large coefficients, between 2000 and 6000, even though I am estimating the marginal willingness to pay for a product that costs at most $26. I tried both long and wide data, and the results show the same problem.
Can anyone guess where the problem could be?
Best regards,
Guest

Last edited by sladmin; 30 Apr 2020, 07:31. Reason: anonymize original poster
Clyde Schechter

Join Date: Apr 2014

Posts: 30118
#2

10 Apr 2018, 08:42

I think you need to show, at the least, the complete output. Coefficients like 2000 or 6000 seem surprising, but might not be if, for example, the constant term is -1975 or -5975. I wonder also if perhaps one of your variables is scaled incorrectly (e.g. denominated in cents rather than dollars).

Most important, what kind of results did you get just running an unimputed analysis on complete cases? I realize those results may be biased, but it is almost inconceivable that they will be biased by a factor of 100. The mi-estimated outputs should be of the same order of magnitude as the complete case outputs.

I also hasten to point out that Rubin's rules have been proven to work for certain types of procedures. Whether they work with xtintreg is unclear. The fact that you need the -cmdok- option (and hence that StataCorp does not directly support this usage) suggests that you may be sailing in uncharted waters here. (Though, again, I would not expect the results to be off by two orders of magnitude.)
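
As a rough illustration of what Rubin's rules do with the per-imputation results, here is a minimal Python sketch. It is not Stata's implementation; the function name pool_rubin and the example numbers are hypothetical, chosen only to show the arithmetic for a single coefficient.

```python
def pool_rubin(estimates, variances):
    """Pool one coefficient across m imputed datasets using Rubin's rules.

    estimates: per-imputation point estimates of the coefficient
    variances: per-imputation squared standard errors of that coefficient
    Returns (pooled estimate, total variance, between variance, within variance).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations and matching lengths")
    q_bar = sum(estimates) / m                               # pooled point estimate
    within = sum(variances) / m                              # average within-imputation variance
    between = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)
    total = within + (1 + 1 / m) * between                   # total variance
    return q_bar, total, between, within


# Hypothetical numbers, for illustration only.
q_bar, total, between, within = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(q_bar, total)
```

Because the pooled point estimate is a plain average, it always lies between the smallest and largest per-imputation estimate. A pooled coefficient in the thousands therefore means that at least one of the individual fits produced a coefficient at least that large.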
Guest
#3

11 Apr 2018, 01:04

Dear Clyde
Thank you for your response.
I have attached the output for the estimation both with and without imputation.
The coefficients are certainly surprising; however, as you said, the constant terms are equally large. (I used the nocons option, and f1, f2, and f3 serve as the constant terms for each equation.) In terms of order of magnitude and significance they are as expected; however, it is impossible that the coefficients of the unimputed model are biased to this extent, since the dependent variable is WTP and these are all marginal WTPs for each product.
As you also pointed out, I know that xtintreg and even intreg are not supported, but what I am wondering is whether these unexpected results are due to that, or whether I did something wrong.
Moreover, if I did not do anything wrong, and since the results roughly correspond in terms of order of magnitude and significance, should I skip imputation? I do not know how results with coefficients like these should be reported.
With my sincere appreciation
Guest

Last edited by sladmin; 30 Apr 2020, 07:31. Reason: anonymize original poster
Clyde Schechter

Join Date: Apr 2014

Posts: 30118
#4

11 Apr 2018, 09:30

Well, I have to admit I'm in a little over my head here. Here are some thoughts, for what they're worth:

1. While I remain skeptical that Rubin's rules can be used here, I really find it hard to believe that this can account for a discrepancy of this magnitude.

2. You don't have nearly enough imputations. Your FMI is 0.6283. That would imply that you should use about 63 imputations. The early recommendation that a few imputations would suffice for almost anything has long been upended. The current recommendation is about 1 imputation for each percentage point of FMI.

3. The small number of imputations may also be responsible for this in another way. You don't show your imputation model, and I don't know what these variables are, but if you have even one imputation which is just a weird statistical outlier (which can happen: it is a random process, after all) it may be throwing off the overall results after they are averaged in Rubin's rules. If you had 65 imputations, the effect of a single weird one would be diluted out. One way to see if this is happening is to rerun the -mi estimate:...- with the -noisily- option. That way you will see the results of the 5 individual regressions. If one of them is bizarre and the others are more normal, then this would be your explanation of the problem. That said, other than "diluting out" that effect by using a more appropriately large number of imputations, I don't know of a solution.
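
To make points 2 and 3 concrete, here is a small Python sketch of the arithmetic. The helper names and the coefficient values (10 for a typical fit, 5000 for a single bizarre one) are hypothetical and chosen only for illustration; they are not taken from the attached output.

```python
import math


def imputations_needed(fmi):
    """Rule of thumb from point 2: about one imputation per percentage point of FMI."""
    if not 0.0 <= fmi <= 1.0:
        raise ValueError("FMI must be a fraction between 0 and 1")
    return math.ceil(100 * fmi)


def pooled_point_estimate(estimates):
    """Rubin's rules pool the point estimate as a simple average over imputations."""
    return sum(estimates) / len(estimates)


print(imputations_needed(0.6283))  # 63

# Hypothetical per-imputation coefficients: every fit gives 10 except one outlier.
typical, outlier = 10.0, 5000.0
with_5 = pooled_point_estimate([typical] * 4 + [outlier])
with_65 = pooled_point_estimate([typical] * 64 + [outlier])
print(with_5, with_65)
```

With 5 imputations the single outlier pulls the pooled value to 1008, while with 65 imputations it falls below 100. The outlier is diluted but not eliminated, which is why inspecting the individual fits with -noisily- is still worthwhile.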
Guest
#5

13 Apr 2018, 04:36

1. I think the cause might be that Rubin's rules cannot be used here with the xtintreg/intreg commands, since the imputed results of intreg and xtintreg are equally strange. By contrast, when I do not define the intervals for the dependent variable and run a simple regression with -reg-, the magnitudes of the coefficients are reasonable. (Of course, simple regression is not a proper model for my data; however, it may indicate that the problem comes from the unsupported intreg/xtintreg commands.)

2. I have almost 30% missing observations, and I initially ran 30 imputations; the results were equally strange. (Since I had not saved that output, I reran it with 5 imputations for the sake of producing output to attach to this conversation.)

3. All of my explanatory variables are binary, so my imputation models for them are logit. I have checked for multicollinearity, imputed with augmented logit, and also tried running three separate imputations, one for the variables belonging to each product. I do not believe this is due to an outlier; moreover, I ran the imputation process several times with different seeds. In all cases the results are strange, and I did not find a strange model in the -noisily- output. In all cases, when I do not define the five intervals for my dependent variable and instead use the raw values with the -regress- command, the coefficients look like the normal output one would generally expect from any model.

In conclusion, I would guess that the problem is the inappropriateness of the xtintreg/intreg estimation here.
Given that my data seem to be MAR rather than MCAR, would you recommend any other way to handle the missing observations?

Kindest regards,
Clyde Schechter

Join Date: Apr 2014

Posts: 30118
#6

13 Apr 2018, 08:28

Well, you've explored everything I can think of. I don't have any other ideas. I hope somebody else does.