Missing values treatment in panel data: Ipolate Vs mi (??)

Andreas Georgantopoulos

Join Date: Apr 2017

Posts: 8
#1

25 Apr 2017, 06:25

Dear all,

I am using Stata 13. I am quite confused about when the "ipolate" command and when multiple imputation is appropriate for data in panel form.

For example, I have a variable, Return on Assets (ROAA), for a one-country panel sample with yearly obs. (so my panel variable is COMPANY, i.e. the company's name). 15% of my ROAA observations are missing.

First, I used the following "ipolate" command to fill in the missing obs.: ipolate ROAA YEAR, gen(newv) epolate by (COMPANY)

My questions are the following:

1) Which method is more appropriate when dealing with missing obs. in panel data: the indicative "ipolate" command above, or multiple imputation (linear)?

2) Suppose I use multiple imputation: how can I see the imputed obs. it produces, and how can I incorporate them into my original data?

I would much appreciate your advice/comments.

Thank you in advance!
Nick Cox

Join Date: Mar 2014

Posts: 36058
#2

25 Apr 2017, 06:35

It's like asking which is the more interesting sport or the better book? On what criteria and in whose view?

Interpolation is deterministic and considers only previous and following values in sequence. (There really is no need to restrict yourself to linear interpolation if interpolation is of interest.)
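
For readers who want to see what deterministic, within-panel interpolation looks like in practice, here is a minimal sketch. The data are made up for illustration; only the variable names COMPANY, YEAR and ROAA come from the question. It uses the by prefix, which is the documented way to run ipolate separately for each panel.

```stata
* Toy data, made up for illustration (not the poster's data)
clear
input str1 COMPANY YEAR ROAA
"A" 2010 1
"A" 2011 .
"A" 2012 3
"B" 2010 2
"B" 2011 4
"B" 2012 .
end

* Linear interpolation within each company: fills interior gaps only
bysort COMPANY: ipolate ROAA YEAR, gen(ROAA_ip)

* With epolate, gaps at the ends of a series are linearly extrapolated too
bysort COMPANY: ipolate ROAA YEAR, gen(ROAA_ep) epolate

list, sepby(COMPANY)
```

In this toy data the interior gap for company A in 2011 is filled with 2 (midway between 1 and 3). The trailing gap for company B in 2012 stays missing in ROAA_ip and becomes 6 in ROAA_ep, because epolate continues the line through 2 and 4.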

Multiple imputation is not so neatly characterised. But it doesn't purport to provide a single set of imputed values. That's most of the point as I understand it.

I am not sure that you will get many takers explaining multiple imputation when there is an entire Stata manual already doing precisely that. Regardless, I know some things about interpolation and am no authority on imputation, so bail out on any later questions on the latter.
Andreas Georgantopoulos

Join Date: Apr 2017

Posts: 8
#3

25 Apr 2017, 08:53

Thank you for your comment. Any suggestion on what is, in your opinion, the most appropriate method to address missing values in panel data?

To compare the aforementioned methods, I made a simple test: I excluded a few realised obs. on ROAA from my dataset and ran the ipolate command as indicated above. I also ran multiple imputation, using as independent variables total assets and capital adequacy, both of which have a full set of obs. with no missing values. In some cases I found significant deviations from the "real" values which had been intentionally eliminated for the purposes of this test.

Therefore, I would much appreciate - with your criteria, experience and your view in this case - any suggestion on the most trustworthy/accurate method to fill in missing values that do not exceed 20% of the overall data employed (in panel form).
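
A hold-out check of the kind described above could be sketched roughly as follows. Everything here is an assumption for illustration: the data are simulated, and id, year, masked, ROAA_test, ROAA_ip and abs_err are hypothetical names. Note that it only measures how close the interpolated values come to the held-out values; it says nothing about whether standard errors from a later model would be valid.

```stata
* Simulated panel (20 units x 15 years), for illustration only
clear
set seed 12345
set obs 300
gen id = ceil(_n/15)
bysort id: gen year = 2000 + _n
gen ROAA = 1 + 0.1*(year - 2000) + rnormal(0, 0.5)

* Hide roughly 10% of the observed values on purpose
gen byte masked = runiform() < 0.10
gen ROAA_test = ROAA
replace ROAA_test = . if masked

* Interpolate within each panel, then compare with the hidden truth
bysort id: ipolate ROAA_test year, gen(ROAA_ip)
gen abs_err = abs(ROAA_ip - ROAA) if masked
summarize abs_err, detail
```

Masked values at the start or end of a series stay missing here because epolate is not specified, so they drop out of the error summary.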
Nick Cox

Join Date: Mar 2014

Posts: 36058
#4

25 Apr 2017, 09:19

(This is just to repeat that I will not expatiate on imputation. I really have no experience in it, just second-hand opinions.)
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17854
#5

25 Apr 2017, 10:35

Andreas:
as an aside to Nick's wise remarks about -ipolate- and -mi-, if you're also interested in the latter, I would suggest considering: https://www.crcpress.com/Flexible-Im.../9781439868249
As usual, a first step would be to investigate whether the missingness of your data is informative or not.
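
One rough way to start looking at this is to model a missingness indicator on fully observed covariates. The sketch below uses simulated data, and total_assets, cap_adequacy and miss_ROAA are hypothetical names (the first two stand in for the fully observed variables mentioned in #3). A clear association suggests missingness depends on observed covariates; the absence of one cannot rule out dependence on the unobserved ROAA values themselves.

```stata
* Simulated data, for illustration only
clear
set seed 42
set obs 500
gen total_assets = rnormal(10, 2)
gen cap_adequacy = rnormal(12, 3)
gen ROAA = 0.5 + 0.1*total_assets + rnormal()

* For the demo, make missingness more likely for larger firms
replace ROAA = . if runiform() < invlogit(-3 + 0.15*total_assets)

* How much is missing?
misstable summarize ROAA

* Is missingness related to the observed covariates?
gen byte miss_ROAA = missing(ROAA)
logit miss_ROAA total_assets cap_adequacy
```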

Kind regards,
Carlo
(Stata 19.0)
daniel klein

Join Date: Mar 2014

Posts: 3912
#6

25 Apr 2017, 10:58

Just to add to Nick's point on multiple imputation. Any method that imputes a single value might give you valid point estimates (depending on assumptions about the missing data) but will not give you correct standard errors. The problem with single imputed values is that they do not in any way reflect the uncertainty associated with them. We do not know what the "true" value is and it is not even our goal to recreate it. The point is to take multiple educated guesses and come up with a variety of plausible values. We can then use the differences between these values to reflect our uncertainty about the "true" value. This is how we get valid standard errors and confidence intervals.
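
The way the spread between imputations feeds into the standard errors is usually written down as Rubin's combining rules. As a sketch in standard notation (the symbols are not from this thread): with M imputed datasets, let \hat{Q}_m be the estimate and \hat{U}_m its estimated variance in imputation m.

```latex
\bar{Q} = \frac{1}{M}\sum_{m=1}^{M}\hat{Q}_m, \qquad
\bar{U} = \frac{1}{M}\sum_{m=1}^{M}\hat{U}_m, \qquad
B = \frac{1}{M-1}\sum_{m=1}^{M}\bigl(\hat{Q}_m - \bar{Q}\bigr)^2, \qquad
T = \bar{U} + \Bigl(1 + \frac{1}{M}\Bigr) B
```

The reported standard error is the square root of T. Analysing a single filled-in dataset amounts to treating B as zero, which is why its standard errors come out too small.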

Best
Daniel
Andreas Georgantopoulos

Join Date: Apr 2017

Posts: 8
#7

26 Apr 2017, 01:36

Thank you all for your time and useful comments. I really appreciate it.

Just to contribute to this discussion: in my test so far, i.e. attempting to fill in "intentionally" missing observations, linear interpolation came out as the "winner" over multiple imputation (linear). The first method provided satisfactory guesses - small standard errors - in 3 out of 5 missing values. Considering daniel klein's comment, I have to say that the failures of the ipolate command to produce good guesses, in 2 out of 5 missing obs., are concentrated in cases with high risk/uncertainty.

Thus, my conclusion so far is that, in cases of low volatility, linear interpolation produced better results than linear multiple imputation (although for mi I experimented with a number of independent variables). I have to say that I was surprised by this outcome, and this is the main reason that I wanted to raise the issue here.

To conclude, the dilemma as I see it is: should one "stick" to a weakly unbalanced panel data sample, or should one try to produce imputed outcomes in order to create a strongly balanced panel? In other words, is it "worth taking the risk" of filling in missing data for the sake of having a strongly balanced panel? If yes, won't taking "D.ROAA" to reflect the uncertainty sacrifice useful information?

Kind regards,
Andreas
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17854
#8

26 Apr 2017, 02:08

Andreas:
in general, I would stick with unbalanced panel data (unless I'm strongly confident that panel attrition is informative), knowing that Stata can handle both balanced and unbalanced panel datasets without any problem.
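
As a minimal sketch of what "sticking with the unbalanced panel" looks like: the data below are simulated, and id, year and total_assets are hypothetical names. Observations with missing ROAA are simply left out of the estimation sample, with no filling in needed.

```stata
* Simulated panel (10 units x 6 years), for illustration only
clear
set seed 7
set obs 60
gen id = ceil(_n/6)
bysort id: gen year = 2010 + _n
gen total_assets = rnormal(10, 2)
gen ROAA = 0.2*total_assets + rnormal()
replace ROAA = . if runiform() < 0.15

* Declare the panel; if the panel variable is a string (e.g. a company
* name), create a numeric id first with: egen id = group(COMPANY)
xtset id year

* Participation pattern of the observations that are actually usable
xtdescribe if !missing(ROAA)

* Fixed-effects regression on whatever observations are available
xtreg ROAA total_assets, fe
```

Note that xtset may still report the panel as strongly balanced when every company-year row exists, because it looks at rows rather than at missing values within them; the estimation sample is what ends up unbalanced.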

Kind regards,
Carlo
(Stata 19.0)
daniel klein

Join Date: Mar 2014

Posts: 3912
#9

26 Apr 2017, 03:33

Originally posted by Andreas Georgantopoulos:

in my test so far, i.e. attempting to fill in "intentionally" missing observations, linear interpolation came out as the "winner" over multiple imputation (linear). The first method provided satisfactory guesses - small standard errors - in 3 out of 5 missing values.

But that was exactly my point. The standard errors are (too) small, because they do not take the uncertainty associated with the imputation process into account. I would prefer the arguably more appropriate, larger standard errors from MI over downward-biased standard errors from interpolation methods.

Having said this, I failed to realize that it is the outcome that is missing. Imputing the outcome is not necessarily a good idea, as Carlo points out. If you do use MI in this case, your imputation model should be larger than the substantive model.
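
To make "imputation model larger than the substantive model" concrete, here is a rough sketch with Stata's mi suite. The data are simulated and id, year, total_assets and cap_adequacy are hypothetical names. It also shows one way to look at the imputed values, which was question 2 in #1. As a simplifying assumption, this imputation model treats observations as independent and does not model the panel structure.

```stata
* Simulated panel (20 units x 10 years), for illustration only
clear
set seed 2017
set obs 200
gen id = ceil(_n/10)
bysort id: gen year = 2000 + _n
gen total_assets = rnormal(10, 2)
gen cap_adequacy = rnormal(12, 3)
gen ROAA = 0.5 + 0.1*total_assets + 0.05*cap_adequacy + rnormal()
replace ROAA = . if runiform() < 0.15

* Declare the mi structure and the role of each variable
mi set mlong
mi register imputed ROAA
mi register regular total_assets cap_adequacy

* Imputation model: includes cap_adequacy and year dummies as extras
mi impute regress ROAA total_assets cap_adequacy i.year, add(20) rseed(2017)

* Inspect the imputations: m = 0 is the original data, m = 1 the first
* imputed copy; in mlong style _mi_m marks the imputation number
mi xeq 0 1: summarize ROAA
list id year ROAA if _mi_m == 1 & id <= 3, noobs

* Substantive (smaller) model, combined across the 20 imputations
mi estimate: regress ROAA total_assets
```

The imputed values are stored alongside the original data rather than merged into it, and mi estimate does the combining, so there is no single "completed" ROAA column to copy back.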

Best
Daniel
Andreas Georgantopoulos

Join Date: Apr 2017

Posts: 8
#10

26 Apr 2017, 04:24

Dear Daniel, maybe I did not express my thoughts very clearly. My imputation model was indeed larger than my original model. I just said - following your comment above, with which I agree - that the "success rate" of the linear interpolation - at least in my test - provided more trustworthy outcomes.

So, if I have understood correctly, both Carlo and you suggest that I should avoid "filling in any gaps" in order to convert a weakly unbalanced panel sample into a strongly balanced one?

Thank you again.

Best,
Andreas
daniel klein

Join Date: Mar 2014

Posts: 3912
#11

26 Apr 2017, 04:35

Originally posted by Andreas Georgantopoulos:

the "success rate" of the linear interpolation - at least in my test - provided more trustworthy outcomes.

How do you define this? What is "success"? What is "trustworthy"?

Best
Daniel
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17854
#12

26 Apr 2017, 04:35

Andreas:
(I can only answer for myself): yes, go unbalanced.

PS: Crossed with Daniel's reply, which addresses two relevant topics.

Kind regards,
Carlo
(Stata 19.0)
Johnny Tad

Join Date: Nov 2017

Posts: 41
#13

18 Mar 2018, 20:59

How do I interpolate all variables at once? Also, my interpolation generates new variables, giving me the additional task of dropping the old variables. How do I make sure it just fills in the missing values without having to create new variables? Or is there a way of deleting all the old variables at once?
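
One possible approach, sketched here on made-up data with hypothetical names (id, year, x1, x2): ipolate requires its generate() option, so it cannot fill a variable in place, but a loop can copy the interpolated values back and drop the temporary variable each time.

```stata
* Toy data, made up for illustration
clear
input id year x1 x2
1 2010 1 10
1 2011 . .
1 2012 3 30
2 2010 5 .
2 2011 . 20
2 2012 9 30
end

* Interpolate each variable within panel, fill only the gaps,
* then drop the temporary variable
foreach v of varlist x1 x2 {
    bysort id: ipolate `v' year, gen(`v'_ip)
    replace `v' = `v'_ip if missing(`v')
    drop `v'_ip
}

list, sepby(id)
```

Without epolate, gaps at the ends of a series (x2 for id 2 in 2010 in this toy data) stay missing.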
Nick Cox

Join Date: Mar 2014

Posts: 36058
#14

19 Mar 2018, 03:18

#13 is better followed at https://www.statalist.org/forums/for...-interpolation