Filling Missing Values with duplicates, including strings and numeric variables

Ramiro Doisquilhos

Join Date: Jun 2018

Posts: 1
#1

27 Jun 2018, 08:55

There have been quite a few posts about filling in missing values, but none addresses a problem like mine.

My data contain duplicates with respect to two ID variables, call them id1 and id2. These duplicates hold separate pieces of information about the same observations: if I check for duplicates using id1, id2 and a third variable, call it id3, the duplicates disappear.
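A minimal sketch of this check with Stata's -duplicates- command, run on made-up toy data (all values, and the generated variable names `dup12` and `dup123`, are hypothetical). Each record appears once per source, so observations are duplicates on id1 and id2 but become unique once id3 is included.

```stata
* Hypothetical toy data: each id1-id2 record appears once per source (id3)
clear
input str4 id1 id2 str6 id3
"A" 1 "src1"
"A" 1 "src2"
"B" 2 "src1"
"B" 2 "src2"
end

* Duplicates with respect to id1 and id2 only
duplicates report id1 id2
duplicates tag id1 id2, generate(dup12)

* Adding the source identifier id3 makes every observation unique
duplicates report id1 id2 id3
duplicates tag id1 id2 id3, generate(dup123)
```

-duplicates tag- stores the number of surplus copies of each observation, so `dup12` is 1 everywhere and `dup123` is 0 everywhere in this example.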

This last variable, id3, identifies the source of the information. When the source differs, some values (apart from the ID variables) are blank.

What I need is to combine these duplicates: for observations with the same id1 and id2, fill the blanks with the values from any one of the duplicates (it does not matter which, as these are static variables), and then drop the remaining duplicates as well as id3.

I cannot post examples of the data.

I was thinking of using, for example, the maximum within each id1-id2 group to fill the blanks, but the only solutions I found use the previous, next, first or last value, ...

Moreover, the technique of looping over all variables and generating temporary variables with, for example, 'egen max(...)' does not work, because I have string variables (the IDs themselves, for example).
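One possible approach, sketched here on hypothetical toy data (the variables `name` and `age` and all values are made up), avoids -egen- entirely. Within each id1-id2 group, it sorts on the variable being filled so that a non-missing value lands in a known position, then copies that value into the blanks. Empty strings sort before any non-empty string, while numeric missing values sort after all numbers, so string variables take the last value in the group and numeric variables take the first. This assumes, as stated above, that the non-missing values within a group never conflict.

```stata
* Hypothetical toy data: two sources (id3) each supply part of the same record
clear
input str4 id1 id2 str6 id3 str10 name age
"A" 1 "src1" "Smith" .
"A" 1 "src2" "" 42
"B" 2 "src1" "" .
"B" 2 "src2" "Jones" 35
end

* Loop over every variable except the identifiers
ds id1 id2 id3, not
foreach v of varlist `r(varlist)' {
    capture confirm string variable `v'
    if !_rc {
        * String: "" sorts first, so the last obs in the group is non-empty if any obs is
        bysort id1 id2 (`v'): replace `v' = `v'[_N] if missing(`v')
    }
    else {
        * Numeric: missing sorts last, so the first obs in the group is non-missing if any obs is
        bysort id1 id2 (`v'): replace `v' = `v'[1] if missing(`v')
    }
}

* Keep one observation per id1-id2 pair and drop the source identifier
bysort id1 id2: keep if _n == 1
drop id3
list
```

If a variable is missing in every observation of a group, it stays missing. Because the value is chosen purely by sort order, this shortcut is only safe for static variables; where duplicates could disagree, the conflicting groups would need to be inspected first.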
Tags: data management, datasets, duplicates, filling, missing variables
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17703
#2

29 Jun 2018, 23:52

Ramiro:
welcome to this forum.
The scant number of posts on filling missing data with the last, next or other values is probably due to the fact that filling is not the method of first choice for dealing with missing data (see -mi-, for instance).
That said, even if your data are confidential, you can post a fake excerpt/example of your dataset (see -help dataex-): this would help interested listers help you out in turn. Thanks.
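As a sketch of what such an excerpt could look like, the following builds a couple of made-up observations with the structure described in #1 (the variables `name` and `age` and all values are hypothetical) and prints them with -dataex-, whose output can be pasted into a post and re-run by other listers.

```stata
* Hypothetical fake excerpt: invented values with the same layout as the real data
clear
input str4 id1 id2 str6 id3 str10 name age
"A" 1 "src1" "Smith" .
"A" 1 "src2" "" 42
end

* Print the excerpt in a form that can be copied into a forum post
dataex id1 id2 id3 name age
```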

Kind regards,
Carlo
(Stata 19.0)