Replacing missing values based on another column

Merik Manzano

Join Date: Aug 2022

Posts: 1
#1

Replacing missing values based on another column

15 Aug 2022, 18:35

Hello! My problem seems simple but I'm still new and unsure how to proceed. Essentially, I have a dataset with observations for people over time, thus there are multiple observations for a particular person (a particular fullname entry). They currently are all coded as having missing values for the columns gender and genderprobability. Appended to this data are observations as shown below, where many (though not all) of the unique names in the aforementioned dataset are assigned a gender and a probability in those columns.

For all observations in the actual dataset, I would like to replace the (currently missing values) for gender and genderprobability with the values given in the appended set. I.E. I would like to copy "male" into the gender column and ".99" into the genderprobability column for all observations where the fullname variable = "Michael Bishop", and replicate this for all names in the dataset that I have a gender and genderprobability for.

Code:

* Example generated by -dataex-. For more info, type help dataex clear input str44 fullname str7 gender float genderprobability "Michael Bishop" "male" .99 "Amera Barker" "female" .99 "Chris Atkinson" "male" .98 end

Thank you for any help!
Tags: None
Clyde Schechter

Join Date: Apr 2014

Posts: 30355
#2

15 Aug 2022, 18:49

It is unclear from your description whether you have two data sets (one with missing values, and the other with non-missing values) or one where some of the observations have missing values and others bearing the same values of fullname have non-missing values. The approach differs depending on which it is.

If you have two different data sets, what you want is only possible assuming the data set without missing values has only one observation for each fullname:

Code:

use data_set_with_missing_values, clear merge m:1 fullname using data_set_without_missing_values

If you have a single data set, what you want is only sensible if for each fullname there is only a single observation with non-missing values. In this case:

Code:

by fullname (gender), sort: replace gender = gender[_N] by fullname (genderprobability), sort: replace genderprobability = genderprobability[1]

Thank you for using -dataex- in your very first post!
Comment

Announcement

Replacing missing values based on another column

Comment