Replacing missing values by matching them with a different observation

Duccio Milani

Join Date: Oct 2021

Posts: 23
#1

Replacing missing values by matching them with a different observation

28 Oct 2021, 03:39

Dear all,

this is my first post and I have recently been using STATA for operations beyond the basic course offered by the university. I apologize if the formatting of the question is not correct and if my vocabulary is inaccurate.

What I would like to do is to replace missing values for some observations by taking them from other observations with which they share the first digits of the string.
The values I want to copy are coming from eqi2017 variable and the "shared" variable is NUTS2.
The observations I report here are in the same dataset.

Code:

* Example generated by -dataex-. To install: ssc install dataex clear input str4 NUTS2 double eqi2017 "DE11" . "DE12" . "DE13" . "DE14" . "DE21" . "DE22" . "DE23" . "DE24" . "DE25" . "DE26" . "DE27" . end

[...]

Code:

* Example generated by -dataex-. To install: ssc install dataex clear input str4 NUTS2 double eqi2017 "DE1" 1.076 "DE2" 1.343 end

As you can notice in the first sample of the dataset I posted, some observations share the first 3 digits of the NUTS2 variable and they have empty values for the eqi2017 variable. I want to replace the empty values taking them from the three-digit observations you can find in the second sample of the dataset I posted.

So far I've been using this inefficient and bulky code:

Code:

replace eqi2017=1.076 if strpos(NUTS2 , "DE1")!=0 replace eqi2017=1.343 if strpos(NUTS2 , "DE2")!=0

This is the output

Code:

* Example generated by -dataex-. To install: ssc install dataex clear input str4 NUTS2 double eqi2017 "DE11" 1.076 "DE12" 1.076 "DE13" 1.076 "DE14" 1.076 "DE21" 1.343 "DE22" 1.343 "DE23" 1.343 "DE24" 1.343 "DE25" 1.343 "DE26" 1.343 "DE27" 1.343 end

It works but since I've around 100 observations to fill I am looking for a more compact and efficient manner to deal with this.

Thank you all, I hope the question is understandable and the post is properly structured.
Tags: None

Fei Wang

Join Date: Oct 2021
Posts: 726

28 Oct 2021, 04:25

Duccio, missing values can be replaced by merging the two data.

Code:

use data1, clear

gen NUTS1 = NUTS2
drop eqi2017
replace NUTS2 = substr(NUTS2,1,3)

merge m:1 NUTS2 using data2, nogen

replace NUTS2 = NUTS1
drop NUTS1

Comment

Duccio Milani

Join Date: Oct 2021

Posts: 23
#3

28 Oct 2021, 06:45

Dear Fei,
Thank you for your reply.

The solution you provided works but I forgot to mention in the initial question that there are observations that already have a value in eqi2017 and with the code you provided are deleted.

I am trying to solve this problem by redoing a merge with the dataset for the data that were complete.

Would you have an idea to overcome this problem by modifying the code you sent me earlier?
Comment
Duccio Milani

Join Date: Oct 2021

Posts: 23
#4

28 Oct 2021, 07:54

Update: I managed to have now a complete dataset by dividing the dataset and merging with the update option. However, I think that my solution it's not the most elegant so I'd be happy to read other suggestions.

Thanks again Fei
Comment

Fei Wang

Join Date: Oct 2021
Posts: 726

28 Oct 2021, 09:00

Duccio, I updated the code and it should work now.

Code:

use data1, clear

gen NUTS1 = NUTS2
ren eqi2017 eqi
replace NUTS2 = substr(NUTS2,1,3)

merge m:1 NUTS2 using data2, nogen

replace eqi = eqi2017 if eqi == .
replace NUTS2 = NUTS1
drop NUTS1 eqi2017
ren eqi eqi2017

Comment

Duccio Milani

Join Date: Oct 2021

Posts: 23
#6

01 Nov 2021, 12:02

It works perfectly. Thanks!
Comment

Announcement

Replacing missing values by matching them with a different observation

Comment

Comment

Comment

Comment

Comment