Replacing value with previous value if value is outside certain interval

Andreas Hohwy Pallesen

Join Date: May 2019

Posts: 2
#1

Replacing value with previous value if value is outside certain interval

21 May 2019, 08:51

Hi everyone,

I have the following panel data (in total 1200 observations)

Country Year ACC_POP ACC_RUR countrynum

Angola 2013 36.99049 16.2722368 1

Angola 2014 32 3 1

Angola 2015 42 7 1

Angola 2016 40.520607 15.9842091 1

There are obviously mistakes in the observations: ACC_RUR is able to change, but cannot change from 16.27 to 3 from 2013-2014 and to 7 from 2014-2015 and then to 15.98 in 2016. 2013 and 2016 seem to be correct.

I want to replace data if they are too far from previous or following values, for instance, how do I change ACC_RUR for 2014 and 2015 to 15.98

I have tried replacing them with the following value (in this case if the value is off by 5), using:

bysort countrynum(Year): replace EL_ACC_pRUR[_n-1]=EL_ACC_pRUR if EL_ACC_pRUR-5>EL_ACC_pRUR[_n-1]

and get the error: "weights not allowed". Fair play, weights not allowed.

Does anyone have an idea on how to correct this or run another replace code that allows for changes inside an interval (30 percent for instance)? Have in mind that this needs to be done for many observations.

A direction to another forum with the similar issue is also appreciated. Of course, I have searched for this beforehand but could not find any.

Best,
Andreas
Tags: None

William Lisowski

Join Date: Dec 2014
Posts: 10150

21 May 2019, 10:30

The problem with your syntax is that, in Stata, subscripts are not allowed on the left of the "=".

Could you be persuaded to replace the data with the previous good value rather than the next good value? The following demonstration replaces the value if it is more than 30% different from the previous value.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str6 country int year float(acc_pop acc_rur) byte countrynum
"Angola" 2013 36.99049 16.272236 1
"Angola" 2014       32         3 1
"Angola" 2015       42         7 1
"Angola" 2016 40.52061  15.98421 1
end

clonevar E = acc_rur
bysort countrynum (year): replace E = E[_n-1] if ! inrange( E, E[_n-1]*0.7, E[_n-1]*1.3 )
list, clean noobs

Code:

. list, clean noobs

    country   year    acc_pop    acc_rur   countr~m          E  
     Angola   2013   36.99049   16.27224          1   16.27224  
     Angola   2014         32          3          1   16.27224  
     Angola   2015         42          7          1   16.27224  
     Angola   2016   40.52061   15.98421          1   15.98421

An alternative approach would be to replace implausible values with a missing value and then use the ipolate command to produce interpolated replacement values

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str6 country int year float(acc_pop acc_rur) byte countrynum
"Angola" 2013 36.99049 16.272236 1
"Angola" 2014       32         3 1
"Angola" 2015       42         7 1
"Angola" 2016 40.52061  15.98421 1
"Angola" 2017 40.52061         0 1
end

clonevar E = acc_rur
replace E = . if E<10 & countrynum==1
bysort countrynum (year): ipolate E year, generate(EE) epolate
list, clean noobs

Code:

. list, clean noobs

    country   year    acc_pop    acc_rur   countr~m          E          EE  
     Angola   2013   36.99049   16.27224          1   16.27224   16.272236  
     Angola   2014         32          3          1          .   16.176227  
     Angola   2015         42          7          1          .   16.080219  
     Angola   2016   40.52061   15.98421          1   15.98421    15.98421  
     Angola   2017   40.52061          0          1          .   15.888201

Comment

Nick Cox

Join Date: Mar 2014

Posts: 35696
#3

21 May 2019, 10:45

William Lisowski gives excellent advice as always but I guess that he too remains queasy about criteria for being too far from previous or following values. How do you propose to work if a minority of values looks good and a majority looks bad?

Last edited by Nick Cox; 21 May 2019, 11:09.
Comment
Andreas Hohwy Pallesen

Join Date: May 2019

Posts: 2
#4

22 May 2019, 07:00

William Lisowski Thank you very much! You solved my problem. I will have to work before deciding which of the two commands I should use, but they both work and I can now look at which to use.

Nick Cox I agree that a criteria for when to replace values is not ideal. Scanning the data, however, shows that a majority is good. I think setting a criteria in this case is valid as I am studying rural access to electricity. It is very unlikely to observe a country with change in their rural electricity access with less or more than 30 percent for a given year followed by another puzzling change again the following year, whereafter the value returns to a value that seems to be paired with a previous year and more likely that this is a result of measurement error. This I can see in my data. So I guess it is a case-by-case where one has to consider the variable and consider a valid interval change for that variable.
Comment

Announcement