Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Replacing value with previous value if value is outside certain interval

    Hi everyone,

    I have the following panel data (in total 1200 observations)
    Country Year ACC_POP ACC_RUR countrynum
    Angola 2013 36.99049 16.2722368 1
    Angola 2014 32 3 1
    Angola 2015 42 7 1
    Angola 2016 40.520607 15.9842091 1

    There are obviously mistakes in the observations: ACC_RUR is able to change, but cannot change from 16.27 to 3 from 2013-2014 and to 7 from 2014-2015 and then to 15.98 in 2016. 2013 and 2016 seem to be correct.

    I want to replace data if they are too far from previous or following values, for instance, how do I change ACC_RUR for 2014 and 2015 to 15.98

    I have tried replacing them with the following value (in this case if the value is off by 5), using:

    bysort countrynum(Year): replace EL_ACC_pRUR[_n-1]=EL_ACC_pRUR if EL_ACC_pRUR-5>EL_ACC_pRUR[_n-1]

    and get the error: "weights not allowed". Fair play, weights not allowed.

    Does anyone have an idea on how to correct this or run another replace code that allows for changes inside an interval (30 percent for instance)? Have in mind that this needs to be done for many observations.

    A direction to another forum with the similar issue is also appreciated. Of course, I have searched for this beforehand but could not find any.

    Best,
    Andreas

  • #2
    The problem with your syntax is that, in Stata, subscripts are not allowed on the left of the "=".

    Could you be persuaded to replace the data with the previous good value rather than the next good value? The following demonstration replaces the value if it is more than 30% different from the previous value.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str6 country int year float(acc_pop acc_rur) byte countrynum
    "Angola" 2013 36.99049 16.272236 1
    "Angola" 2014       32         3 1
    "Angola" 2015       42         7 1
    "Angola" 2016 40.52061  15.98421 1
    end
    
    clonevar E = acc_rur
    bysort countrynum (year): replace E = E[_n-1] if ! inrange( E, E[_n-1]*0.7, E[_n-1]*1.3 )
    list, clean noobs
    Code:
    . list, clean noobs
    
        country   year    acc_pop    acc_rur   countr~m          E  
         Angola   2013   36.99049   16.27224          1   16.27224  
         Angola   2014         32          3          1   16.27224  
         Angola   2015         42          7          1   16.27224  
         Angola   2016   40.52061   15.98421          1   15.98421
    An alternative approach would be to replace implausible values with a missing value and then use the ipolate command to produce interpolated replacement values
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str6 country int year float(acc_pop acc_rur) byte countrynum
    "Angola" 2013 36.99049 16.272236 1
    "Angola" 2014       32         3 1
    "Angola" 2015       42         7 1
    "Angola" 2016 40.52061  15.98421 1
    "Angola" 2017 40.52061         0 1
    end
    
    clonevar E = acc_rur
    replace E = . if E<10 & countrynum==1
    bysort countrynum (year): ipolate E year, generate(EE) epolate
    list, clean noobs
    Code:
    . list, clean noobs
    
        country   year    acc_pop    acc_rur   countr~m          E          EE  
         Angola   2013   36.99049   16.27224          1   16.27224   16.272236  
         Angola   2014         32          3          1          .   16.176227  
         Angola   2015         42          7          1          .   16.080219  
         Angola   2016   40.52061   15.98421          1   15.98421    15.98421  
         Angola   2017   40.52061          0          1          .   15.888201


    Comment


    • #3
      William Lisowski gives excellent advice as always but I guess that he too remains queasy about criteria for being too far from previous or following values. How do you propose to work if a minority of values looks good and a majority looks bad?
      Last edited by Nick Cox; 21 May 2019, 11:09.

      Comment


      • #4
        William Lisowski Thank you very much! You solved my problem. I will have to work before deciding which of the two commands I should use, but they both work and I can now look at which to use.

        Nick Cox I agree that a criteria for when to replace values is not ideal. Scanning the data, however, shows that a majority is good. I think setting a criteria in this case is valid as I am studying rural access to electricity. It is very unlikely to observe a country with change in their rural electricity access with less or more than 30 percent for a given year followed by another puzzling change again the following year, whereafter the value returns to a value that seems to be paired with a previous year and more likely that this is a result of measurement error. This I can see in my data. So I guess it is a case-by-case where one has to consider the variable and consider a valid interval change for that variable.

        Comment

        Working...
        X