Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Replacing missing values by matching them with a different observation

    Dear all,

    this is my first post and I have recently been using STATA for operations beyond the basic course offered by the university. I apologize if the formatting of the question is not correct and if my vocabulary is inaccurate.

    What I would like to do is to replace missing values for some observations by taking them from other observations with which they share the first digits of the string.
    The values I want to copy are coming from eqi2017 variable and the "shared" variable is NUTS2.
    The observations I report here are in the same dataset.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str4 NUTS2 double eqi2017
    "DE11" .
    "DE12" .
    "DE13" .
    "DE14" .
    "DE21" .
    "DE22" .
    "DE23" .
    "DE24" .
    "DE25" .
    "DE26" .
    "DE27" .
    end
    [...]
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str4 NUTS2 double eqi2017
    "DE1" 1.076
    "DE2" 1.343
    end
    As you can notice in the first sample of the dataset I posted, some observations share the first 3 digits of the NUTS2 variable and they have empty values for the eqi2017 variable. I want to replace the empty values taking them from the three-digit observations you can find in the second sample of the dataset I posted.

    So far I've been using this inefficient and bulky code:
    Code:
    replace eqi2017=1.076 if strpos(NUTS2 , "DE1")!=0
    replace eqi2017=1.343 if strpos(NUTS2 , "DE2")!=0
    This is the output
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str4 NUTS2 double eqi2017
    "DE11" 1.076
    "DE12" 1.076
    "DE13" 1.076
    "DE14" 1.076
    "DE21" 1.343
    "DE22" 1.343
    "DE23" 1.343
    "DE24" 1.343
    "DE25" 1.343
    "DE26" 1.343
    "DE27" 1.343
    end


    It works but since I've around 100 observations to fill I am looking for a more compact and efficient manner to deal with this.

    Thank you all, I hope the question is understandable and the post is properly structured.




  • #2
    Duccio, missing values can be replaced by merging the two data.

    Code:
    use data1, clear
    
    gen NUTS1 = NUTS2
    drop eqi2017
    replace NUTS2 = substr(NUTS2,1,3)
    
    merge m:1 NUTS2 using data2, nogen
    
    replace NUTS2 = NUTS1
    drop NUTS1

    Comment


    • #3
      Dear Fei,
      Thank you for your reply.

      The solution you provided works but I forgot to mention in the initial question that there are observations that already have a value in eqi2017 and with the code you provided are deleted.

      I am trying to solve this problem by redoing a merge with the dataset for the data that were complete.

      Would you have an idea to overcome this problem by modifying the code you sent me earlier?

      Comment


      • #4
        Update: I managed to have now a complete dataset by dividing the dataset and merging with the update option. However, I think that my solution it's not the most elegant so I'd be happy to read other suggestions.

        Thanks again Fei

        Comment


        • #5
          Duccio, I updated the code and it should work now.

          Code:
          use data1, clear
          
          gen NUTS1 = NUTS2
          ren eqi2017 eqi
          replace NUTS2 = substr(NUTS2,1,3)
          
          merge m:1 NUTS2 using data2, nogen
          
          replace eqi = eqi2017 if eqi == .
          replace NUTS2 = NUTS1
          drop NUTS1 eqi2017
          ren eqi eqi2017

          Comment


          • #6
            It works perfectly. Thanks!

            Comment

            Working...
            X