  • How to access number of changes done by command (replace)

    Dear all,

    I'm writing code that has to copy the contents of a string variable (var1) into observations that have an empty var1 but belong to the same var2 group. A short example of the dataset is below (my original dataset has ~5000 observations in ~2000 var2 groups).

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int var2 str19 var1
    3385 "L9"                 
    3385 ""                   
    3560 "L8"                 
    3560 ""                   
    3666 "Poor classification"
    3666 ""                   
    3560 ""                   
     889 "L1"                 
     889 ""                   
     890 "L1"                 
     893 "L1"                 
     893 ""                   
     892 "L1"                 
     892 ""                   
     892 ""                   
     892 ""                   
     892 ""                   
     892 ""                   
     891 "L1"                 
     891 ""                   
     891 ""                   
    end
    To do this, I wrote code that replaces an observation's var1 value with var1[_n+1] if var1=="", within each var2 group. However, since the var2 groups in my original dataset do not all have the same number of observations, I have to repeat the code several times before every observation ends up with a var1 value. The code I use is:

    Code:
    sort var2 var1
    forvalues i = 1/10{
        by var2: replace var1 = var1[_n+1] if var1==""
    }
    Here I set the loop to repeat 10 times, even though only 5 passes are needed. On the original dataset, I have to run it 67 times to make all the changes. However, my dataset can be updated, so hard-coding 67 (or any other arbitrary value) might not work in the future. What I thought would be a good solution is to have Stata access the number of changes made in each iteration of the forvalues loop and, if that number is >0, repeat the loop. Do you think that is a good solution to the problem? Can you see another way out? Is there a way to code that in Stata?

    In any case, thanks for your help!

    Best,

  • #2
    You're approaching this the wrong way, so your code is unnecessarily complicated. In your example data, each var2 group has one and only one observation with a non-missing value of var1. So the task is just to copy this one value to all of the others. It's a one-liner:
    Code:
    by var2 (var1), sort: replace var1 = var1[_N]
    Note: This code relies on var1 being a string variable. If var1 is numeric, the sort order would put non-missings at the beginning, so you would use var1[1] instead of var1[_N].
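
    The one-liner depends on each var2 group having exactly one non-empty var1. A minimal sketch of how that assumption could be checked before filling, using a cut-down copy of the example data from #1; n_filled is a hypothetical helper variable, not something from the original post:

```stata
* Toy subset of the example data from post #1
clear
input int var2 str19 var1
3385 "L9"
3385 ""
3560 "L8"
3560 ""
3560 ""
 890 "L1"
end

* n_filled (hypothetical name): number of non-empty var1 values per var2 group
by var2, sort: egen n_filled = total(var1 != "")

* Stops with an error if any group has zero or several non-empty values
assert n_filled == 1
drop n_filled

* Empty strings sort first, so the non-empty value is the last one in each group
by var2 (var1), sort: replace var1 = var1[_N]
```

    If the assert fails, listing the offending groups before deciding how to fill them is probably safer than running the replace.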

    • #3
      Hi Clyde, this is why Statalist is so great. A fresh set of eyes is always good. Thank you so much for your input, it does solve the issue in a much, much easier way than what I was trying. Indeed, the way the dataset is constructed, each var2 group will have one and only one observation with a non-missing value of var1. Also, var1 is a string. Thank you!
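
      For completeness, the original idea of repeating the replace until nothing changes can be sketched with a while loop. This is only one possible approach, not something proposed in the thread: it infers the number of changes from the count of empty values before and after each pass, relying on count storing its result in r(N). The local names n_empty, n_changed and n_passes are hypothetical, and the data are a cut-down copy of the example in #1:

```stata
* Toy subset of the example data from post #1
clear
input int var2 str19 var1
 892 "L1"
 892 ""
 892 ""
 892 ""
 891 "L1"
 891 ""
end

sort var2 var1

* Number of empty values before the first pass
quietly count if var1 == ""
local n_empty = r(N)

local n_changed = 1
local n_passes = 0
while `n_changed' > 0 {
    quietly by var2: replace var1 = var1[_n+1] if var1 == ""
    quietly count if var1 == ""
    * Changes in this pass = drop in the number of empty values
    local n_changed = `n_empty' - r(N)
    local n_empty = r(N)
    local ++n_passes
}

display "passes: `n_passes', still empty: `n_empty'"
```

      The loop also stops if a group has no non-empty var1 at all, because the count of empty values then stops falling. The one-liner in #2 is still the simpler choice when each group has exactly one non-empty value.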
