  • Changing data based on condition

    Hi,

    I am using Stata 15 on Windows 10. I have data collected in several rounds, as shown in the example below:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte(id round var1 var2) str3 var3 int var4 str3 var5
    1 1 1 25 "Yes" 2500 "Yes"
    1 2 1 23 "Yes" 2500 "Yes"
    2 1 1 60 "Yes" 1000 "No" 
    2 2 2 60 "No"  1000 "No" 
    3 1 2 75 "Yes" 3500 "Yes"
    3 2 2 75 "Yes" 3500 "No" 
    end

    The goal is to correct the earlier round wherever the rounds disagree. For instance, for id 1, var2 in round 1 should be changed to 23; for id 2, var1 and var3 in round 1 should be changed to 2 and "No" respectively. I would appreciate a simple way of achieving this.

    Thanks in advance!

    Best,
    Stephen.

  • #2
    Why is 23 considered correct and not 25?


    • #3
      Thanks Nick Cox for your question.

      Information collected in later rounds is considered more accurate.


      • #4
        So on that logic you should just analyse data for round 1.


        • #5
          Thanks Nick. Sorry my example didn't exhaustively address all the scenarios. There are some missing cases too.


          • #6
            I naturally believe you that much more can be said, but what you have said so far is that data in round 2 that differ from those in round 1 should be changed to the latter. So there is, on this view, no information added by round 2. Hence there is no point in changing the data in round 2, as it either is the same or should be considered wrong.


            • #7
              Thanks Nick for your thoughts. Much appreciated.


              • #8
                While Nick addresses the important question (why do this?), it is not hard to do what you originally asked with a handful of if conditions. If need be, you can loop over rounds, starting at the last round and working back to the first, replacing values where needed. This can be done either by using xtset on your data and then the lag/lead operators, or by using x[_n-1] along with a condition so that you don't lag or lead across ids.


                • #9
                  Thanks Phil Bromiley for your thoughts. Is it possible to give an example based on the data I shared?
                  Thanks in advance!


                  • #10
                    Stephen, I have the same concern as others - changing everything to match what is in 1st round means you could essentially delete all the later rounds (no new info is being added). But to show you how to do it:

                    You may also find the following posts (here and here) helpful.

                    Code:
                    * Data from post #1, as produced by: dataex id round var*
                    clear
                    input byte(id round var1 var2) str3 var3 int var4 str3 var5
                    1 1 1 25 "Yes" 2500 "Yes"
                    1 2 1 23 "Yes" 2500 "Yes"
                    2 1 1 60 "Yes" 1000 "No"
                    2 2 2 60 "No"  1000 "No"
                    3 1 2 75 "Yes" 3500 "Yes"
                    3 2 2 75 "Yes" 3500 "No"
                    end
                    
                    . list, noobs sepby(id)
                    
                      +-----------------------------------------------+
                      | id   round   var1   var2   var3   var4   var5 |
                      |-----------------------------------------------|
                      |  1       1      1     25    Yes   2500    Yes |
                      |  1       2      1     23    Yes   2500    Yes |
                      |-----------------------------------------------|
                      |  2       1      1     60    Yes   1000     No |
                      |  2       2      2     60     No   1000     No |
                      |-----------------------------------------------|
                      |  3       1      2     75    Yes   3500    Yes |
                      |  3       2      2     75    Yes   3500     No |
                      +-----------------------------------------------+
                    
                    // Replace each value with the value in the 1st observation within id
                    // (loops over the variables, assuming they are named var1, var2, ..., var5)
                    forvalues i=1/5 {
                        bysort id (round): replace var`i' = var`i'[1] if !missing(var`i'[1])
                    }
                    
                    // NOTE: Alternative using the x[_n-1] method that Phil mentions in post #8 (shown for var1 only).
                    // Obviously, you don't want to replace if the prior value is missing.
                    bysort id (round): replace var1 = var1[_n-1] if !missing(var1[_n-1])
                    
                    
                    . list, noobs sepby(id)
                    
                      +-----------------------------------------------+
                      | id   round   var1   var2   var3   var4   var5 |
                      |-----------------------------------------------|
                      |  1       1      1     25    Yes   2500    Yes |
                      |  1       2      1     25    Yes   2500    Yes |
                      |-----------------------------------------------|
                      |  2       1      1     60    Yes   1000     No |
                      |  2       2      1     60    Yes   1000     No |
                      |-----------------------------------------------|
                      |  3       1      2     75    Yes   3500    Yes |
                      |  3       2      2     75    Yes   3500    Yes |
                      +-----------------------------------------------+
                    
                    
                    egen total_rounds = max(round), by(id)  // highest round number observed for each id
                    egen total_raised = total(var4), by(id)  // sum of var4 across rounds, by id
                    
                    list, noobs sepby(id) abbrev(12)
                    
                      +-----------------------------------------------------------------------------+
                      | id   round   var1   var2   var3   var4   var5   total_rounds   total_raised |
                      |-----------------------------------------------------------------------------|
                      |  1       1      1     25    Yes   2500    Yes              2           5000 |
                      |  1       2      1     25    Yes   2500    Yes              2           5000 |
                      |-----------------------------------------------------------------------------|
                      |  2       1      1     60    Yes   1000     No              2           2000 |
                      |  2       2      1     60    Yes   1000     No              2           2000 |
                      |-----------------------------------------------------------------------------|
                      |  3       1      2     75    Yes   3500    Yes              2           7000 |
                      |  3       2      2     75    Yes   3500    Yes              2           7000 |
                      +-----------------------------------------------------------------------------+
                    
                    .
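
                    If instead the later round is the one to trust, as posts #1 and #3 describe, the same by-group idea should work with the last observation, [_N], in place of the first. A minimal sketch, assuming the example data from post #1, that var1 to var5 sit next to each other in the dataset, and that a missing last-round value should leave the earlier rounds untouched (that last rule is an assumption here, not something stated in the thread):

                    ```stata
                    * Sketch: overwrite earlier rounds with the last round's value, by id
                    clear
                    input byte(id round var1 var2) str3 var3 int var4 str3 var5
                    1 1 1 25 "Yes" 2500 "Yes"
                    1 2 1 23 "Yes" 2500 "Yes"
                    2 1 1 60 "Yes" 1000 "No"
                    2 2 2 60 "No"  1000 "No"
                    3 1 2 75 "Yes" 3500 "Yes"
                    3 2 2 75 "Yes" 3500 "No"
                    end

                    * var1-var5 is a varlist range, so it relies on the variables being adjacent
                    foreach v of varlist var1-var5 {
                        * within each id, sorted by round, [_N] is the last round's value;
                        * nothing is replaced when that value is missing (assumed rule)
                        bysort id (round): replace `v' = `v'[_N] if !missing(`v'[_N])
                    }

                    list, noobs sepby(id)
                    ```

                    On the example data this should set var2 to 23 in both rounds for id 1, var1 to 2 and var3 to "No" for id 2, and var5 to "No" for id 3. With more than two rounds and gaps in the final round, one option would be to sort within id on a negated copy of round and carry values along with [_n-1], in the spirit of post #8.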


                    • #11
                      Thanks David for your insights. This is really appreciated.
