  • appropriately deleting dummy 1s

    Hi,

    I have a bit of a puzzle for one specific application, but I think a good solution could be useful for other applications too.

    I have a dummy variable, D1, which sometimes has several consecutive 1s. D1 depends on the value of var1 and takes the value 1 whenever var1 is greater than 0.

    I want to generate a second dummy variable, D2, which flags every run of 5 or more consecutive 1s in D1. D2 should take the value 1 only at the start of such a run and 0 everywhere else.

    Code:
    gen D1 = . //gen dummy var
    replace D1=1 if var1[_n]>0 // define that D1 is 1, whenever var1 is larger than 0
    replace D1=0 if D1== .
    replace D1=0 if var1==. // stata places 1s where var1=., so we need to correct
    This first part of my code creates something like this:

    Code:
    D2   D1
    0    0
    0    0
    0    0
    0    0
    0    0
    1    1
    1    1
    1    1
    1    1
    1    1
    1    1
    1    1
    1    1
    1    1
    1    1
    1    1
    0    1
    0    1
    0    1
    0    1
    0    0
    0    0
    0    0
    0    0
    Then, in the second part of my code, I want to delete all 1s in D2 that are not at the beginning of a run of 1s in D1.

    Code:
    gen D2 = 0
    replace D2=1 if (D1[_n]+D1[_n+1] + D1ndic[_n+2] + D1[_n+3] + D1[_n+4])==5
    replace D2=0 if D2[_n-1]==1
    However, the code above creates the problem you can see below, where D2 does not just take the value 1 at the beginning of the succession.

    Code:
    D2   D1
    0    0
    0    0
    0    0
    0    0
    0    0
    1    1
    0    1
    1    1 // here D2 should have value 0
    0    1
    1    1 // here D2 should have value 0
    0    1
    1    1 // here D2 should have value 0
    0    1
    1    1 // here D2 should have value 0
    0    1
    1    1 // here D2 should have value 0
    0    1
    0    1
    0    1
    0    1
    0    0
    0    0
    0    0
    0    0

    Any suggestions on how to solve this issue?

    Many thanks!
    Christian


  • #2
    To start with, you have some naming inconsistencies between the code snippets you present and the data you display. Your first chunk of code most definitely did NOT create the data displayed in your second code block, if only because it makes no reference to D2 at all.

    You can simplify your creation of D1 from var1 with the following
    Code:
    mark D1 if var1>0 & var1<.
    
    or 
    
    gen D1=(var1>0 & var1<.)
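    The `var1<.` condition matters because Stata treats a missing value as larger than any nonmissing number, so `var1>0` alone is true when var1 is missing. A minimal sketch on made-up toy data (the values and the variable name `naive` are assumptions for illustration only):

    ```stata
    clear
    input float var1
    -1
    0
    2
    .
    end

    * without the guard, the missing observation is flagged as 1
    gen naive = var1 > 0

    * with the guard, the missing observation gets 0
    gen D1 = (var1 > 0 & var1 < .)

    list

    * observation 4 holds the missing value
    assert naive == 1 in 4
    assert D1 == 0 in 4
    ```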
    You also have a typo in your second snippet of code. D1ndic does not exist.

    The core of your problem, though, is that you're changing the value of D2 based on the previous value of the same variable. Stata steps sequentially through the observations to perform actions. So when Stata gets to observation 7 of your data, D2 is initially 1, and Stata changes it to 0 because the value of D2 in observation 6 is 1. When Stata then gets to observation 8, it looks at the value of D2 in observation 7. That value is now 0, not 1, so observation 8 remains unchanged. The easy way to fix this is to create a new variable to use for the comparison.

    Try

    Code:
    gen dx=D2
    replace D2=0 if dx[_n-1]==1
    drop dx
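
    Put together, the whole sequence might look like the sketch below. It runs on made-up toy data (the var1 values are hypothetical), assumes that D1ndic was meant to be D1, and keeps an in-place version in a hypothetical variable `D2bad` purely to show the difference.

    ```stata
    clear
    input float var1
    0
    0
    3
    2
    5
    1
    4
    7
    6
    .
    2
    0
    end

    gen D1 = (var1 > 0 & var1 < .)

    * flag every observation that opens a window of five consecutive 1s;
    * windows running past the last observation sum to missing, so they get 0
    gen D2 = (D1 + D1[_n+1] + D1[_n+2] + D1[_n+3] + D1[_n+4]) == 5

    * in-place version: each step reads values of D2bad already changed
    gen D2bad = D2
    replace D2bad = 0 if D2bad[_n-1] == 1

    * copy version: the comparison uses the untouched copy dx
    gen dx = D2
    replace D2 = 0 if dx[_n-1] == 1
    drop dx

    list

    * the run of 1s in D1 covers observations 3 to 9
    assert D2 == (_n == 3)
    assert D2bad == (_n == 3 | _n == 5)
    ```

    In this toy data the in-place version leaves a second 1 at observation 5, the same alternating pattern as in the original post, while the copy version keeps only the 1 at the start of the run.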



    • #3
      You may want to create a grouping variable first; then you can do whatever you need with it. Here is one way.

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input float D1
      0
      0
      0
      0
      0
      0
      1
      1
      1
      1
      1
      1
      1
      1
      1
      1
      1
      1
      1
      1
      1
      0
      0
      0
      0
      0
      1
      1
      1
      1
      0
      1
      1
      1
      1
      1
      end
      
      
      *tag start-points
      gen tag = D1==1 & D1[_n-1]==0
      
      *tag end-points
      replace tag=2 if  D1==1 & D1[_n+1]==0
      
      gen tag2= tag
      replace tag2 = tag2[_n-1] if tag2 ==0
      
      *gen group var
      gen group = D1==1 & D1[_n-1]==0
      replace group= sum(group) if group!=0
      replace tag2=0 if group!=0
      replace group= group[_n-1] if tag2==1 & _n!=1
      replace group= group[_n-1] if D1==1 & group[_n-1]!=0
      drop tag*
      
      *gen order var to restore order after manipulations
      gen order=_n

      Result:

      Code:
      . l, sepby(group)
      
           +--------------------+
           | D1   group   order |
           |--------------------|
        1. |  0       0       1 |
        2. |  0       0       2 |
        3. |  0       0       3 |
        4. |  0       0       4 |
        5. |  0       0       5 |
        6. |  0       0       6 |
           |--------------------|
        7. |  1       1       7 |
        8. |  1       1       8 |
        9. |  1       1       9 |
       10. |  1       1      10 |
       11. |  1       1      11 |
       12. |  1       1      12 |
       13. |  1       1      13 |
       14. |  1       1      14 |
       15. |  1       1      15 |
       16. |  1       1      16 |
       17. |  1       1      17 |
       18. |  1       1      18 |
       19. |  1       1      19 |
       20. |  1       1      20 |
       21. |  1       1      21 |
           |--------------------|
       22. |  0       0      22 |
       23. |  0       0      23 |
       24. |  0       0      24 |
       25. |  0       0      25 |
       26. |  0       0      26 |
           |--------------------|
       27. |  1       2      27 |
       28. |  1       2      28 |
       29. |  1       2      29 |
       30. |  1       2      30 |
           |--------------------|
       31. |  0       0      31 |
           |--------------------|
       32. |  1       3      32 |
       33. |  1       3      33 |
       34. |  1       3      34 |
       35. |  1       3      35 |
       36. |  1       3      36 |
           +--------------------+
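
      With a grouping variable like this in place, one possible way to get the D2 asked for in the first post is to flag the first observation of every run that has at least 5 observations. The sketch below is self-contained on made-up toy data; the shorter running-sum construction of `group` and the name `D2` are assumptions for illustration, not the code above.

      ```stata
      clear
      input float D1
      0
      1
      1
      1
      1
      1
      1
      0
      1
      1
      0
      end

      * order variable to restore the original sort afterwards
      gen order = _n

      * running count of run starts, set back to 0 outside the runs
      gen group = sum(D1 == 1 & D1[_n-1] != 1)
      replace group = 0 if D1 != 1

      * within each run: 1 on the first observation if the run has 5+ observations
      bysort group (order): gen D2 = group > 0 & _n == 1 & _N >= 5
      sort order

      list, sepby(group)

      * run 1 covers observations 2 to 7; run 2 has only 2 observations
      assert D2 == (_n == 2)
      ```

      Because `D1[_n-1]` is missing for the first observation, the `!= 1` comparison also counts a run that begins in observation 1 as a start.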
