  • How to replace any variable's value if meeting certain condition

    Dear Stata users,

    In the following data, I want to set the variables d2a1, d2a2, and d2a3 to missing whenever the variable d2a is below 100. At present I have to type one command line for each condition.

    Code:
    replace d2a1=. if d2a==29
    replace d2a1=. if d2a==8
    replace d2a2=. if d2a==50
    replace d2a3=. if d3a==50
    ......
    Or:
    Code:
    replace d2a1=. if d2a<100
    replace d2a2=. if d2a<100
    replace d2a3=. if d3a<100
    How can I achieve this in a smarter way (for example, with one command line and fewer "if" conditions)? Thank you.

    Code:
        d2a1      d2a2      d2a3          d2a4        d2a
          29         0         0          2500         29
           8         0         0             0          8
           0        50         0         20000         50
          50         0         1          5000         50
           0         1         0         30000          1
           0        21         0         30000         21
          50         0         7          5000         57
          10        50         0         20000         60
          29         0         1          2500         30

  • #2
    I do not see any obvious pattern in your first example.

    The second example can be wrapped into a loop

    Code:
    foreach var in d2a1 d2a2 d2a3 {
        replace `var' = . if d2a < 100
    }
    or

    Code:
    forvalues i = 1/3 {
        replace d2a`i' = . if d2a < 100
    }
    Last edited by daniel klein; 23 Oct 2021, 00:24. Reason: second loop is not really "less readable" as I have claimed


    • #3
      There are some typos in #1, d3a for example. I had used a loop before opening this thread but was hoping for a smarter way. Maybe a loop is smart enough. Thank you, daniel klein.
      Last edited by Chen Samulsion; 23 Oct 2021, 01:08.


      • #4
        Originally posted by Chen Samulsion
        I had used a loop before opening this thread but was hoping for a smarter way.
        How do you define "smart" here? If you plan on doing these things more often, define a program

        Code:
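        program replace_varlist
            version 16.1
            // varlist: variables to change; =exp: "= expression", passed on to replace
            syntax varlist =exp [ if ] [ in ] [ , *]
            foreach var of local varlist {
                // apply the same replace to each variable in turn
                replace `var' `exp' `if' `in' , `options'
            }
        end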
        that you then call as

        Code:
        replace_varlist d2a1 d2a2 ...
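        Obviously, and depending on the details, d2a1, d2a2, ... may be referred to as d2a*? (a wildcard that matches names with at least one character after d2a, so d2a itself is excluded).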
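
For illustration only, here is a self-contained sketch of a complete call, using the example data from #1. It assumes the loop body of the program issues `replace`, and that the condition from #1 (d2a < 100) is the one wanted; the `capture program drop` line is only there so the snippet can be rerun.

```stata
* Sketch: define the helper and apply it to the example data from #1
clear
input d2a1 d2a2 d2a3 d2a4 d2a
29  0 0  2500 29
 8  0 0     0  8
 0 50 0 20000 50
50  0 1  5000 50
 0  1 0 30000  1
 0 21 0 30000 21
50  0 7  5000 57
10 50 0 20000 60
29  0 1  2500 30
end

capture program drop replace_varlist
program replace_varlist
    version 16.1
    syntax varlist =exp [ if ] [ in ] [ , * ]
    foreach var of local varlist {
        replace `var' `exp' `if' `in' , `options'
    }
end

* One call instead of one replace per variable
replace_varlist d2a1 d2a2 d2a3 = . if d2a < 100
list
```

Because d2a is below 100 in every row of the example data, all three variables end up missing in all nine observations, while d2a4 and d2a are left untouched.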


        • #5
          Thank you very much, daniel. In my data, d2a1 + d2a2 + d2a3 should equal 100 (d2a is the sum variable). If d2a != 100, the observation has been recorded wrongly and should be set to missing. Previously, I planned to set only the trouble-maker variable (d2a1 in the 18th observation, for example) to missing, so I wanted a "smart" way to help me find the trouble-maker variable in each observation. Now I will set all three variables to missing if the sum variable is not equal to 100, so the "smart" way is no longer badly needed.

          Code:
          . foreach v in d2a d2b d2c d2d d2e {
            2.  list `v'1 `v'2 `v'3 `v' if `v'<100 & `v'!=0
            3.  }
          
                +--------------------------+
                | d2a1   d2a2   d2a3   d2a |
                |--------------------------|
            18. |   29      0      0    29 |
           564. |    8      0      0     8 |
          1150. |    0     50      0    50 |
          1256. |   50      0      0    50 |
          1426. |    0      1      0     1 |
                +--------------------------+
          
                +--------------------------+
                | d2b1   d2b2   d2b3   d2b |
                |--------------------------|
            18. |   71      0      0    71 |
          1150. |    0     50      0    50 |
          1256. |   50      0      0    50 |
                +--------------------------+
          
                +--------------------------+
                | d2c1   d2c2   d2c3   d2c |
                |--------------------------|
           214. |   33     33     33    99 |
          1214. |    0      1      0     1 |
          1697. |    0      5      0     5 |
                +--------------------------+
          
                +--------------------------+
                | d2d1   d2d2   d2d3   d2d |
                |--------------------------|
          1214. |    0      1      0     1 |
                +--------------------------+
          
                +--------------------------+
                | d2e1   d2e2   d2e3   d2e |
                |--------------------------|
           736. |    1      0      0     1 |
          1214. |    0      3      0     3 |
          1310. |    8      0      0     8 |
          1465. |    1      1      1     3 |
          1697. |    1      0      0     1 |
                |--------------------------|
          1874. |    0      0      8     8 |
                +--------------------------+


          • #6
            Originally posted by Chen Samulsion
            In my data, d2a1 + d2a2 + d2a3 should equal 100 (d2a is the sum variable). If d2a != 100, the observation has been recorded wrongly and should be set to missing.
            It could also suggest that correctly recorded variables have been added up incorrectly.

            Originally posted by Chen Samulsion
            [...] so I wanted a "smart" way to help me find the trouble-maker variable in each observation.
            How do you know (i) how many "trouble-maker" variables there are in a given observation, and (ii) which of potentially many variables is the "trouble-maker"? I think you ask for the impossible. However, I obviously have no background information whatsoever, to be sure.
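
The first possibility, a total that was added up incorrectly, can at least be checked mechanically. A minimal sketch, using the example data as posted in #1; the variable name d2a_check is hypothetical and introduced only for this illustration:

```stata
* Sketch: recompute the row total and compare it with the recorded d2a
clear
input d2a1 d2a2 d2a3 d2a4 d2a
29  0 0  2500 29
 8  0 0     0  8
 0 50 0 20000 50
50  0 1  5000 50
 0  1 0 30000  1
 0 21 0 30000 21
50  0 7  5000 57
10 50 0 20000 60
29  0 1  2500 30
end

* rowtotal() treats missing components as zero
egen d2a_check = rowtotal(d2a1 d2a2 d2a3)

* Observations where the recorded total disagrees with the components
count if d2a_check != d2a
list d2a1 d2a2 d2a3 d2a d2a_check if d2a_check != d2a
```

In the data as posted, this flags observation 4 (50 + 0 + 1 = 51, recorded as 50). Such a check says whether the total matches its components; it cannot say which component is wrong when the total is not 100.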


            • #7
              The sum variable d2a is generated by code, egen d2a=rowtotal(d2a1 d2a2 d2a3), so it cannot have been added up incorrectly.
              Knowing how many "trouble-maker" variables there are in a given observation is impossible, so I will set the variables to missing whenever their corresponding sum variable is not equal to 100.
              This thread can be closed now.
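
A minimal sketch of that final approach, combining the egen call above with the loops from #2. The three-row dataset and the restriction to two stems (d2a, d2b) are made up for illustration. Note that rows summing to 0 would also be set to missing here, whereas the listing in #5 excluded them.

```stata
* Sketch: set all components to missing when their row total is not 100
clear
input d2a1 d2a2 d2a3 d2b1 d2b2 d2b3
29  0  0   71  0 0
50 50  0  100  0 0
33 33 34    0 50 0
end

foreach v in d2a d2b {
    * sum variable for this group of components
    egen `v' = rowtotal(`v'1 `v'2 `v'3)
    foreach var of varlist `v'1 `v'2 `v'3 {
        replace `var' = . if `v' != 100
    }
}
list
```

The sum variable is computed once, before any component is changed, so setting the first component to missing does not alter the condition applied to the others.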
