  • Stata 17: Loop with an "or" condition (the "|" operator in Stata) - 2.0

    Dear Statalist Users,

    I would like to write this part of my code in a cleaner way, with a loop or some other more professional approach. Do you have any suggestions, please?


    Code:
    gen CDI=1 if ansbef_1==2 | ansbef_2==2 | ansbef_3==2 | ansbef_4==2 | ansbef_5==2 | ansbef_6==2 | ansbef_7==2 | ansbef_8==2 | ansbef_9==2 | ansbef_10==2 
    replace CDI=0 if CDI~=1


    Thank you so much in advance!

    Best regards,

    --
    Michael Duarte Gonçalves

  • #2
    Two ways: one with the egen command:

    Code:
    egen byte CDI = anymatch(ansbef_1-ansbef_10), values(2)
    and a slightly more tedious version with just the plain generate command:

    Code:
    gen byte CDI = inlist(2,ansbef_1,ansbef_2,ansbef_3,ansbef_4,ansbef_5,ansbef_6,ansbef_7,ansbef_8,ansbef_9,ansbef_10)
    Last edited by Hemanshu Kumar; 03 Nov 2022, 08:04.
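
    Since the question asked for a loop, here is a minimal sketch of that approach too. The toy data (three ansbef variables with made-up values) are an assumption for illustration only; with the real data the loop would run over 1/10.

    ```stata
    * Toy data (hypothetical values), including some missing values
    clear
    input ansbef_1 ansbef_2 ansbef_3
    2 0 1
    0 . 1
    . 2 0
    1 1 .
    end

    * Start everyone at 0, then switch to 1 whenever any answer equals 2
    gen byte CDI = 0
    forvalues i = 1/3 {
        replace CDI = 1 if ansbef_`i' == 2
    }

    list
    ```

    Missing values never equal 2, so observations with only missing or non-2 answers stay at 0, which matches what the original generate/replace pair does.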



    • #3
      Hello again Kumar!

      Thank you so much for your help again!
      I wish you all the best.

      Best regards.

      Michael



      • #4
        Dear Kumar,

        I don't obtain the same numbers of treated and untreated observations depending on whether I use (1):
        Code:
         gen byte CDI = inlist(2,ansbef_1,ansbef_2,ansbef_3,ansbef_4,ansbef_5,ansbef_6,ansbef_7,ansbef_8,ansbef_9,ansbef_10)
        or (2) :
        Code:
         egen byte CDI = anymatch(ansbef_1-ansbef_10), values(2)
        For (1), I obtain:

        Code:
                CDI |      Freq.     Percent        Cum.
        ------------+-----------------------------------
                  0 |     17,925       54.69       54.69
                  1 |     14,853       45.31      100.00
        ------------+-----------------------------------
              Total |     32,778      100.00
        For (2), I obtain:

        Code:
          see notes |      Freq.     Percent        Cum.
        ------------+-----------------------------------
                  0 |     11,934       36.41       36.41
                  1 |     20,844       63.59      100.00
        ------------+-----------------------------------
              Total |     32,778      100.00
        I don't know whether this matters, but some of the listed variables contain missing values.

        Could you please help me? Thank you so much.

        Best regards,

        --
        Michael
        Last edited by Michael Duarte Goncalves; 03 Nov 2022, 08:41.



        • #5
          Michael Duarte Goncalves, the two commands should produce exactly the same result as long as they select the same variables. For instance, I ran the code below, which artificially generates your variables, including missing values:

          Code:
          clear
          set obs 100
          set seed 12345
          forval i = 1/10 {
              gen ansbef_`i' = runiformint(0,2)
              replace ansbef_`i' = cond(runiformint(0,1) == 1,ansbef_`i',.)
          }
          
          egen byte CDI_1 = anymatch(ansbef_1-ansbef_10), values(2)
          gen byte CDI_2 = inlist(2,ansbef_1,ansbef_2,ansbef_3,ansbef_4,ansbef_5,ansbef_6,ansbef_7,ansbef_8,ansbef_9,ansbef_10)
          
          . tab CDI_1 CDI_2, miss
          
                     |         CDI_2
           see notes |         0          1 |     Total
          -----------+----------------------+----------
                   0 |        14          0 |        14
                   1 |         0         86 |        86
          -----------+----------------------+----------
               Total |        14         86 |       100
          
          . assert CDI_1 == CDI_2
          As you can see, they produce identical results. So the only reason you're getting different results is that the variable list ansbef_1-ansbef_10 is picking up some other variables: in your dataset, one or more variables other than the ansbef variables we want sit between ansbef_1 and ansbef_10.

          If you had to pick between the two ways of generating the answer, the inlist() method is definitely giving you the correct one, since it explicitly lists each ansbef variable. For the other method, you just have to specify the variable list so that it does not pick up any other variables by mistake.

          To troubleshoot for yourself, just do something like

          Code:
          des ansbef_1-ansbef_10
          and it should show you all the variables that are being picked up by this method of specifying the variable list.
          Last edited by Hemanshu Kumar; 03 Nov 2022, 08:55.
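
          To make the point about variable order concrete, here is a small sketch under assumed toy data. The variable other_var is hypothetical and simply stands in for whatever sits between the ansbef variables in the real dataset. A hyphenated range follows the order of variables in the dataset, while the wildcard ansbef_* matches on the name.

          ```stata
          clear
          set obs 3
          gen ansbef_1 = 0
          gen other_var = 2      // hypothetical variable stored between the two
          gen ansbef_2 = 0

          * Expand both forms of the variable list to see what each one selects
          unab by_range : ansbef_1-ansbef_2
          unab by_stub  : ansbef_*
          display "range:    `by_range'"
          display "wildcard: `by_stub'"

          * The range version is contaminated by other_var; the wildcard one is not
          egen byte CDI_range = anymatch(ansbef_1-ansbef_2), values(2)
          egen byte CDI_stub  = anymatch(ansbef_*), values(2)
          list CDI_range CDI_stub
          ```

          The wildcard is only safe if no unwanted variable name begins with ansbef_ (an ansbef_11, say, would also be matched), so it is worth checking the expanded list first.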



          • #6
            Dear Kumar,

            You are totally right when you said:

            As you can see, they produce identical results. So the only reason you're getting different results is that the variable list ansbef_1-ansbef_10 is picking up some other variables: in your dataset, one or more variables other than the ansbef variables we want sit between ansbef_1 and ansbef_10.
            In fact, I do have some other variables in between, within the range
            Code:
            ansbef_1-ansbef_10
            .

            I have recoded it as:

            Code:
            egen byte cdi = anymatch(ansbef_1 ansbef_2 ansbef_3 ansbef_4 ansbef_5      ///
                                     ansbef_6 ansbef_7 ansbef_8 ansbef_9 ansbef_10),   ///
                                     values(2)
            Now everything is perfect, and both methods give the same result.
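
            As a possible shortcut, the explicit list can also be built in a local macro with a loop instead of being typed out. This is only a sketch: the toy data below are made up so the example runs on its own, and the macro name vars is arbitrary.

            ```stata
            * Toy data (hypothetical): only the first observation ever answers 2
            clear
            set obs 4
            forvalues i = 1/10 {
                gen ansbef_`i' = cond(_n == 1, 2, 0)
            }

            * Collect the ten variable names in a local macro
            local vars
            forvalues i = 1/10 {
                local vars `vars' ansbef_`i'
            }
            display "`vars'"

            * Same egen call as above, with the macro supplying the list
            egen byte cdi = anymatch(`vars'), values(2)
            list cdi
            ```

            Because the names are generated one by one, the result does not depend on how the variables are ordered in the dataset.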

            Thank you so much again for your patience and help!

            Best wishes,

            Michael

