Stata 17: Loop with an "or" condition (so-called "|" in Stata) - 2.0

Michael Duarte Goncalves

Join Date: Oct 2022

Posts: 500
#1

03 Nov 2022, 07:58

Dear Statalist Users,

I would like to write this piece of code in a cleaner way, with a loop or some other more idiomatic approach. Do you have any suggestions, please?

Code:

gen CDI=1 if ansbef_1==2 | ansbef_2==2 | ansbef_3==2 | ansbef_4==2 | ansbef_5==2 | ansbef_6==2 | ansbef_7==2 | ansbef_8==2 | ansbef_9==2 | ansbef_10==2
replace CDI=0 if CDI~=1

Thank you so much in advance!

Best regards,

--
Michael Duarte Gonçalves
Hemanshu Kumar

Join Date: Mar 2015

Posts: 1548
#2

03 Nov 2022, 08:01

Two ways: one with the egen command:

Code:

egen byte CDI = anymatch(ansbef_1-ansbef_10), values(2)

and a slightly more tedious version with just the plain generate command:

Code:

gen byte CDI = inlist(2,ansbef_1,ansbef_2,ansbef_3,ansbef_4,ansbef_5,ansbef_6,ansbef_7,ansbef_8,ansbef_9,ansbef_10)
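
Since the original question asked for a loop, here is a minimal sketch of a loop-based version. The toy data at the top are hypothetical and only stand in for the real ansbef_1 to ansbef_10. The loop names each variable explicitly, so it does not depend on how the variables are ordered in the dataset.

```stata
* Toy data (hypothetical values) standing in for the real survey variables
clear
set obs 5
forvalues i = 1/10 {
    gen byte ansbef_`i' = 1
}
replace ansbef_3  = 2 in 2
replace ansbef_10 = 2 in 4
replace ansbef_7  = . in 5      // a missing value never equals 2

* Start at 0 and switch to 1 whenever any ansbef_i equals 2
gen byte CDI = 0
forvalues i = 1/10 {
    replace CDI = 1 if ansbef_`i' == 2
}
list CDI
```

In this toy dataset only observations 2 and 4 end up with CDI equal to 1.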

Last edited by Hemanshu Kumar; 03 Nov 2022, 08:04.
Michael Duarte Goncalves

Join Date: Oct 2022

Posts: 500
#3

03 Nov 2022, 08:05

Hello again Kumar!

Thank you so much for your help again!
I wish you all the best.

Best regards.

Michael

Michael Duarte Goncalves

Join Date: Oct 2022
Posts: 500
#4

03 Nov 2022, 08:37

Dear Kumar,

I don't obtain the same numbers of "treated and untreated" observations depending on whether I use (1):

Code:

gen byte CDI = inlist(2,ansbef_1,ansbef_2,ansbef_3,ansbef_4,ansbef_5,ansbef_6,ansbef_7,ansbef_8,ansbef_9,ansbef_10)

or (2) :

Code:

 egen byte CDI = anymatch(ansbef_1-ansbef_10), values(2)

For (1), I obtain:

Code:

        CDI |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |     17,925       54.69       54.69
          1 |     14,853       45.31      100.00
------------+-----------------------------------
      Total |     32,778      100.00

For (2), I obtain:

Code:

  see notes |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |     11,934       36.41       36.41
          1 |     20,844       63.59      100.00
------------+-----------------------------------
      Total |     32,778      100.00

I don't know whether this matters, but some of the variables listed contain missing values.

Could you please help me? Thank you so much.

Best regards,

--
Michael

Last edited by Michael Duarte Goncalves; 03 Nov 2022, 08:41.


Hemanshu Kumar

Join Date: Mar 2015

Posts: 1548
#5

03 Nov 2022, 08:51

Michael Duarte Goncalves

The two commands should give exactly the same result as long as the same variables are being selected. For instance, I ran the code below, which artificially generates your variables, including missing values:

Code:

clear
set obs 100
set seed 12345

forval i = 1/10 {
    gen ansbef_`i' = runiformint(0,2)
    replace ansbef_`i' = cond(runiformint(0,1) == 1,ansbef_`i',.)
}

egen byte CDI_1 = anymatch(ansbef_1-ansbef_10), values(2)
gen byte CDI_2 = inlist(2,ansbef_1,ansbef_2,ansbef_3,ansbef_4,ansbef_5,ansbef_6,ansbef_7,ansbef_8,ansbef_9,ansbef_10)

. tab CDI_1 CDI_2, miss

           |         CDI_2
 see notes |         0          1 |     Total
-----------+----------------------+----------
         0 |        14          0 |        14
         1 |         0         86 |        86
-----------+----------------------+----------
     Total |        14         86 |       100

. assert CDI_1 == CDI_2

As you can see, they produce identical results. So the only reason you're getting different results is that specifying the variable list as ansbef_1-ansbef_10 is actually picking up some other variables: in your dataset, one or more variables other than the ansbef variables we want are interspersed between ansbef_1 and ansbef_10. A variable list of the form a-b selects every variable stored between a and b in the dataset's current order, whatever its name.

If you just had to pick between the two ways of generating the answer, the inlist() method is definitely giving you the correct answer since it explicitly lists each ansbef variable. For the other method, you just have to specify the variable list in a way that it does not pick up any other variables by mistake.
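
One way to keep egen from picking up stray variables, sketched here under the assumption that the variables are named exactly ansbef_1 through ansbef_10, is to build the variable list by name in a local macro. The toy data and the variable called unrelated are hypothetical and exist only to make the example run on its own.

```stata
* Toy data (hypothetical): an unrelated variable sits between the ansbef variables
clear
set obs 4
forvalues i = 1/5 {
    gen byte ansbef_`i' = 1
}
gen byte unrelated = 2          // hypothetical variable that must not be picked up
forvalues i = 6/10 {
    gen byte ansbef_`i' = 1
}
replace ansbef_8 = 2 in 3

* Collect exactly ansbef_1 ... ansbef_10 by name in a local macro
local vars
forvalues i = 1/10 {
    local vars `vars' ansbef_`i'
}

egen byte CDI = anymatch(`vars'), values(2)
list CDI
```

Because the list is built from names rather than from a position range, the variable unrelated is ignored even though it is stored between ansbef_5 and ansbef_6.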

To troubleshoot for yourself, just do something like

Code:

des ansbef_1-ansbef_10

and it should show you all the variables that are being picked up by this method of specifying the variable list.
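
The effect described above can be reproduced on a tiny made-up dataset. This is only an illustrative sketch: the variable named other and all values are hypothetical. It uses unab to expand the range into the names it covers, then compares egen on the range with egen on an explicit list.

```stata
* Toy data (hypothetical): "other" is stored between ansbef_1 and ansbef_2
clear
set obs 3
gen byte ansbef_1 = 1
gen byte other    = 2
gen byte ansbef_2 = 1

* Expand the range into the variable names it actually covers
unab picked : ansbef_1-ansbef_2
display "`picked'"

* The range matches on "other"; the explicit list does not
egen byte by_range = anymatch(ansbef_1-ansbef_2), values(2)
egen byte by_name  = anymatch(ansbef_1 ansbef_2), values(2)
tab by_range by_name
```

Here the range expands to ansbef_1 other ansbef_2, so by_range is 1 in every observation while by_name is 0.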

Last edited by Hemanshu Kumar; 03 Nov 2022, 08:55.
Michael Duarte Goncalves

Join Date: Oct 2022

Posts: 500
#6

03 Nov 2022, 09:15

Dear Kumar,

You were totally right when you said:

As you can see, they produce identical results. So the only reason you're getting different results is that specifying the variable list as ansbef_1-ansbef_10 is actually picking up some other variables: in your dataset, one or more variables other than the ansbef variables we want are interspersed between ansbef_1 and ansbef_10.

In fact, I have some other variables stored in between, within the range

Code:

ansbef_1-ansbef_10

I have recoded it as:

Code:

egen byte cdi = anymatch(ansbef_1 ansbef_2 ansbef_3 ansbef_4 ansbef_5 ///
    ansbef_6 ansbef_7 ansbef_8 ansbef_9 ansbef_10), ///
    values(2)

Now everything works, and both methods give the same result.

Thank you so much again for your patience and help!

Best wishes,

Michael