Specifying multiple criteria for -foreach-

Michael McCulloch

Join Date: Jun 2025

Posts: 24
#1

23 Aug 2016, 18:39

Hello, I'm building a -foreach- statement in the hope of making the search criteria for a -replace- command more efficient. Below is my code, in which I examine the variable plicd for specific values, comparing two methods:
1. Using -foreach- to set variable tagfor == 1 whenever the search criteria are satisfied.
2. Using -replace- to set variable tagman == 1 whenever the search criteria are satisfied.

For my -foreach- code, I used as a template Nick Cox's FAQ at http://www.stata.com/support/faqs/data-management/try-all-values-with-foreach/.

My questions are:
1. My observation is that in my choice of search criteria, there's no advantage over a manual search. Am I not using -foreach- correctly?
2. If I want to specify a range of values for variable plicd (e.g. I21.0 to I21.3), how would I specify that in each of my two methods?
3. Note that the value of "I25.3" is captured, even though in my -index- function I specify only "I25.2" for the "I25.x" range.

Code:

clear
set obs 10
input mrn tagfor str8 plicd
1 . "I21.0"
2 . "I21.1"
3 . "I22.3"
4 . "I25.2"
5 . "I25.3"
6 . "I26.4"
end
l, noo
egen group = group(plicd)
su group, meanonly
summ group, detail
foreach i of num 1/`r(max)' {
    replace tagfor=1 if index(plicd, "I21.*") | ///
        index(plicd, "I22.*") | ///
        index(plicd, "I25.2")
}
gen tagman = 1 if index(plicd, "I21.*") | ///
    index(plicd, "I22.*") | ///
    index(plicd, "I25.2")
replace tagman=0 if tagman!=1 & tagman!=.
l mrn plicd group tagfor tagman, noo
Clyde Schechter

Join Date: Apr 2014

Posts: 30090
#2

23 Aug 2016, 21:25

The terms in your -if- condition, such as index(plicd, "I21.*"), are not what you think they are. The -index()- function does not accept wildcards in its arguments, so what you have written is a literal search for the string I21.*, I22.*, or I25.2 in plicd. Of course, only I25.2 will actually score a "hit" for you: none of the other values of plicd contains an asterisk character. To use wildcards, you need the -strmatch()- function instead. See -help strmatch()-.

At a higher level, there is no need for the variable group. Moreover, your loop does nothing except repeat the exact same calculation `r(max)' times: `i' never appears inside the loop, so each pass does the same thing. In fact, there is no need for any loop to accomplish this task:

Code:

gen tag = strmatch(plicd, "I21.*") | strmatch(plicd, "I22.*") | plicd == "I25.2"

is all you need.
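
A minimal sketch of the difference, using the toy values from post #1. The variable names hit_index, hit_match and hit_range are made up for this illustration. The last -gen- is one possible way to handle a range such as I21.0 to I21.3 (question 2), assuming the codes compare sensibly as strings; note that a value like "I21.15" would also fall inside that range.

Code:

* Toy data copied from post #1; hit_* names are hypothetical
clear
input mrn str8 plicd
1 "I21.0"
2 "I21.1"
3 "I22.3"
4 "I25.2"
5 "I25.3"
6 "I26.4"
end

* index() searches for the literal text "I21.*", asterisk included
gen byte hit_index = index(plicd, "I21.*") > 0

* strmatch() treats * as "zero or more characters"
gen byte hit_match = strmatch(plicd, "I21.*")

* inrange() also accepts strings: flags values sorting from
* "I21.0" through "I21.3" inclusive
gen byte hit_range = inrange(plicd, "I21.0", "I21.3")

list, noobs

On these data hit_index should be 0 in every observation, while hit_match and hit_range should be 1 only for I21.0 and I21.1.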
Comment
Reza Hasib

Join Date: Aug 2016

Posts: 2
#3

23 Aug 2016, 22:50

I've two sets of variables- independent and dependent. I want to run regressions for each of the dependent with each of the independent variables using foreach (or any other suitable command). Could anyone please help me in this regard?
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30090
#4

23 Aug 2016, 23:08

Reza,

Although the title of this thread suggests it is about -foreach-, as best I can tell, that is based on the original poster's misunderstanding of the nature of his problem. Your question is unrelated to the actual content of this thread. I suggest, in the future, you start a separate thread for new topics.

So let's suppose your independent variables are IVA IVB IVC IVD, and your dependent variables are DV1 DV2 DV3. Then this would do it:

Code:

foreach v of varlist IVA IVB IVC IVD {
    foreach w of varlist DV1 DV2 DV3 {
        * -regress- takes the dependent variable first, then the regressor
        regress `w' `v'
    }
}

This is a very basic application of -foreach-. Do read the manual section on -foreach- for a fuller understanding. It's a basic tool in Stata and the time spent learning it will be well repaid.
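
For a version that can be run as is, here is a sketch on the auto dataset that ships with Stata. The choice of price and mpg as outcomes and of weight and length as predictors is arbitrary and purely illustrative. Note that -regress- expects the dependent variable first.

Code:

* Illustrative only: outcomes and predictors are arbitrary choices
sysuse auto, clear
foreach y of varlist price mpg {
    foreach x of varlist weight length {
        * Label each model so the output is easy to scan
        display as text _newline "Outcome: `y'   Predictor: `x'"
        regress `y' `x'
    }
}

This runs four regressions, one for each outcome and predictor pair.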
Comment
Reza Hasib

Join Date: Aug 2016

Posts: 2
#5

24 Aug 2016, 00:15

Sorry for any inconvenience. I'll try to follow the instruction in future. However, thanks for your kind support regarding my query.
Comment
Michael McCulloch

Join Date: Jun 2025

Posts: 24
#6

24 Aug 2016, 18:13

Dear Clyde, many thanks for the elegant simplification, which works beautifully. Thank you for introducing me to -strmatch()-.
Comment

Michael McCulloch

Join Date: Jun 2025
Posts: 24

30 Aug 2016, 17:43

Hello Clyde, your suggested simplified code identifies observations of interest:

Code:

gen tag = strmatch(plicd, "I21.*") | strmatch(plicd, "I22.*") | plicd == "I25.2"

To validate my command, is there another way to specify the substring? I have tried unsuccessfully:

Code:

    gen ccMIval =1 if (    (substr(PLicd10,1,3) == "I21")    | ///
                                (substr(PLicd10,1,3) == "I22") | ///
                                (substr(PLicd10,1,3) == "I25"))
    assert ccMI == ccMIval  // assertion is false

Comment

Clyde Schechter

Join Date: Apr 2014

Posts: 30090
#8

30 Aug 2016, 18:03

There are several possible reasons that tag (I assume your ccMI is a rename of my tag; if not then I can't comment on this) and ccMIval differ.

The first, and nearly certain, is that my code generates a 0/1 variable, whereas your code for ccMIval creates a missing/1 variable. When you write -gen ccMIval = 1 if whatever-, ccMIval is set to missing value in all observations where whatever is false. My code, by contrast, sets tag = 0. Try taking out the word "if" from your command: that will make ccMIval a 0/1 variable and that may resolve the issue.
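
The distinction can be seen on a three-observation toy dataset. This is only a sketch: the values and the variable names with_if and no_if are invented for the illustration.

Code:

* Invented values; with_if and no_if are hypothetical names
clear
input str8 PLicd10
"I21.0"
"I25.2"
"I26.4"
end

* With -if-: 1 where the condition holds, missing (.) elsewhere
gen byte with_if = 1 if substr(PLicd10, 1, 3) == "I21"

* Without -if-: the logical expression itself evaluates to 1 or 0
gen byte no_if = substr(PLicd10, 1, 3) == "I21"

list, noobs

Here with_if should be 1 in the first observation and missing in the other two, while no_if should be 1, 0, 0.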

If the variables still differ, consider that there may be values of PLicd10 that begin with "I21" but not "I21." or vice versa. Ditto for "I22" and "I22.". (For example, there might be a value of PLicd10 like "I211".) And there could be values of PLicd10 that begin with "I25" but have some value other than "I25.2". You'll have to check those out yourself. To troubleshoot all of these conditions (if removing "if" doesn't resolve the problem) try:

Code:

browse PLicd10 ccMI ccMIval if ccMI != ccMIval

You will be able to see the relevant variables for all offending observations and you should be able to perceive what's going wrong.
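
If a do-file or log is preferred over the interactive Data Browser, a possible non-interactive equivalent is sketched below. The three observations are invented so that the example runs on its own; on real data only the last three commands would be needed.

Code:

* Invented data with one deliberate disagreement ("I211")
clear
input str8 PLicd10 ccMI ccMIval
"I21.0" 1 1
"I211"  0 1
"I26.4" 0 0
end

* How many observations disagree?
count if ccMI != ccMIval

* Show them in the Results window (and in any open log)
list PLicd10 ccMI ccMIval if ccMI != ccMIval, noobs

* -capture noisily- shows the -assert- message without stopping the do-file
capture noisily assert ccMI == ccMIval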

Last edited by Clyde Schechter; 30 Aug 2016, 18:05.
Comment

Michael McCulloch

Join Date: Jun 2025
Posts: 24

30 Aug 2016, 18:27

Progress:
1. Apologies for rename of your tag. You are correct, ccMI is my new name for observations identified by your originally suggested code, and ccMIval is the results of my validation with -gen-.
2. After my -gen- command, I replaced all missing values with 0. Thus, all the desired observations are identified, and the two variables agree according to a -tab- command:

HTML Code:

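. tab ccMI ccMIval

           |        ccMIval
      ccMI |         0          1 |     Total
-----------+----------------------+----------
         0 |    21,934          0 |    21,934
         1 |         0         33 |        33
-----------+----------------------+----------
     Total |    21,934         33 |    21,967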

3. However, they do not agree by the assert command. Both are numeric float and codebook says they're the same variable type. Fascinating! Which is "more correct"?

HTML Code:

------------------------------------------------------------------------------------------------------------------------
ccMI                                                                                                         (unlabeled)
------------------------------------------------------------------------------------------------------------------------

                  type:  numeric (float)

                 range:  [0,1]                        units:  1
         unique values:  2                        missing .:  0/21,967

            tabulation:  Freq.  Value
                        21,934  0
                            33  1

------------------------------------------------------------------------------------------------------------------------
ccMIval                                                                                                      (unlabeled)
------------------------------------------------------------------------------------------------------------------------

                  type:  numeric (float)

                 range:  [0,1]                        units:  1
         unique values:  2                        missing .:  0/21,967

            tabulation:  Freq.  Value
                        21,934  0
                            33  1

Comment

Clyde Schechter

Join Date: Apr 2014

Posts: 30090
#10

30 Aug 2016, 19:11

Well, from your cross-tab it appears that they are always both 0 or both 1. That wouldn't cover the possibility of disagreement on missing values (which do not appear in the output of -tab-), but according to codebook there are no missing values for either variable (as I would expect given how they were calculated.) I have to say that I'm stumped. Are you sure you typed the -assert- command correctly? When -assert- finds exceptions, it gives an error message telling you how many there were. How many were there?

But again, the next step in troubleshooting is:

Code:

browse PLicd10 ccMI ccMIval if ccMI != ccMIval

That will show you all the problem cases.
Comment
Michael McCulloch

Join Date: Jun 2025

Posts: 24
#11

30 Aug 2016, 19:23

When I reran the code, assert is true. I must have corrected a typo without noting I'd done that.

And... thank you so very much for introducing me to the -browse- command!
Comment
