
  • Specifying multiple criteria for -foreach-

    Hello, I'm working on building a -foreach- statement in hopes of making the search criteria for a -replace- command more efficient. Below is my code, in which I examine the variable plicd for specific values and compare two methods:
    1. Using -foreach- to set variable tagfor to 1 whenever the search criteria are satisfied; and
    2. Using -replace- to set variable tagman to 1 whenever the search criteria are satisfied.

    For my -foreach- code, I used as a template Nick Cox's FAQ at http://www.stata.com/support/faqs/data-management/try-all-values-with-foreach/.

    My questions are:
    1. My observation is that in my choice of search criteria, there's no advantage over a manual search. Am I not using -foreach- correctly?
    2. If I want to specify a range of values for variable plicd (e.g. I21.0 to I21.3), how would I specify that in each of my two methods?
    3. Note that the value of "I25.3" is captured, even though in my -index- function I specify only "I25.3" for the "I25.x" range.

    Code:
    clear
    set obs 10
    input mrn tagfor str8 plicd
    1 . "I21.0"
    2 . "I21.1"
    3 . "I22.3"
    4 . "I25.2"
    5 . "I25.3"
    6 . "I26.4"
    end
    l, noo
    
    egen group = group(plicd)
    su group, meanonly
    summ group, detail
    foreach i of num 1/`r(max)' {
        replace tagfor=1 if index(plicd, "I21.*") | ///
                            index(plicd, "I22.*") | ///
                            index(plicd, "I25.2")
    }
    gen tagman = 1 if index(plicd, "I21.*") | ///
                      index(plicd, "I22.*") | ///
                      index(plicd, "I25.2")
    replace tagman=0 if tagman!=1 & tagman!=.
    l mrn plicd group tagfor tagman, noo

  • #2

    -index(plicd, "I21.*")- and the other terms in your -if- condition are not what you think they are. The -index()- function does not accept wildcards in its arguments, so what you have written is a literal search for the strings I21.*, I22.*, or I25.2 in plicd. Only I25.2 will actually score a "hit" for you, because none of the values of plicd contains an asterisk character. To use wildcards, you need the -strmatch()- function instead; see -help strmatch()-.

    At a higher level, there is no need for the variable group. Moreover, your loop does nothing except repeat the exact same calculation `r(max)' times. Notice that `i' never appears inside the loop, so it just repeats the same thing each time through. In fact, no loop is needed to accomplish this task:
    Code:
     gen tag = strmatch(plicd, "I21.*") | strmatch(plicd, "I22.*") | plicd == "I25.2"
    is all you need.
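    A minimal sketch of the difference, reusing the six example codes from the original post (the variables lit, wild, and rng are made-up names for illustration). It uses -strpos()-, the current name of the function called -index()- above, which likewise treats "*" as an ordinary character. The last -gen- is one hedged answer to question 2 (a range such as I21.0 to I21.3): -inrange()- also accepts strings and compares them in sort order, which should only be trusted while every code has the same fixed-width pattern (a code like "I21.10" would also fall inside that range).

    ```stata
    * Toy data from the original post
    clear
    input mrn str8 plicd
    1 "I21.0"
    2 "I21.1"
    3 "I22.3"
    4 "I25.2"
    5 "I25.3"
    6 "I26.4"
    end

    * strpos() searches for literal text, so "I21.*" only matches a value that
    * really contains an asterisk; here only the "I25.2" term can hit
    gen byte lit = strpos(plicd, "I21.*") > 0 | strpos(plicd, "I22.*") > 0 | strpos(plicd, "I25.2") > 0

    * strmatch() treats * as a wildcard matching any run of characters
    gen byte wild = strmatch(plicd, "I21.*") | strmatch(plicd, "I22.*") | plicd == "I25.2"

    * A range of codes, compared as strings in sort order
    gen byte rng = inrange(plicd, "I21.0", "I21.3")

    list mrn plicd lit wild rng, noobs
    ```

    In the listing, lit should be 1 only for mrn 4, while wild is 1 for mrn 1 through 4.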

    • #3
      I have two sets of variables, independent and dependent. I want to regress each dependent variable on each of the independent variables using -foreach- (or any other suitable command). Could anyone please help me with this?

      • #4
        Reza,

        Although the title of this thread suggests it is about -foreach-, as best I can tell, that is based on the original poster's misunderstanding of the nature of his problem. Your question is unrelated to the actual content of this thread. I suggest, in the future, you start a separate thread for new topics.

        So let's suppose your independent variables are IVA IVB IVC IVD, and your dependent variables are DV1 DV2 DV3. Then this would do it:

        Code:
        foreach v of varlist IVA IVB IVC IVD {
            foreach w of varlist DV1 DV2 DV3 {
                regress `w' `v'   // -regress- takes the dependent variable first
            }
        }
        This is a very basic application of -foreach-. Do read the manual section on -foreach- for a fuller understanding. It's a basic tool in Stata and the time spent learning it will be well repaid.
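        A slightly fuller sketch, using Stata's shipped auto.dta as a stand-in for Reza's data (the variable choices and the results file name regress_results are illustrative only). It runs each regression quietly and collects the slope and standard error from every pair into a new dataset with -postfile-, which can be easier to scan than a stack of regression tables.

        ```stata
        sysuse auto, clear

        * Illustrative stand-ins for the dependent and independent variables
        local depvars   price mpg
        local indepvars weight length turn

        * Set up a results file: one row per (dependent, independent) pair
        tempname results
        postfile `results' str32 dv str32 iv double b double se using regress_results, replace

        foreach w of local depvars {
            foreach v of local indepvars {
                quietly regress `w' `v'    // dependent variable comes first
                post `results' ("`w'") ("`v'") (_b[`v']) (_se[`v'])
            }
        }
        postclose `results'

        * Inspect the collected coefficients
        use regress_results, clear
        list, noobs
        ```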

        • #5
          Sorry for the inconvenience; I'll follow that advice in future. And thanks for your kind support with my query.

          • #6
            Dear Clyde, many thanks for the elegant simplification, which works beautifully. Thank you for introducing me to -strmatch()-.

            • #7
              Hello Clyde, your suggested simplified code identifies observations of interest:
              Code:
              gen tag = strmatch(plicd, "I21.*") | strmatch(plicd, "I22.*") | plicd == "I25.2"
              To validate my command, is there another way to specify the substring? I have tried unsuccessfully:

              Code:
                  gen ccMIval =1 if (    (substr(PLicd10,1,3) == "I21")    | ///
                                              (substr(PLicd10,1,3) == "I22") | ///
                                              (substr(PLicd10,1,3) == "I25"))
                  assert ccMI == ccMIval  // assertion is false

              • #8
                There are several possible reasons that tag (I assume your ccMI is a rename of my tag; if not then I can't comment on this) and ccMIval differ.

                The first, and nearly certain, reason is that my code generates a 0/1 variable, whereas your code for ccMIval creates a missing/1 variable. When you write -gen ccMIval = 1 if whatever-, ccMIval is set to missing in all observations where whatever is false. My code, by contrast, sets tag = 0 in those observations. Try taking "1 if" out of your command, so that it reads -gen ccMIval = whatever-: that will make ccMIval a 0/1 variable and may resolve the issue.
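                A small sketch of that difference on two made-up codes (the variables viaif, vialogic, and fixed are hypothetical names):

                ```stata
                clear
                input str8 plicd
                "I21.0"
                "I25.3"
                end

                * With -if-, observations where the condition is false get missing (.)
                gen byte viaif = 1 if substr(plicd, 1, 3) == "I21"

                * Without -if-, the comparison itself evaluates to 1 (true) or 0 (false)
                gen byte vialogic = substr(plicd, 1, 3) == "I21"

                * -assert viaif == vialogic- would fail on the second observation,
                * because missing is not equal to 0. Recoding missing to 0 fixes that:
                gen byte fixed = viaif
                replace fixed = 0 if missing(fixed)
                assert fixed == vialogic

                list, noobs
                ```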

                If the variables still differ, consider that there may be values of PLicd10 that begin with "I21" but not "I21.". Ditto for "I22" and "I22.". (For example, there might be a value of PLicd10 like "I211".) And there could be values of PLicd10 that begin with "I25" but are something other than "I25.2". You'll have to check those out yourself. To troubleshoot all of these conditions (if that change doesn't resolve the problem), try:

                Code:
                browse PLicd10 ccMI ccMIval if ccMI != ccMIval
                You will be able to see the relevant variables for all offending observations and you should be able to perceive what's going wrong.
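                To see how the two definitions can part company, here is a sketch with two hypothetical codes, "I211" and "I25.9", that start with "I21" or "I25" but are not caught by the -strmatch()- version. Because -browse- opens an interactive window, the sketch uses -list- and -count- instead, which also work inside a do-file.

                ```stata
                clear
                input str8 plicd
                "I21.0"
                "I211"
                "I25.2"
                "I25.9"
                end

                * Wildcard definition from post #2
                gen byte tag = strmatch(plicd, "I21.*") | strmatch(plicd, "I22.*") | plicd == "I25.2"

                * Three-character prefix definition from post #7
                gen byte prefix = inlist(substr(plicd, 1, 3), "I21", "I22", "I25")

                * Show and count the observations where the definitions disagree
                list plicd tag prefix if tag != prefix, noobs
                count if tag != prefix
                ```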
                Last edited by Clyde Schechter; 30 Aug 2016, 18:05.

                • #9
                  Progress:
                  1. Apologies for renaming your tag. You are correct: ccMI is my new name for the observations identified by your originally suggested code, and ccMIval is the result of my validation with -gen-.
                  2. After my -gen- command, I replaced all missing values with 0. Now all the desired observations are identified, and the two variables agree in a -tab- command:

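                  Code:
                  . tab ccMI ccMIval

                             |        ccMIval
                        ccMI |         0          1 |     Total
                  -----------+----------------------+----------
                           0 |    21,934          0 |    21,934
                           1 |         0         33 |        33
                  -----------+----------------------+----------
                       Total |    21,934         33 |    21,967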
                  3. However, they do not agree by the assert command. Both are numeric float and codebook says they're the same variable type. Fascinating! Which is "more correct"?

                  Code:
                  ------------------------------------------------------------------------------------------------------------------------
                  ccMI                                                                                                         (unlabeled)
                  ------------------------------------------------------------------------------------------------------------------------
                  
                                    type:  numeric (float)
                  
                                   range:  [0,1]                        units:  1
                           unique values:  2                        missing .:  0/21,967
                  
                              tabulation:  Freq.  Value
                                          21,934  0
                                              33  1
                  
                  ------------------------------------------------------------------------------------------------------------------------
                  ccMIval                                                                                                      (unlabeled)
                  ------------------------------------------------------------------------------------------------------------------------
                  
                                    type:  numeric (float)
                  
                                   range:  [0,1]                        units:  1
                           unique values:  2                        missing .:  0/21,967
                  
                              tabulation:  Freq.  Value
                                          21,934  0
                                              33  1

                  • #10
                    Well, from your cross-tab it appears that they are always both 0 or both 1. That wouldn't cover the possibility of disagreement on missing values (which do not appear in the output of -tab-), but according to -codebook- there are no missing values for either variable (as I would expect given how they were calculated). I have to say that I'm stumped. Are you sure you typed the -assert- command correctly? When -assert- finds exceptions, it gives an error message telling you how many there were. How many were there?

                    But again, the next step in troubleshooting is:

                    Code:
                     browse PLicd10 ccMI ccMIval if ccMI != ccMIval
                    That will show you all the problem cases.
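                    A sketch of why -tab- and -assert- can disagree, using three made-up observations of hypothetical variables a and b: -tabulate- leaves out observations with missing values unless you add the -missing- option, whereas -assert- counts them as contradictions. -capture- lets a do-file continue past a failed assertion.

                    ```stata
                    clear
                    input byte a byte b
                    0 0
                    1 1
                    0 .
                    end

                    * Without -missing-, the (0, .) observation is silently left out
                    tab a b
                    tab a b, missing

                    * Run without -capture-, this -assert- would report the number of
                    * contradictions and stop with error r(9); -capture- suppresses that
                    * and stores the return code in _rc
                    capture assert a == b
                    display "assert return code: " _rc

                    * List and count the offending observations directly
                    list if a != b, noobs
                    count if a != b
                    ```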

                    • #11
                      When I reran the code, assert is true. I must have corrected a typo without noting I'd done that.

                      And... thank you so very much for introducing me to the -browse- command!
