
  • foreach looping to identify adjacent cases

    Hello Everyone,
    My first post

    I'm looking for help getting this code to run. The error is "invalid numlist".

    I'm very new to Stata, and this is the task I'm faced with:
    I'm creating a dummy variable that indicates a certain type of case,
    then I need foreach to loop through the data checking for adjacent cases.
    I would like to do this for multiple ranges (1, 2, 3, 4, 5, 6, 7, 8, 9, 10 cases away), creating a new dummy variable,
    then tab with Fisher's exact test and phi.
    Later I will need to add breaks on another area group variable, so that the loop stops after finishing one area and then begins again in the next area.

    Here is what I have

    sort seqid

    gen cd1 = 0
    replace cd1 = 1 if (icd1==1)|(icd1==5)|(icd1==9)|(icd1==28)|(icd1==64)|(icd1==79)|(icd1==92)|(icd1==104)|(icd1==154)|(icd1==189)


    gen cda = 0

    foreach j of numlist = (1/10){
    foreach i of numlist = (1/`j')
    {
    if (cd1[_n-`i'] | cd1[_n+1]) replace cda = 1
    }
    tab cd1 cda exact
    }

    I really appreciate any help!

  • #2
    The error message you are getting arises because the notation =1/10 is not used with -numlist-. That notation is used with -forvalues-. So you can change it in one of two ways:

    Code:
    foreach j of numlist 1(1)10 { // ...etc.
    
    OR
    
    forvalues j = 1/10 { // etc.
    Evidently, similar considerations apply to your -foreach i of numlist = ...- command. That one actually has a second error: in Stata, the opening curly brace ("{") must be on the same line as the -foreach- statement.
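
    As a minimal sketch of the corrected loop skeleton, here are both loop forms nested, with each opening brace on the same line as its loop command. The range is shortened to 1/3 to keep the output brief, and the local macro -total- is a name introduced only for this illustration.

    ```stata
    * Outer loop uses forvalues, inner loop uses foreach ... of numlist.
    * Neither form takes parentheses around the range, and only forvalues takes "=".
    local total = 0
    forvalues j = 1/3 {
        foreach i of numlist 1/`j' {
            display "j = `j', i = `i'"
            local total = `total' + 1
        }
    }
    display "inner body ran `total' times"
    ```

    The inner range depends on the outer macro, which is the same pattern as in the original post.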

    A few other pointers.

    1. Your -replace cd1 = 1 if...- command is really long and difficult to read. You can do exactly the same with the much more comprehensible
    Code:
    replace cd1 = 1 if inlist(icd1, 1, 5, 9, 28, 64, 79, 92, 104, 154, 189)
    In fact, both the -gen cd1 = 0- and -replace cd1 = ...- commands can be replaced with the single command
    Code:
    gen cd1 = inlist(icd1, 1, 5, 9, 28, 64, 79, 92, 104, 154, 189)
    2. Although -if (cd1[_n-`i'] | cd1[_n+1]) replace cda = 1- is legal syntax, a safer style is
    Code:
    if (cd1[_n-`i'] | cd1[_n+1])  {
         replace cda = 1
    }
    The reason is that at some point you might want to add something besides just the -replace cda = 1- here, and if you forget to put everything inside { } braces, only the first of those commands will actually be subject to the -if-. Debugging that can prove difficult and frustrating because our eyes tend not to recognize the problem. So it's safer practice to always enclose everything guarded by an -if- command in { } braces, even when it's only one command.
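
    Pointer 1 can be checked on a handful of made-up values. This is only a sketch: the data are invented, just three of the ten codes are used to keep the long form short, and cd1_long and cd1_short are names chosen for the illustration.

    ```stata
    clear
    input int icd1
    1
    5
    70
    .
    189
    end

    * Long form, as in the original post (shortened to three codes)
    gen byte cd1_long = 0
    replace cd1_long = 1 if (icd1==1) | (icd1==5) | (icd1==189)

    * One-line form: inlist() returns 1 on a match and 0 otherwise
    gen byte cd1_short = inlist(icd1, 1, 5, 189)

    * Stops with an error if the two versions disagree anywhere
    assert cd1_long == cd1_short
    list
    ```

    Note that a missing icd1 gets 0 under both versions, since missing matches none of the listed codes.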

    With all of that said, I have not looked at the logic of your code to see whether it will do what you intend. Once you get it to run without syntax errors, if it's not doing what you want, re-post showing some sample starting data (use -dataex-, please), the exact code you used, exactly what Stata responded (copied directly from the Results window or your log file into a code block), and a sample of the data as it looks after your code runs (also created with -dataex-). If you don't have -dataex- installed, just run -ssc install dataex-, and then read -help dataex- to learn how it's used.



    • #3
      THANKS SO MUCH!

      I'll update after the corrections.



      • #4
        I made those changes, but now I'm getting an -invalid syntax- error.

        I think I have a "{" or "}" out of place. Here is my do-file code.

        sort seqid


        gen cd1 = inlist(icd1, 1, 5, 9, 28, 64, 79, 92, 104, 154, 189)


        gen cda = 0


        forvalues j = 1/10 {

        forvalues i = (1/`j'){

        if (cd1[_n-`i'] | cd1[_n+1])
        replace cda = 1
        }
        }
        tab cd1 cda exact

        Here is a data sample broken into a few sections. The first portion shows that for much of the data set the variable "ed" is 0. I do not want to compare those cases, because 0 indicates that no geographic location was available. In the second (ed = 694) and third (ed = 707) sections I captured examples of the variable "icd1", which is where I am trying to identify adjacency patterns. So after debugging the above code I need to create breaks between these "ed" groups. Is there a recommended solution to this? I was thinking of writing -if ed = 694-, -if ed = 707- to run the loop in each group, but if there's a better way, please advise. Thanks!


        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input float seqid int(ed icd1)
            1   0   .
            2   0   .
            3   0   .
            4   0   .
            5   0   .
            6   0   .
        47337 694   .
        47338 694 120
        47339 694   .
        47340 694   .
        47341 694   .
        
        
        49909 707   .
        49910 707   .
        49911 707  70
        49912 707   .
        49913 707   .
        49914 707   .
        49915 707   .
        49916 707   .
        49917 707   .
        49918 707 154
        49919 707 185
        49920 707   .
        49921 707   .
        
        
           .   .   .
        end



        • #5
          You've picked up on CODE formatting for your example data, which is great, but we need it for the code also. If you have two lines


          Code:
          if (cd1[_n-`i'] | cd1[_n+1])
          replace cda = 1
          then that is not going to work. But it's almost certainly not what you want even when combined as one line, because the replace changes all values to 1 if the condition tests as true, not just the values you are looking at.
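
          To make the difference concrete, here is a small sketch on invented data (cda_cmd and cda_qual are names made up for this illustration). The -if- command evaluates its condition once, at the first observation, and then runs or skips the whole -replace-. The -if- qualifier is evaluated separately for every observation.

          ```stata
          clear
          input byte cd1
          0
          1
          0
          0
          0
          end

          gen byte cda_cmd = 0
          gen byte cda_qual = 0

          * if command: the condition is read at the first observation,
          * so cd1[_n+1] means cd1[2], which is 1, and every row is replaced
          if (cd1[_n+1]) {
              replace cda_cmd = 1
          }

          * if qualifier: tested row by row, so only neighbours of row 2 change
          replace cda_qual = 1 if cd1[_n-1] == 1 | cd1[_n+1] == 1

          list
          ```

          The explicit == 1 in the qualifier is deliberate. Subscripts that run past the first or last observation return missing, and Stata treats missing as true when it is used as a condition on its own.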

          I don't understand what you are trying to do, so I can't suggest better code. The goal of identifying "adjacency patterns" is not specific enough for me to understand.
          Last edited by Nick Cox; 29 Jan 2016, 07:25.



          • #6
            By adjacency patterns, I mean two or more cases in which "cd1" occurs within a given range of cases in the sequence. I want to check, say, house "1" for specific "icd1" values; if it has one of the target values, then check houses "2", "3", "4", "5"... and generate a new variable that indicates how close the same "icd1" value occurred in the sequence. This needs to be done in each neighborhood separately, which creates the need to provide breaks between different "ed" values. Does that make more sense?



            • #7
              Let's try working from the other end. If you have houses in a sequence, and some houses have some condition, then you can look backwards and forwards in the sequence and that gives you a separation from the nearer occurrence of that condition. Here's an example. Two neighbourhoods, ten houses in each and some have cats owning the house.

              Code:
              clear
              set seed 2803
              set obs 20
              egen nhood = seq(), block(10)
              egen houseno = seq(), to(10)
              gen has_cat = runiform() < 0.2
              
              bysort nhood (houseno) : gen where_cat = houseno if has_cat
              gen where_1 = where_cat
              gen where_2 = where_cat
              bysort nhood : replace where_1 = where_1[_n-1] if missing(where_1)
              gsort nhood -houseno
              by nhood : replace where_2 = where_2[_n-1] if missing(where_2)
              
              gen dist = min(houseno - where_1, where_2 - houseno)
              
              l, sepby(nhood)
              
              
                   +-----------------------------------------------------------------+
                   | nhood   houseno   has_cat   where_~t   where_1   where_2   dist |
                   |-----------------------------------------------------------------|
                1. |     1        10         0          .         7         .      3 |
                2. |     1         9         0          .         7         .      2 |
                3. |     1         8         0          .         7         .      1 |
                4. |     1         7         1          7         7         7      0 |
                5. |     1         6         1          6         6         6      0 |
                6. |     1         5         0          .         4         6      1 |
                7. |     1         4         1          4         4         4      0 |
                8. |     1         3         0          .         .         4      1 |
                9. |     1         2         0          .         .         4      2 |
               10. |     1         1         0          .         .         4      3 |
                   |-----------------------------------------------------------------|
               11. |     2        10         0          .         8         .      2 |
               12. |     2         9         0          .         8         .      1 |
               13. |     2         8         1          8         8         8      0 |
               14. |     2         7         1          7         7         7      0 |
               15. |     2         6         0          .         5         7      1 |
               16. |     2         5         1          5         5         5      0 |
               17. |     2         4         1          4         4         4      0 |
               18. |     2         3         1          3         3         3      0 |
               19. |     2         2         0          .         .         3      1 |
               20. |     2         1         0          .         .         3      2 |
                   +-----------------------------------------------------------------+
              The important positives are:

              1. This kind of thing is only rarely a loop. Deep down, it is a loop, but you use by: and subscripting and the right sort order.

              2. Doing this separately in neighbourhoods is easy to arrange. Again you use by:
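
              Point 2 can be seen in a very small sketch. The data below are invented, and prev_any and prev_same are names used only here. Under by:, _n restarts at 1 in each group, so a subscript such as [_n-1] never reaches into the previous neighbourhood.

              ```stata
              clear
              input byte(nhood houseno has_cat)
              1 1 0
              1 2 0
              1 3 1
              2 1 0
              2 2 1
              2 3 0
              end

              sort nhood houseno

              * Without by:, house 1 of neighbourhood 2 looks back at house 3 of neighbourhood 1
              gen byte prev_any = has_cat[_n-1]

              * With by:, the same lookup stays inside the neighbourhood and returns missing there
              by nhood: gen byte prev_same = has_cat[_n-1]

              list, sepby(nhood)
              ```

              The two new variables differ only in the first row of the second neighbourhood, which is exactly the "break" between areas asked about earlier in the thread.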



              • #8
                I appreciate the code design you suggested, but I've been tasked with debugging the previous -foreach- code. It doesn't seem negotiable to change course for some meta theoretical reason. If there are any mistakes in this code, please advise.

                foreach j of numlist = 1(1) 10 {

                foreach i of numlist = (1(1)`j')

                if (cd1[_n-`i'] | cd1[_n+1]) {
                replace cda = 1
                }
                tab cd1 by cda exact
                }



                • #9
                  I've already pointed out a likely error in #5.

                  The tabulate statement just looks like a guess. The help is the place to fix that.

                  What in your code confines operations to the same area?

                  I don't recognise a "meta theoretical reason" behind my code, just some experience in writing Stata programs. Clearly you don't have to take my advice.
                  Last edited by Nick Cox; 29 Jan 2016, 15:29.



                  • #10
                    I was not talking about your reasons as being meta theoretical. What I really meant was that I don't get to decide to change the approach, and the reason is something outside of this section of code, later in the analysis, that I'm not yet aware of, etc. I like your version. I'm going to learn to use it. Thanks for the help. Apologies if my earlier post read in a way that seemed unappreciative or derogatory in any way. It was definitely not my intent. Again, many thanks for helping!
                    Last edited by Zack Butler; 29 Jan 2016, 16:28.



                    • #11
                      Nick, the code you suggested works very well!!!

                      I want to -tab houseno dist, e- for population level association correlation.

                      Does the "0" distort this correlation?

                      If so, don't I need it to tell me the distance between two houses with cats?

                      Instead of reporting "0" for the house with cat, would I need it to report the distance to the next "0"(house with cat)?



                      • #12
                        I got the code to work!
                        Here it is...

                        sort seqid

                        gen SelectedCauses = inlist(icd1,1,5,6,7,8,9,10,13,14,61,28,29,92,104,105)
                        label var SelectedCauses "SelectedCauses"

                        tab SelectedCauses

                        gen cda=0

                        foreach j of numlist 1/10 {
                        foreach i of numlist 1/`j' {
                        replace cda=1 if (SelectedCauses[_n-`i'] | SelectedCauses[_n+`i'])
                        }
                        display "Household distance" = "`i'"
                        tab SelectedCauses cda, exact
                        }
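
                        Two points raised earlier in the thread are not handled by the loop above: keeping comparisons inside one "ed" group, and the ends of the sequence, where a subscript such as [_n-1] returns missing and missing counts as true in a condition. The following is only a sketch of one way to cover both, on invented data with two codes and distances 1/2. The names cda1 and cda2 are assumptions made for the illustration, and it may not match what the full analysis requires.

                        ```stata
                        clear
                        input int(seqid ed icd1)
                        1 694 .
                        2 694 1
                        3 694 .
                        4 694 .
                        5 707 .
                        6 707 .
                        7 707 5
                        end

                        gen byte SelectedCauses = inlist(icd1, 1, 5)
                        sort ed seqid

                        forvalues j = 1/2 {
                            * one flag per maximum distance j
                            gen byte cda`j' = 0
                            forvalues i = 1/`j' {
                                * by ed: keeps the lookup inside one area;
                                * == 1 is false for the missing values past either end of a group
                                by ed: replace cda`j' = 1 if SelectedCauses[_n-`i'] == 1 | SelectedCauses[_n+`i'] == 1
                            }
                            display "Household distance up to `j'"
                            tab SelectedCauses cda`j', exact
                        }
                        ```

                        Cases with ed equal to 0 (no location available) would still need to be excluded, for example with an -if ed != 0- qualifier on the -replace- and -tab- commands.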

