foreach looping to identify adjacent cases

Zack Butler

Join Date: Jan 2016

Posts: 8
#1


28 Jan 2016, 19:37

Hello Everyone,
My first post

Looking for help in getting this code to run. The error is -invalid numlist-.

I'm very new to Stata and this is the task I'm faced with:
I'm creating a dummy variable that indicates a certain type of case,
then I need foreach to loop through checking for adjacent cases.
I would like to do this for multiple ranges 1,2,3,4,5,6,7,8,9,10 cases away creating a new dummy variable,
then tab with Fisher's exact Phi.
Later I will need to add breaks for another area group variable value so that the loop stops after finishing an area and then begins again in a new area value.

Here is what I have

sort seqid

gen cd1 = 0
replace cd1 = 1 if (icd1==1)|(icd1==5)|(icd1==9)|(icd1==28)|(icd1==64)|(icd1==79)|(icd1==92)|(icd1==104)|(icd1==154)|(icd1==189)

gen cda = 0

foreach j of numlist = (1/10){
foreach i of numlist = (1/`j')
{
if (cd1[_n-`i'] | cd1[_n+1]) replace cda = 1
}
tab cd1 cda exact
}

I really appreciate any help!
Clyde Schechter

Join Date: Apr 2014

Posts: 30118
#2

28 Jan 2016, 20:05

The error message you are getting arises because the notation =1/10 is not used with -numlist-. That notation is used with -forvalues-. So you can change it in one of two ways:

Code:

foreach j of numlist 1(1)10 { // ...etc.
// OR
forvalues j = 1/10 { // etc.

Evidently, similar considerations apply to your -foreach i of numlist = ...- command. That one actually has a second error: the opening curly brace ("{") must be on the same line as the -foreach- statement in Stata.
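As a minimal sketch of the two loop syntaxes side by side (the counters n_foreach and n_forvalues are made up here, purely to show how many times each loop body runs):

```stata
* -foreach ... of numlist- is followed directly by the numlist: no equals sign, no parentheses
local n_foreach = 0
foreach j of numlist 1(1)3 {
    local ++n_foreach
    display "foreach: j = `j'"
}

* -forvalues- is the command that takes "= range"; the inner range may use the outer macro
local n_forvalues = 0
forvalues j = 1/3 {
    forvalues i = 1/`j' {
        local ++n_forvalues
        display "forvalues: j = `j', i = `i'"
    }
}
display "`n_foreach' and `n_forvalues' iterations"
```

In both forms the opening brace sits on the same line as the loop command and the closing brace is on a line of its own.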

A few other pointers.

1. Your -replace cd1 = 1 if...- command is really long and difficult to read. You can do exactly the same with the much more comprehensible

Code:

replace cd1 = 1 if inlist(icd1, 1, 5, 9, 28, 64, 79, 92, 104, 154, 189)

In fact, both the -gen cd1 = 0- and -replace cd1 = ...- commands can be replaced with the single command

Code:

gen cd1 = inlist(icd1, 1, 5, 9, 28, 64, 79, 92, 104, 154, 189)

2. Although -if (cd1[_n-`i'] | cd1[_n+1]) replace cda = 1- is legal syntax, a safer style is

Code:

if (cd1[_n-`i'] | cd1[_n+1]) {
    replace cda = 1
}

The reason is that at some point you might want to add something besides just the -replace cda = 1- here, and if you forget to put everything inside { } braces, only the first of those commands will actually be subject to the -if-. Debugging that can prove difficult and frustrating because our eyes tend not to recognize the problem. So it's safer practice to always enclose everything guarded by an -if- command in { } braces, even when it's only one command.
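Both pointers can be tried on a toy dataset. This is only a sketch: the five icd1 values and the names cd1_long, guarded, unguarded, both_a and both_b are invented for illustration.

```stata
clear
input int icd1
1
2
154
189
.
end

* Pointer 1: the long chain of comparisons and -inlist()- give the same dummy
gen byte cd1_long = 0
replace cd1_long = 1 if (icd1==1)|(icd1==5)|(icd1==9)|(icd1==28)|(icd1==64)|(icd1==79)|(icd1==92)|(icd1==104)|(icd1==154)|(icd1==189)
gen byte cd1 = inlist(icd1, 1, 5, 9, 28, 64, 79, 92, 104, 154, 189)
assert cd1 == cd1_long

* Pointer 2: without braces, only the command on the -if- line is guarded
local guarded = 0
local unguarded = 0
if 1 == 2 local guarded = 1
local unguarded = 1          // runs regardless of the -if- above

* With braces, every command inside them is guarded
local both_a = 0
local both_b = 0
if 1 == 2 {
    local both_a = 1
    local both_b = 1
}
display "guarded = `guarded', unguarded = `unguarded', both_a = `both_a', both_b = `both_b'"
```

Note that a missing icd1 gives cd1 = 0 with either approach, because missing is not equal to any of the listed codes.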

With all of that said, I have not looked at the logic of your code to see whether it will do what you intend. Once you get it to run without syntax errors, if it's not doing what you want, re-post showing some sample starting data (use -dataex-, please), the exact code you used, exactly what Stata responded (copied directly from the Results window or your log file into a code block), and a sample of the data as it looks after your code runs (also created with -dataex-). If you don't have -dataex- installed, just run -ssc install dataex-, and then read -help dataex- to learn how it's used.
Zack Butler

Join Date: Jan 2016

Posts: 8
#3

28 Jan 2016, 20:11

THANKS SO MUCH!

I'll update after the corrections.
Zack Butler

Join Date: Jan 2016

Posts: 8
#4

29 Jan 2016, 07:02

I made those changes, but now I'm getting an -invalid syntax- error.

I think I have a "{" or "}" out of place. Here is my do-file code.

sort seqid

gen cd1 = inlist(icd1, 1, 5, 9, 28, 64, 79, 92, 104, 154, 189)

gen cda = 0

forvalues j = 1/10 {

forvalues i = (1/`j'){

if (cd1[_n-`i'] | cd1[_n+1])
replace cda = 1
}
}
tab cd1 cda exact

Here is a data sample broken into a few sections. The first portion shows that for much of the data set the variable "ed" is 0; I do not want to compare those cases, because 0 indicates that no geo location was available. The second (ed = 694) and third (ed = 707) sections show examples of the variable "icd1", which is where I am trying to identify adjacency patterns. So after debugging the above code I need to create breaks between these "ed" groups. Is there a recommended solution to this? I was thinking of writing -if ed == 694-, -if ed == 707- to run the loop in each group, but if there's a better way, please advise. Thanks!

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float seqid int(ed icd1)
1 0 .
2 0 .
3 0 .
4 0 .
5 0 .
6 0 .
47337 694 .
47338 694 120
47339 694 .
47340 694 .
47341 694 .
49909 707 .
49910 707 .
49911 707 70
49912 707 .
49913 707 .
49914 707 .
49915 707 .
49916 707 .
49917 707 .
49918 707 154
49919 707 185
49920 707 .
49921 707 .
. . .
end
Nick Cox

Join Date: Mar 2014

Posts: 35725
#5

29 Jan 2016, 07:22

You've picked up on CODE formatting for your example data, which is great, but we need it for the code also. If you have two lines

Code:

if (cd1[_n-`i'] | cd1[_n+1])
replace cda = 1

then that is not going to work, but it's almost certainly not what you want even when combined as one line, as the -replace- changes all values to 1 if the condition tests true, not just the values you are looking at.
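A minimal sketch of the difference between the -if- command and the -if- qualifier, on made-up data (the names flag_cmd and flag_qual are invented for illustration). The -if- command evaluates its expression once, and a bare variable name there refers to the first observation; the -if- qualifier is evaluated observation by observation.

```stata
clear
input byte cd1
1
0
0
1
0
0
end

* -if- command: cd1 == 1 is checked once, using the first observation only.
* It is true here, so the unqualified -replace- changes every observation.
gen byte flag_cmd = 0
if cd1 == 1 {
    replace flag_cmd = 1
}

* -if- qualifier: the condition is checked separately for each observation,
* so only cases next to a target case are flagged.
gen byte flag_qual = 0
replace flag_qual = 1 if cd1[_n-1] == 1 | cd1[_n+1] == 1

list, clean
```

Here flag_cmd ends up 1 everywhere, while flag_qual is 1 only in observations 2, 3 and 5, the ones with a neighbour that has cd1 equal to 1.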

I don't understand what you are trying to do, so I can't suggest better code. The goal of identifying "adjacency patterns" is not specific enough for me to understand.

Last edited by Nick Cox; 29 Jan 2016, 07:25.
Zack Butler

Join Date: Jan 2016

Posts: 8
#6

29 Jan 2016, 07:38

By adjacency patterns, I mean two or more cases in which "cd1" occurs within a range of the cases in sequence. I want to check, say house "1" for specific "icd1" values, if it has one of the target values then check house "2", "3", "4" "5"... and generate a new variable that indicates how close the same "icd1" value occurred in the sequence. This needs to be done in each neighborhood separately, which creates the need to provide breaks in between different "ed" values. Does that make more sense?
Nick Cox

Join Date: Mar 2014
Posts: 35725
#7

29 Jan 2016, 07:58

Let's try working from the other end. If you have houses in a sequence, and some houses have some condition, then you can look backwards and forwards in the sequence and that gives you a separation from the nearer occurrence of that condition. Here's an example. Two neighbourhoods, ten houses in each and some have cats owning the house.

Code:

clear
set seed 2803
set obs 20
egen nhood = seq(), block(10)
egen houseno = seq(), to(10)
gen has_cat = runiform() < 0.2

bysort nhood (houseno) : gen where_cat = houseno if has_cat
gen where_1 = where_cat
gen where_2 = where_cat
bysort nhood : replace where_1 = where_1[_n-1] if missing(where_1)
gsort nhood -houseno
by nhood : replace where_2 = where_2[_n-1] if missing(where_2)

gen dist = min(houseno - where_1, where_2 - houseno)

l, sepby(nhood)


     +-----------------------------------------------------------------+
     | nhood   houseno   has_cat   where_~t   where_1   where_2   dist |
     |-----------------------------------------------------------------|
  1. |     1        10         0          .         7         .      3 |
  2. |     1         9         0          .         7         .      2 |
  3. |     1         8         0          .         7         .      1 |
  4. |     1         7         1          7         7         7      0 |
  5. |     1         6         1          6         6         6      0 |
  6. |     1         5         0          .         4         6      1 |
  7. |     1         4         1          4         4         4      0 |
  8. |     1         3         0          .         .         4      1 |
  9. |     1         2         0          .         .         4      2 |
 10. |     1         1         0          .         .         4      3 |
     |-----------------------------------------------------------------|
 11. |     2        10         0          .         8         .      2 |
 12. |     2         9         0          .         8         .      1 |
 13. |     2         8         1          8         8         8      0 |
 14. |     2         7         1          7         7         7      0 |
 15. |     2         6         0          .         5         7      1 |
 16. |     2         5         1          5         5         5      0 |
 17. |     2         4         1          4         4         4      0 |
 18. |     2         3         1          3         3         3      0 |
 19. |     2         2         0          .         .         3      1 |
 20. |     2         1         0          .         .         3      2 |
     +-----------------------------------------------------------------+

The important positives are:

1. This kind of thing is only rarely a loop. Deep down, it is a loop, but you use by: and subscripting and the right sort order.

2. Doing this separately in neighbourhoods is easy to arrange. Again you use by:
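The same idea might carry over to the variables from earlier in the thread roughly as follows. This is a sketch under assumptions: the nine observations are invented, pos (position within area) is a made-up helper, and distance is counted in cases within each "ed" area rather than in differences of seqid.

```stata
clear
input float seqid int(ed icd1)
1 694 .
2 694 154
3 694 .
4 694 .
5 694 9
6 707 .
7 707 .
8 707 28
9 707 .
end

gen byte cd1 = inlist(icd1, 1, 5, 9, 28, 64, 79, 92, 104, 154, 189)

* position of each case within its area, and the positions of target cases
bysort ed (seqid) : gen pos = _n
gen where_1 = pos if cd1
gen where_2 = where_1

* carry the most recent target position forward, within area
by ed : replace where_1 = where_1[_n-1] if missing(where_1)

* reverse the order and carry the next target position backward, within area
gsort ed -seqid
by ed : replace where_2 = where_2[_n-1] if missing(where_2)

* distance in cases to the nearer target case; min() ignores missing arguments
gen dist = min(pos - where_1, where_2 - pos)

sort ed seqid
list, sepby(ed)
```

Because every step runs under by ed:, nothing leaks across area boundaries, so no explicit breaks between areas are needed.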


Zack Butler

Join Date: Jan 2016

Posts: 8
#8

29 Jan 2016, 14:52

I appreciate the code design you suggested, but I've been tasked with debugging the previous -foreach- code. It doesn't seem negotiable to change course for some meta theoretical reason. If there are any mistakes in this code, please advise.

foreach j of numlist = 1(1) 10 {

foreach i of numlist = (1(1)`j')

if (cd1[_n-`i'] | cd1[_n+1]) {
replace cda = 1
}
tab cd1 by cda exact
}
Nick Cox

Join Date: Mar 2014

Posts: 35725
#9

29 Jan 2016, 15:22

I've already pointed out a likely error in #5.

The tabulate statement just looks like a guess. The help is the place to fix that.

What in your code confines operations to the same area?

I don't recognise a "meta theoretical reason" behind my code, just some experience in writing Stata programs. Clearly you don't have to take my advice.

Last edited by Nick Cox; 29 Jan 2016, 15:29.
Zack Butler

Join Date: Jan 2016

Posts: 8
#10

29 Jan 2016, 15:30

I was not talking about your reasons as being meta theoretical. What I really meant was that I don't get to decide to change the approach, and the reason is something outside of this section of code, later in the analysis, that I'm not yet aware of. I like your version. I'm going to learn to use it. Thanks for the help. Apologies if my earlier post read in a way that seemed unappreciative or derogatory in any way. It was definitely not my intent. Again, many thanks for helping!

Last edited by Zack Butler; 29 Jan 2016, 16:28.
Zack Butler

Join Date: Jan 2016

Posts: 8
#11

31 Jan 2016, 19:36

Nick, the code you suggested works very well!

I want to run -tab houseno dist, e- to test for association at the population level.

Does the "0" distort this association?

If so, don't I need it to tell me the distance between two houses with cats?

Instead of reporting "0" for a house with a cat, would I need it to report the distance to the next "0" (the next house with a cat)?
Zack Butler

Join Date: Jan 2016

Posts: 8
#12

01 Feb 2016, 15:19

I got the code to work!
Here it is...

sort seqid

gen SelectedCauses = inlist(icd1,1,5,6,7,8,9,10,13,14,61,28,29,92,104,105)
label var SelectedCauses "SelectedCauses"

tab SelectedCauses

gen cda=0

foreach j of numlist 1/10 {
foreach i of numlist 1/`j' {
replace cda=1 if (SelectedCauses[_n-`i'] | SelectedCauses[_n+`i'])
}
display "Household distance = `j'"
tab SelectedCauses cda, exact
}