Nested Loop and forvalues: when j is different across i

Hanzhang Xu

Join Date: Jul 2018

Posts: 3
#1


10 Jul 2018, 11:51

Hi there,

I am running a nested loop using forvalues. The variables in the dataset are named like this: admit_`i'_`j'

admit_1991_1 admit_1991_2 admit_1991_3
admit_1992_1 admit_1992_2 admit_1992_3 admit_1992_4 admit_1992_5
admit_1993_1 admit_1993_2 admit_1993_3 admit_1993_4

So basically the range of j differs across i. I have another set of variables, max_`i' (e.g. max_1991, max_1992, etc.), that indicate the maximum of j within each i.

Here is what I have and I got an invalid syntax message:

forvalues i=1991/1995{
local m=max_`i'
forvalues j=1/`m' {
replace flag=1 if admit_`i'_`j"==index_admit
}
}

Can anyone please help me out?

Thank you in advance.

Regards,
Hanzhang
Nick Cox

Join Date: Mar 2014

Posts: 35659
#2

10 Jul 2018, 12:08

It's hard for me to help you out because it's not clear what you want to do and there is no data example. On the face of it, your data layout is not fit for purpose and you should probably be thinking of reshape long.

Otherwise just about every calculation for this dataset will need awkward programming.
Clyde Schechter

Join Date: Apr 2014

Posts: 30076
#3

10 Jul 2018, 13:42

Nick gives excellent advice, as always. And I endorse everything he says.

But I will address the syntax error message. You have a double quote character where you should have a single quote character:

Code:

replace flag=1 if admit_`i'_`j"==index_admit
* should be
replace flag=1 if admit_`i'_`j'==index_admit

That said, if you follow Nick's advice and reshape this data to long layout, you will find that you no longer will need the _`j' part in any case.

Also, at least in the code excerpt you show, the variable flag has not been -generate-d previously, so it is also a syntax error to -replace- it. But the error message for that is more specific than "invalid syntax." So either you actually have -generate-d flag elsewhere in the code, or perhaps Stata has not gotten around to noting this problem yet because it stopped upon finding the invalid " character.
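
For anyone who needs to stay in wide layout, here is a minimal sketch of one way to set the flag without the max_`i' variables at all. It assumes that index_admit and every admit_* variable are numeric Stata dates, and that no other variables in the dataset start with admit_. It also sidesteps a second pitfall in the original loop: a line like local m=max_`i' evaluates the variable in the first observation only.

```stata
* Sketch only, under the assumptions stated above.
* Create the flag first so that -replace- has a variable to work on.
gen byte flag = 0

* Loop over whatever admit_* variables exist; no need to know how many there are per year.
foreach v of varlist admit_* {
    replace flag = 1 if `v' == index_admit & !missing(index_admit)
}
```

The !missing() guard is there because two missing values compare as equal in Stata, so a patient with no diagnosis date would otherwise be flagged by any empty admission slot.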

Hanzhang Xu

Join Date: Jul 2018
Posts: 3
#4

10 Jul 2018, 13:54

Hi Nick and Clyde, thanks for your responses, and I am sorry that I didn't spell out my issue clearly. Here is a bit more description of the data that I am working on. Any feedback is greatly appreciated.

I am working on a hospital administrative dataset and I am trying to calculate patients' total number of hospital visits since the diagnosis of diabetes. The dataset is in wide format. Here is a snapshot that shows how the data are laid out. The variables admit_i_j hold the date of each hospital visit. For example, admit_1991_1 is the variable that contains the date of the first hospital visit in the year 1991.

Patient ID | Diabetes Diagnosis date | admit_1991_1 | admit_1991_2 | admit_1991_3 | admit_1992_1 | admit_1992_2 | admit_1993_1 | admit_1993_2 | admit_1993_3 | maxnum_1991 | maxnum_1992 | maxnum_1993
           |                         | 01/01/1991   | 03/04/1991   | 05/05/1991   | 06/09/1992   |              | 02/14/1993   | 03/16/1993   |              |             |             |
           | 06/08/1991              | 02/04/1991   | 06/08/1991   |              | 09/30/1992   | 11/11/1992   | 02/20/1993   | 10/01/1993   | 11/25/1993   |             |             |

I have the date when each patient was diagnosed with diabetes (variable name: index_admit, variable label: Diabetes Diagnosis date). So the first thing I want to do is to use a nested loop, as I posted above, to figure out during which hospital visit the patient received the diagnosis. I have another set of variables called maxnum_i (e.g. maxnum_1991, maxnum_1992) that indicate, among all the patients in the dataset, the highest total number of hospital visits in a given year. Here is what I have so far, but I get an invalid syntax message:

gen flag_year=.
gen flag_num=.

forvalues i=1991/1995{
local m=maxnum_`i'
forvalues j=1/`m' {
replace flag_year=i if admit_`i'_`j'==index_admit
replace flag_num=j if admit_`i'_`j'==index_admit
}
}

Could you please help me out?

Thank you!
Hanzhang

Last edited by Hanzhang Xu; 10 Jul 2018, 13:59.


Clyde Schechter

Join Date: Apr 2014

Posts: 30076
#5

10 Jul 2018, 14:28

Another classic example of something that is very easy to do with long data layout and very confusing/difficult in wide. Note that once you get the right layout, you don't need your maxnum variables for this: Stata won't care how many admissions there were for each patient.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte patientid float(diabetesdiagnosisdate admit_1991_1 admit_1991_2 admit_1991_3 admit_1992_1 admit_1992_2 admit_1993_1 admit_1993_2 admit_1993_3)
1 11323 11323 11385 11447 11848 . 12098 12128 .
2 11481 11357 11481 . 11961 12003 12104 12327 12382
end
format %td diabetesdiagnosisdate
format %td admit_1991_1
format %td admit_1991_2
format %td admit_1991_3
format %td admit_1992_1
format %td admit_1992_2
format %td admit_1993_1
format %td admit_1993_2
format %td admit_1993_3

* Go to long layout: one observation per patient and admission slot
reshape long admit_, i(patientid) j(_j) string
drop if missing(admit_)

* _j holds strings like "1991_2"; split into year (xxx1) and visit number (xxx2)
split _j, parse("_") gen(xxx) destring

* Pick out the year and visit number of the admission that matches the diagnosis date
by patientid: egen flag_year = max(cond(diabetesdiagnosisdate == admit_, xxx1, .))
by patientid: egen flag_num = max(cond(diabetesdiagnosisdate == admit_, xxx2, .))
egen flag_year_num = concat(flag_year flag_num), punct("_")
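
Still in long layout, the count that the original question was ultimately after (hospital visits since the diabetes diagnosis) might be sketched as below. The variable name visits_since_dx is made up for illustration, and the sketch assumes that a visit on the diagnosis date itself counts as "since diagnosis"; change >= to > if it should not.

```stata
* Sketch only: run while the data are in long layout (one observation per admission).
* Assumption: a visit on the diagnosis date counts toward the total.
bysort patientid: egen visits_since_dx = total(admit_ >= diabetesdiagnosisdate & !missing(admit_, diabetesdiagnosisdate))
```

The !missing() guard matters because Stata treats missing values as larger than any number, so a missing admit_ would otherwise satisfy the >= comparison.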

Notes:

1. This code assumes your date variables are all true Stata internal format date variables. If they are not, you must convert them. There is no reasonable way to do this calculation working with strings that humans read as dates.

2. I have left the data in long layout here because almost anything you do next will also require, or at least be easier to do in, long layout. On the off chance that your next move is something that is easiest in wide layout (there are a handful of such things in Stata), then you can go back to wide layout with just

Code:

drop xxx*
reshape wide
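
Regarding note 1, if the admission dates are actually stored as strings such as "03/04/1991", a conversion along the following lines could be run in the original wide data before anything else. This is only a sketch: it assumes every admit_* variable is a string in month/day/year order, and the _num suffix is just a temporary name chosen for illustration.

```stata
* Sketch only: assumes all admit_* variables are strings like "03/04/1991" (month/day/year).
foreach v of varlist admit_* {
    gen long `v'_num = date(`v', "MDY")   // convert to a Stata daily date
    format %td `v'_num
    drop `v'
    rename `v'_num `v'                    // keep the original variable name
}
```

The diagnosis date variable would need the same treatment if it is also a string. Values that do not parse come back as missing, so it is worth checking for unexpected missing values (for example with -count if missing(...)-) after converting.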

In the future, when showing data examples, please use the -dataex- command to do so, as I have done in this response. If you are running version 15.1 or a fully updated version 14.2, it is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

When asking for help with code, always show example data. When showing example data, always use -dataex-.
Hanzhang Xu

Join Date: Jul 2018

Posts: 3
#6

10 Jul 2018, 14:38

Thank you Clyde! Much appreciated.