  • Nested Loop and forvalues: when j is different across i

    Hi there,

    I am running a nested loop using forvalues. The variables in the dataset are named like this: admit_`i'_`j'

    admit_1991_1 admit_1991_2 admit_1991_3
    admit_1992_1 admit_1992_2 admit_1992_3 admit_1992_4 admit_1992_5
    admit_1993_1 admit_1993_2 admit_1993_3 admit_1993_4

    So basically the range of j differs across i. I have another set of variables, max_`i' (e.g. max_1991, max_1992, etc.), that indicate the maximum of j within each i.

    Here is what I have, and I got an invalid syntax message:

    forvalues i=1991/1995{
    local m=max_`i'
    forvalues j=1/`m' {
    replace flag=1 if admit_`i'_`j"==index_admit
    }
    }


    Can anyone please help me out?

    Thank you in advance.

    Regards,
    Hanzhang

  • #2
    It's hard for me to help you out because it's not clear what you want to do and there is no data example. On the face of it, your data layout is not fit for purpose and you should probably be thinking of reshape long.

    Otherwise just about every calculation for this dataset will need awkward programming.

    • #3
      Nick gives excellent advice, as always. And I endorse everything he says.

      But I will address the syntax error message. You have a double quote character where you should have a single quote character:
      Code:
      replace flag=1 if admit_`i'_`j"==index_admit
      
      *should be
      
      replace flag=1 if admit_`i'_`j'==index_admit
      That said, if you follow Nick's advice and reshape this data to long layout, you will find that you no longer will need the _`j' part in any case.

      Also, at least in the code excerpt you show, the variable flag has not been -generate-d previously, so it is also an error to -replace- it. But the error message for that is more specific than "invalid syntax." So either you actually have -generate-d flag elsewhere in the code, or perhaps Stata has not gotten around to noting this problem yet because it stopped upon finding the invalid " character.
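
      For anyone who wants to stay in wide layout for this one step, a minimal sketch of the loop with both points addressed might look like the following. It assumes that flag does not yet exist, that each max_ variable holds the same value in every observation (so reading it from the first observation is safe), and that an admit_ variable exists for every year from 1991 to 1995 and every j up to that year's maximum. None of that is confirmed by the original post. The !missing() guard is an addition, so that a missing admission date cannot match a missing index_admit.
      Code:
      generate byte flag = 0
      forvalues i = 1991/1995 {
          * assumption: the max_ variable for this year is constant across observations
          local m = max_`i'[1]
          forvalues j = 1/`m' {
              replace flag = 1 if admit_`i'_`j' == index_admit & !missing(index_admit)
          }
      }
      If the max_ variables cannot be relied on, -foreach v of varlist admit_*- would visit every admission variable without needing them.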

      • #4
        Hi Nick and Clyde, thanks for your responses, and I am sorry that I didn't spell out my issue clearly. Here is a bit more description of the data that I am working on. Any feedback is greatly appreciated.

        I am working on a hospital administrative dataset and I am trying to calculate patients' total number of hospital visits since the diagnosis of diabetes. The dataset is in wide format. Here is a snapshot of the dataset that might give you some ideas about how the data are laid out in the dataset. The variables admit_i_j indicate the date of each hospital visit. For example, admit_1991_1 is the variable that contains the date of the first hospital visit in the year 1991.
        Patient ID  Diabetes Diagnosis date  admit_1991_1  admit_1991_2  admit_1991_3  admit_1992_1  admit_1992_2  admit_1993_1  admit_1993_2  admit_1993_3  maxnum_1991  maxnum_1992  maxnum_1993
        1           01/01/1991               01/01/1991    03/04/1991    05/05/1991    06/09/1992    .             02/14/1993    03/16/1993    .             3            2            3
        2           06/08/1991               02/04/1991    06/08/1991    .             09/30/1992    11/11/1992    02/20/1993    10/01/1993    11/25/1993    3            2            3
        I have the date when each patient was diagnosed with diabetes (variable name: index_admit, variable label: Diabetes Diagnosis date). So the first thing I want to do is to use a nested loop, as I posted above, to figure out during which hospital visit the patient received the diagnosis. I have another set of variables called maxnum_i (e.g. maxnum_1991, maxnum_1992) that indicate, among all the patients in the dataset, the highest total number of hospital visits in a given year. Here is what I have so far, but I got an invalid syntax message:

        gen flag_year=.
        gen flag_num=.

        forvalues i=1991/1995{
        local m=maxnum_`i'
        forvalues j=1/`m' {
        replace flag_year=i if admit_`i'_`j'==index_admit
        replace flag_num=j if admit_`i'_`j'==index_admit
        }
        }

        Could you please help me out?

        Thank you!
        Hanzhang
        Last edited by Hanzhang Xu; 10 Jul 2018, 13:59.

        • #5
          Another classic example of something that is very easy to do with long data layout and very confusing/difficult in wide. Note that once you get the right layout, you don't need your maxnum variables for this: Stata won't care how many admissions there were for each patient.

          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input byte patientid float(diabetesdiagnosisdate admit_1991_1 admit_1991_2 admit_1991_3 admit_1992_1 admit_1992_2 admit_1993_1 admit_1993_2 admit_1993_3)
          1 11323 11323 11385 11447 11848     . 12098 12128     .
          2 11481 11357 11481     . 11961 12003 12104 12327 12382
          end
          format %td diabetesdiagnosisdate
          format %td admit_1991_1
          format %td admit_1991_2
          format %td admit_1991_3
          format %td admit_1992_1
          format %td admit_1992_2
          format %td admit_1993_1
          format %td admit_1993_2
          format %td admit_1993_3
          
          reshape long admit_, i(patientid) j(_j) string
          drop if missing(admit_)
          split _j, parse("_") gen(xxx) destring
          
          by patientid: egen flag_year = max(cond(diabetesdiagnosisdate == admit_, xxx1, .))
          by patientid: egen flag_num = max(cond(diabetesdiagnosisdate == admit_, xxx2, .))
          egen flag_year_num = concat(flag_year flag_num), punct("_")
          Notes:

          1. This code assumes your date variables are all true Stata internal format date variables. If they are not, you must convert them. There is no reasonable way to do this calculation working with strings that humans read as dates.

          2. I have left the data in long layout here because almost anything you do next will also require, or at least be easier to do in, long layout. On the off chance that your next move is something that is easiest in wide layout (there are a handful of such things in Stata), then you can go back to wide layout with just
          Code:
          drop xxx*
          reshape wide
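
          On note 1, a minimal sketch of the conversion, assuming the dates are stored as strings in month/day/year form like the values shown in #4. The toy data and the _num suffix are purely illustrative; index_admit is the variable name given in #4.
          Code:
          clear
          input byte patientid str10(index_admit admit_1991_1 admit_1991_2)
          1 "01/01/1991" "01/01/1991" "03/04/1991"
          2 "06/08/1991" "02/04/1991" "06/08/1991"
          end

          * replace each string date with a numeric Stata daily date of the same name
          foreach v of varlist index_admit admit_* {
              generate long `v'_num = date(`v', "MDY")
              format %td `v'_num
              drop `v'
              rename `v'_num `v'
          }
          With long variable names the _num suffix could exceed Stata's 32-character limit on names, in which case a shorter suffix or a -tempvar- would be needed.
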
          In the future, when showing data examples, please use the -dataex- command to do so, as I have done in this response. If you are running version 15.1 or a fully updated version 14.2, it is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

          When asking for help with code, always show example data. When showing example data, always use -dataex-.
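
          As a rough illustration, assuming the variable names from the example above, a call like this would list the first two observations of the chosen variables in a form ready to post:
          Code:
          dataex patientid diabetesdiagnosisdate admit_1991_* in 1/2
          The output in the Results window, including its [CODE] delimiters, can then be copied straight into the post.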


          • #6
            Thank you Clyde! Much appreciated.
