
  • Shorter code for looping through consecutive variables.

    Hi,

    My dataset has 25 variables for diagnoses (dx1 to dx25). I use the following code to identify patients with particular conditions. Is there a way for me to shorten this code so that I do not have to specify a line for each diagnosis variable?

    Code:
    forvalues i=2009/2012 {
        use "/scratch/abc/XYZ/xyz_`i'_core.dta", clear
        keep if inlist(dx1, "75611", "75612") | ///
            inlist(dx2, "75611", "75612") | ///
            inlist(dx3, "75611", "75612") | ///
            inlist(dx4, "75611", "75612") | ///
            inlist(dx5, "75611", "75612") | ///
            inlist(dx6, "75611", "75612") | ///
            inlist(dx7, "75611", "75612") | ///
            inlist(dx8, "75611", "75612") | ///
            inlist(dx9, "75611", "75612") | ///
            inlist(dx10, "75611", "75612") | ///
            inlist(dx11, "75611", "75612") | ///
            inlist(dx12, "75611", "75612") | ///
            inlist(dx13, "75611", "75612") | ///
            inlist(dx14, "75611", "75612") | ///
            inlist(dx15, "75611", "75612") | ///
            inlist(dx16, "75611", "75612") | ///
            inlist(dx17, "75611", "75612") | ///
            inlist(dx18, "75611", "75612") | ///
            inlist(dx19, "75611", "75612") | ///
            inlist(dx20, "75611", "75612") | ///
            inlist(dx21, "75611", "75612") | ///
            inlist(dx22, "75611", "75612") | ///
            inlist(dx23, "75611", "75612") | ///
            inlist(dx24, "75611", "75612") | ///
            inlist(dx25, "75611", "75612")    
        save "/scratch/abc/XYZ_`i'_pqr.dta", replace
    }
    I am using Stata 12 for Windows.

    Thank you,
    Caroline

  • #2
    Code:
    local myif 
    
    forval j = 1/24 { 
        local myif `myif' inlist(dx`j', "75611", "75612") | 
    }
    
    local myif `myif' inlist(dx25, "75611", "75612") 
    
    forvalues i=2009/2012 {
         use "/scratch/abc/XYZ/xyz_`i'_core.dta", clear
         keep if `myif' 
         save "/scratch/abc/XYZ_`i'_pqr.dta", replace
    }
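
    A minimal sketch of a variant, not part of the code above: seeding the local with 0 (false) lets every term be appended the same way, so the last variable needs no separate line. The toy data, its values, and the use of only three dx variables are made up for illustration; with the real files the loop would run over 1/25.

    ```stata
    * Toy data standing in for the real files (hypothetical values).
    clear
    input str5 dx1 str5 dx2 str5 dx3
    "75611" "4019"  "25000"
    "4019"  "25000" "2724"
    "2724"  "75612" "4019"
    end

    * Start from 0 (false): "0 | condition" is true exactly when condition is true,
    * so each pass can append "| inlist(...)" without a trailing operator.
    local myif 0
    forvalues j = 1/3 {
        local myif `myif' | inlist(dx`j', "75611", "75612")
    }

    * Only observations with a matching code in any dx variable remain.
    keep if `myif'
    list
    ```

    In this toy data the first and third observations contain one of the two codes, so those are the ones kept.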



    • #3
      Try this:

      Code:
      forvalues i=2009/2012 {     
             use "/scratch/abc/XYZ/xyz_`i'_core.dta", clear           
             gen keeper=0           
             forvalues j=1/25 {             
                 replace keeper=1 if inlist(dx`j', "75611", "75612")           
             }          
             keep if keeper==1          
             save "/scratch/abc/XYZ_`i'_pqr.dta", replace 
      }
      Last edited by ben earnhart; 22 Dec 2014, 10:04.



      • #4
        Thank you so much, Nick and Ben!



        • #5
          Caroline,

          Since I happen to know the data set you are using, I will offer my two cents' worth. My usual technique is the one that Ben suggests, but Nick's is intriguing because, with a few tweaks, it might allow you to save some time. Since the NIS is very large, it tends to take a long time to load into memory. Accordingly, if you can use the "use if <condition> using <file>" notation, you can speed the process up considerably (depending on how big a subset you are extracting). So, you can modify Nick's code as follows:

          Code:
           
          local myif   
          forval j = 1/24 {
                local myif `myif' inlist(dx`j', "75611", "75612") |  
          }  
          
          local myif `myif' inlist(dx25, "75611", "75612")
          
          forvalues i=2009/2012 {
                use if `myif' using "/scratch/abc/XYZ/xyz_`i'_core.dta", clear
                save "/scratch/abc/XYZ_`i'_pqr.dta", replace 
          }
          Alternatively, if you find Nick's code to be too mysterious you can do the same thing with your code:

          Code:
           
          forvalues i=2009/2012 {
            use if inlist(dx1, "75611", "75612") | ///
                       inlist(dx2, "75611", "75612") | /// 
                       ...
                       inlist(dx25, "75611", "75612") using "/scratch/abc/XYZ/xyz_`i'_core.dta", clear  
             save "/scratch/abc/XYZ_`i'_pqr.dta", replace 
          }
          Regards,
          Joe



          • #6
            Joe's tweak looks good. I Googled 75611 to decode the magic number, but did not hit gold.

            Readers living in or visiting London might like to know that

            http://countdown.tfl.gov.uk/#%7CstopCode=75611

            gives information on buses arriving at Edmonton Police Station.



            • #7
              Caroline is using ICD-9 diagnosis codes. 75611 and 75612 (actually, 756.11 and 756.12) are "Spondylolysis" and "Spondylolisthesis", respectively (spine anomalies).



              • #8
                Thank you, Joe. This is fantastic. Yes, it took me a while to figure out methods to extract my cohorts. - use if - was a great resource I found on the Stata website. I also had access to some super-memory computers, which bailed me out as well. Please feel free to share your experience, ideas, or resources for managing the NIS.

                Nick: Yes, these are some of the ICD9 codes for some causes of lower back pain. They may be found as 756.11 and 756.12 on the web. The dataset eliminates the periods and codes them as strings.
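
                For anyone who wants to look the codes up, here is a rough sketch of putting the period back. It assumes numeric or V codes, where the period follows the third character; E codes, where it follows the fourth, are not handled. The toy data and the variable name dx1_dotted are made up for illustration.

                ```stata
                * Hypothetical example: rebuild the dotted form of stored ICD-9 codes.
                clear
                input str5 dx1
                "75611"
                "75612"
                end

                * Insert a period after the third character; codes of three
                * characters or fewer are left unchanged.
                generate str6 dx1_dotted = cond(length(dx1) > 3, ///
                    substr(dx1, 1, 3) + "." + substr(dx1, 4, .), dx1)
                list
                ```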
