Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Loop command

    Hi everyone,

    I have a question regarding expanding the usage of loop command (if possible). Let's say I'm using a database that has 21 variables for each observations (please find the sample below created by dataex):

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input double key float heart str5(DX1 DX2 DX3 DX4 DX5 DX6 DX7 DX8 DX9 DX10) int(CHRON1 CHRON2 CHRON3 CHRON4 CHRON5 CHRON6 CHRON7 CHRON8 CHRON9 CHRON10)
    10146237 1 "41401" "41001" "4019"  "4240"  "2724"  "V1582" ""      ""      ""      ""      1 1 1 1 1 0 . . . .
    10001801 1 "99672" "41001" "78551" "496"   "4271"  "41072" "25000" "9971"  "2720"  "2724"  0 1 0 1 1 1 1 0 1 1
    10033631 1 "97081" "41001" "25000" "41401" "49320" "32723" "7904"  "2724"  "2449"  "4019"  0 1 1 1 1 1 0 1 1 1
    10071307 1 "99672" "41001" "51881" "5750"  "4142"  "41401" "4019"  "2724"  "78659" "3051"  0 1 0 0 1 1 1 1 0 1
    10276825 1 "0389"  "51881" "41001" "78552" "4271"  "5119"  "486"   "2761"  "5849"  "99592" 0 0 1 0 1 0 0 0 0 0
    10287383 1 "0389"  "51881" "56983" "41001" "5070"  "486"   "34830" "78552" "78559" "56721" 0 0 0 1 0 0 1 0 0 0
    10310377 1 "0389"  "41001" "51881" "78552" "486"   "34831" "4271"  "2639"  "5849"  "2762"  0 1 0 0 0 1 1 1 0 0
    10245547 1 "51881" "41519" "V667"  "41001" "486"   "7994"  "73313" "49322" "4280"  "42731" 0 0 0 1 0 0 0 1 1 1
    10304219 1 "51881" "99592" "41001" "78552" "4822"  "41189" "5990"  "4280"  "4168"  "1119"  0 0 1 0 0 1 0 1 1 0
    10309903 1 "41041" "42823" "486"   "41001" "5601"  "496"   "4280"  "5949"  "2689"  "72400" 1 1 0 1 0 1 1 0 1 0
    10033084 1 "1970"  "41001" "4271"  "V1052" "V4573" "9971"  "2724"  "V4364" "V153"  "40390" 1 1 1 0 0 0 1 1 0 1
    10285418 1 "51881" "41001" "486"   "49121" "1629"  "1983"  "5849"  "9961"  "4239"  "2761"  0 1 0 1 1 1 0 0 0 0
    10121271 1 "42821" "78551" "41001" "99681" "5849"  "00845" "78959" "V4283" "2930"  "29630" 1 0 1 1 0 0 0 1 0 1
    10177348 1 "03849" "41001" "5849"  "9331"  "5300"  "3320"  "7837"  "27651" "5990"  "2859"  0 1 0 0 0 1 0 0 0 0
    10071817 1 "5579"  "99592" "0380"  "41001" "40391" "5856"  "78552" "78551" "3441"  "5990"  1 0 0 1 1 1 0 0 1 0
    10015097 1 "4111"  "41001" "42833" "5849"  "3360"  "2724"  "30000" "27800" "71690" "V1582" 1 1 1 0 1 1 1 1 1 0
    20551022 1 "0389"  "5845"  "41001" "34831" "486"   "4271"  "78552" "78551" "2875"  "51881" 0 0 1 1 0 1 0 0 1 0
    20411101 1 "0380"  "48232" "51881" "78552" "42841" "41001" "4162"  "6826"  "2762"  "70713" 0 0 0 0 1 1 1 0 0 1
    20296199 1 "99666" "41001" "42741" "9971"  "V667"  "V8542" "78551" "4019"  "496"   "25000" 0 1 1 0 0 1 0 1 1 1
    20009235 1 "97081" "41001" "42741" "51881" "5845"  "3481"  "72888" "2760"  "96500" "30560" 0 1 1 0 0 1 0 0 0 1
    end

    - first variable (named key in the database) is the de-identified code for the patient that is hospitalized
    - 10 variables (named DX1-DX10 in the database) for the diagnoses (which is based on the ICD-9-CM codes). Some are acute conditions that happened during that hospitalization and some are chronic conditions like diabetes that the patient already had before being hospitalized.
    - 10 variables (named CHRON1-10 in the database) indicating whether that diagnosis was a chronic condition (1) or happened during the same hospitalization as an acute condition (0). each CHRON(n) variable corresponds to the same number DX(n) variable. So to see if DX3 was acute or chronic, we look at CHRON3 to find out.

    Previously, I wanted to know that, for example, how many of the observations had a heart condition (lets say ICD code for heart condition is 41001), regardless of whether it happened as an acute or chronic condition. So I used a loop command like below:

    Code:
    gen heart=0
    
    quietly forval j=1/10 {
    replace heart=1 if DX`j'=="41001"
    }
    and that worked perfectly fine. Now, I'd like to select only those observations which had an ACUTE heart attack. So, for example, if a heart code of 41001 is present in DX4, I only want to replace heart=1 if the corresponding CHRON4 is also zero (meaning that it was an acute condition, not chronic). Unfortunately, I don't know how to include that condition into my loop command, if possible at all.

    I greatly appreciate it if anyone can help me in this regard, either utilizing the loop command or suggesting another approach. Thanks!
    Last edited by Reza Hosseini; 22 Jul 2016, 16:47.

  • #2
    You're having this problem because you have your data in wide layout. Like almost all things in Stata data management and analysis, this is very challenging in wide layout and very easy in long layout.

    Code:
    reshape long DX CHRON, i(key) j(dx_num)
    by key, sort: egen heart = max(DX == "41001")
    by key, sort: egen acute_heart = max(DX == "41001" & CHRON == 0)
    Since almost anything else you do going forward is likely to be similarly difficult in wide layout and simple in long layout, I recommend that you leave the data in long layout to move forward. If, however, you have some compelling reason to go back to wide layout, you can just add -reshape wide- to the end of the above code.

    Comment


    • #3
      I often have data that look like this and I consider this to already be in long layout form; however, for certain data mgt purposes, this may be one, it is easier to do as Clyde suggests and then reshape back to "wide" (I note that there is only one observation, in your example, for "key" and for many analysis purposes that will be what you want)

      Comment


      • #4
        Originally posted by Clyde Schechter View Post
        You're having this problem because you have your data in wide layout. Like almost all things in Stata data management and analysis, this is very challenging in wide layout and very easy in long layout.

        Code:
        reshape long DX CHRON, i(key) j(dx_num)
        by key, sort: egen heart = max(DX == "41001")
        by key, sort: egen acute_heart = max(DX == "41001" & CHRON == 0)
        Since almost anything else you do going forward is likely to be similarly difficult in wide layout and simple in long layout, I recommend that you leave the data in long layout to move forward. If, however, you have some compelling reason to go back to wide layout, you can just add -reshape wide- to the end of the above code.
        Thank you so much for your helpful reply, Clyde!

        Comment


        • #5
          Originally posted by Rich Goldstein View Post
          I often have data that look like this and I consider this to already be in long layout form; however, for certain data mgt purposes, this may be one, it is easier to do as Clyde suggests and then reshape back to "wide" (I note that there is only one observation, in your example, for "key" and for many analysis purposes that will be what you want)
          Thank you, Rich! Yes, the database is already "long" without actually being in long format! I hope the reshape works for now otherwise I have to split the database by year.

          Comment

          Working...
          X