Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Creating variable based on values from a list of other variables

    Hello,

    I was hoping someone could provide me with some code to create a variable based on values of a list of other variables. I have 25 variables (labeled dx_icd10_*) and want to create a binary variable that equals 1 when ANY of these variables equals ANY of the following values: E10.*, E11.*, E08.*, E09*, E13.*

    I know that I could do this by creating a very long string of "if" statements but am hoping for something more elegant than this. I have included a dataex example below.

    Thank you!

    Sarah



    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str35 dx_icd10_1 str28 dx_icd10_2 str21 dx_icd10_3 str28(dx_icd10_4 dx_icd10_5 dx_icd10_6) str23 dx_icd10_7 str20 dx_icd10_8 str14 dx_icd10_9
    "E29.1"         "R73.03"  ""              ""       ""       ""      "" "" ""
    "E23.0"         "R73.03"  "N52.1"         ""       ""       ""      "" "" ""
    "E89.0"         "E05.00"  "R73.03"        "E03.8"  ""       ""      "" "" ""
    "R79.89"        ""        ""              ""       ""       ""      "" "" ""
    "R62.52"        "Q23.4"   "Z68.51"        ""       ""       ""      "" "" ""
    "R62.52"        "Z68.51"  ""              ""       ""       ""      "" "" ""
    "R62.52"        "Z68.51"  ""              ""       ""       ""      "" "" ""
    "R73.03"        "E16.1"   "E34.9"         ""       ""       ""      "" "" ""
    "E03.8"         ""        ""              ""       ""       ""      "" "" ""
    "E10.65"        "E27.1"   "E06.3"         "K31.84" "E78.00" ""      "" "" ""
    "E10.65"        "E27.1"   "E06.3"         "K31.84" "E78.00" ""      "" "" ""
    "E11.9"         "E78.5"   ""              ""       ""       ""      "" "" ""
    "E11.9"         "E78.5"   ""              ""       ""       ""      "" "" ""
    "C73"           "E89.0"   "Z79.899"       ""       ""       ""      "" "" ""
    "E66.9"         "E55.9"   "E83.51"        "E21.3"  "Z98.84" "R20.2" "" "" ""
    "E03.1"         ""        ""              ""       ""       ""      "" "" ""
    "E03.1"         ""        ""              ""       ""       ""      "" "" ""
    "E03.1"         ""        ""              ""       ""       ""      "" "" ""
    "E03.1"         ""        ""              ""       ""       ""      "" "" ""
    "E03.1"         ""        ""              ""       ""       ""      "" "" ""
    "E03.1"         ""        ""              ""       ""       ""      "" "" ""
    "E03.1"         ""        ""              ""       ""       ""      "" "" ""
    "E03.1"         ""        ""              ""       ""       ""      "" "" ""
    "E03.1"         ""        ""              ""       ""       ""      "" "" ""
    "E03.1"         ""        ""              ""       ""       ""      "" "" ""
    "E03.1"         ""        ""              ""       ""       ""      "" "" ""
    "E06.3"         ""        ""              ""       ""       ""      "" "" ""
    "C73"           ""        ""              ""       ""       ""      "" "" ""
    "E04.2"         ""        ""              ""       ""       ""      "" "" ""
    "E11.42"        ""        ""              ""       ""       ""      "" "" ""
    "E11.22, N18.3" "E78.5"   "E78.1"         "E11.42" "I10"    "R21"   "" "" ""
    "E11.22, N18.3" "E78.5"   ""              ""       ""       ""      "" "" ""
    "E21.0"         "M85.852" "E03.9"         ""       ""       ""      "" "" ""
    "C73"           ""        ""              ""       ""       ""      "" "" ""
    "E28.2"         ""        ""              ""       ""       ""      "" "" ""
    "E11.65"        "E03.9"   ""              ""       ""       ""      "" "" ""
    "E06.3"         ""        ""              ""       ""       ""      "" "" ""
    "E10.65"        ""        ""              ""       ""       ""      "" "" ""
    "E03.9"         ""        ""              ""       ""       ""      "" "" ""
    "M81.8"         ""        ""              ""       ""       ""      "" "" ""
    "M81.8"         ""        ""              ""       ""       ""      "" "" ""
    "E05.90"        "E05.00"  "E04.2"         ""       ""       ""      "" "" ""
    "E05.00"        ""        ""              ""       ""       ""      "" "" ""
    "E05.00"        "E05.90"  ""              ""       ""       ""      "" "" ""
    "E25.9"         ""        ""              ""       ""       ""      "" "" ""
    "E25.0"         ""        ""              ""       ""       ""      "" "" ""
    "E05.20"        "E05.90"  ""              ""       ""       ""      "" "" ""
    "E21.5"         "E55.9"   "E11.29, R80.9" "I10"    "E21.1"  ""      "" "" ""
    "E55.9"         "E05.20"  "E05.90"        "M81.0"  ""       ""      "" "" ""
    "E25.0"         ""        ""              ""       ""       ""      "" "" ""
    "E10.65"        ""        ""              ""       ""       ""      "" "" ""
    "Q89.2"         "J39.8"   "Z86.39"        "O09.93" ""       ""      "" "" ""
    "E10.65"        ""        ""              ""       ""       ""      "" "" ""
    "E10.65"        ""        ""              ""       ""       ""      "" "" ""
    "Q89.2"         "Z3A.23"  ""              ""       ""       ""      "" "" ""
    "E89.0"         ""        ""              ""       ""       ""      "" "" ""
    "E89.0"         "E05.00"  ""              ""       ""       ""      "" "" ""
    "E87.6"         "D35.01"  ""              ""       ""       ""      "" "" ""
    "E23.0"         ""        ""              ""       ""       ""      "" "" ""
    "E23.0"         ""        ""              ""       ""       ""      "" "" ""
    "E03.9"         "N64.3"   "R35.0"         ""       ""       ""      "" "" ""
    "E03.9"         ""        ""              ""       ""       ""      "" "" ""
    "E23.0"         ""        ""              ""       ""       ""      "" "" ""
    "E10.65"        ""        ""              ""       ""       ""      "" "" ""
    "E10.65"        "R76.8"   ""              ""       ""       ""      "" "" ""
    "E10.65"        ""        ""              ""       ""       ""      "" "" ""
    "E05.90"        "E05.00"  ""              ""       ""       ""      "" "" ""
    "E05.00"        "R03.0"   ""              ""       ""       ""      "" "" ""
    "E05.00"        ""        ""              ""       ""       ""      "" "" ""
    "E83.52"        "E21.0"   "E55.9"         "M85.80" ""       ""      "" "" ""
    "E21.0"         "E83.52"  "M85.852"       "E67.3"  ""       ""      "" "" ""
    "E10.65"        ""        ""              ""       ""       ""      "" "" ""
    "E10.65"        ""        ""              ""       ""       ""      "" "" ""
    "E10.65"        ""        ""              ""       ""       ""      "" "" ""
    "E66.9, Z68.31" "E88.81"  "I10"           "R53.83" ""       ""      "" "" ""
    "E10.9"         "E06.3"   "Z96.41"        ""       ""       ""      "" "" ""
    "E10.65"        ""        ""              ""       ""       ""      "" "" ""
    "E10.65"        ""        ""              ""       ""       ""      "" "" ""
    "E10.65"        "Z96.41"  "E03.8, E06.3"  ""       ""       ""      "" "" ""
    "E10.9"         "Z96.41"  "E03.8, E06.3"  ""       ""       ""      "" "" ""
    "P70.2"         "R73.9"   ""              ""       ""       ""      "" "" ""
    "E16.1"         ""        ""              ""       ""       ""      "" "" ""
    "P70.2"         "R73.9"   ""              ""       ""       ""      "" "" ""
    "P70.2"         "R73.9"   ""              ""       ""       ""      "" "" ""
    "Q98.4"         "E78.00"  ""              ""       ""       ""      "" "" ""
    "E10.9"         "E03.9"   ""              ""       ""       ""      "" "" ""
    "Q98.4"         "E78.00"  ""              ""       ""       ""      "" "" ""
    "E10.9"         "E03.9"   ""              ""       ""       ""      "" "" ""
    "E10.9"         ""        ""              ""       ""       ""      "" "" ""
    "R53.83"        ""        ""              ""       ""       ""      "" "" ""
    "R53.83"        "R63.1"   ""              ""       ""       ""      "" "" ""
    "C73"           ""        ""              ""       ""       ""      "" "" ""
    "M81.8"         "Q87.89"  "Z98.1"         ""       ""       ""      "" "" ""
    "D35.2"         "G43.819" ""              ""       ""       ""      "" "" ""
    "R63.5"         "R79.89"  "N92.6"         "D35.2"  ""       ""      "" "" ""
    "E10.649"       "Z96.41"  "K90.0"         "E10.65" "E16.2"  ""      "" "" ""
    "E10.9"         ""        ""              ""       ""       ""      "" "" ""
    "E10.9"         ""        ""              ""       ""       ""      "" "" ""
    "E10.9"         ""        ""              ""       ""       ""      "" "" ""
    "E10.9"         ""        ""              ""       ""       ""      "" "" ""
    end

  • #2
    Install the user written -egenmore-
    Code:
    findit egenmore
    and follow the instructions to install it, and then

    try this:

    Code:
    . egen dummy = rany(dx_icd10_*), cond(substr(@,1,4)=="E10." | substr(@,1,4)=="E11." | substr(@,1,4)=="E08." | substr(@,1,3)=="E09" | substr(@,1,4)=="E13.")

    Comment


    • #3
      Hi Joro,

      When I try your code I am getting the error "type mismatch." I am not sure how to correct this. Can you advise?

      Thank you very much!

      Sarah

      Comment


      • #4
        This code works for your data example:

        Code:
        gen wanted = 0 
        
        foreach v of var dx* { 
            foreach s in E10. E11.  E08.  E09 { 
                 replace wanted = 1 if strpos(`v', "`s'") 
            } 
        }
        
        list dx_icd10_1-dx_icd10_5 if wanted
        The occurrence of values like
        "E11.29, R80.9" suggests to me that looking at the first few characters alone may be optimistic.
        It's been a while since I used rany() in egenmore even though I wrote it. In the help I recommend using loops instead. But it should work. I can't see a problem with @Joro Kolev's code and I verified that rany() works with strings:


        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input str5 foo str6 bar
        "frog" "dragon"
        "toad" "frog"  
        "toad" "toad"  
        end.
        
        egen wanted = rany(???), cond(strpos(@, "frog"))
        
        list 
        
             +------------------------+
             |  foo      bar   wanted |
             |------------------------|
          1. | frog   dragon        1 |
          2. | toad     frog        1 |
          3. | toad     toad        0 |
             +------------------------+
        So i would check whether somehow one of your input variables is numeric.

        Code:
          
         ds dx_icd10_*, numeric

        Comment


        • #5
          Going to the troubleshooting.

          The error message suggests that one of your variables is numeric. What Nick is saying is that you should examine the list of variables and see whether any of them is not numeric.

          Nick's code of doing this does not work on my machine:
          Code:
          . ds dx_icd10_*, numeric
          option numeric not allowed
          r(198);
          Probably Nick means:

          Code:
          . ds dx_icd10_*, has(type numeric)
          Or less advanced way just type
          Code:
          . describe dx_icd10_*
          
                        storage   display    value
          variable name   type    format     label      variable label
          ----------------------------------------------------------------------------------------------------
          dx_icd10_1      str35   %35s                  
          dx_icd10_2      str28   %28s                  
          dx_icd10_3      str21   %21s                  
          dx_icd10_4      str28   %28s                  
          dx_icd10_5      str28   %28s                  
          dx_icd10_6      str28   %28s                  
          dx_icd10_7      str23   %23s                  
          dx_icd10_8      str20   %20s                  
          dx_icd10_9      str14   %14s
          and verify whether any of the variables in the list is not numeric.

          Comment


          • #6
            Hello Joro and Nick,

            Thank you for your very thoughtful suggestions- I really appreciate it. I am still getting the error message "type mismatch" even when I drop the variables dx_icd10_15-25 which have a "byte" format (see below). This is very frustrating. Any further advice you could give would be much appreciated.

            Sarah



            . describe dx_icd10_*

            storage display value
            variable name type format label variable label
            -------------------------------------------------------------------------------------------------------------------
            dx_icd10_1 str35 %35s DX_ICD10_1
            dx_icd10_2 str28 %28s DX_ICD10_2
            dx_icd10_3 str21 %21s DX_ICD10_3
            dx_icd10_4 str28 %28s DX_ICD10_4
            dx_icd10_5 str28 %28s DX_ICD10_5
            dx_icd10_6 str28 %28s DX_ICD10_6
            dx_icd10_7 str23 %23s DX_ICD10_7
            dx_icd10_8 str20 %20s DX_ICD10_8
            dx_icd10_9 str14 %14s DX_ICD10_9
            dx_icd10_10 str20 %20s DX_ICD10_10
            dx_icd10_11 str20 %20s DX_ICD10_11
            dx_icd10_12 str5 %9s DX_ICD10_12
            dx_icd10_13 str6 %9s DX_ICD10_13
            dx_icd10_14 str5 %9s DX_ICD10_14
            dx_icd10_15 byte %8.0g DX_ICD10_15
            dx_icd10_16 byte %8.0g DX_ICD10_16
            dx_icd10_17 byte %8.0g DX_ICD10_17
            dx_icd10_18 byte %8.0g DX_ICD10_18
            dx_icd10_19 byte %8.0g DX_ICD10_19
            dx_icd10_20 byte %8.0g DX_ICD10_20
            dx_icd10_21 byte %8.0g DX_ICD10_21
            dx_icd10_22 byte %8.0g DX_ICD10_22
            dx_icd10_23 byte %8.0g DX_ICD10_23
            dx_icd10_24 byte %8.0g DX_ICD10_24
            dx_icd10_25 byte %8.0g DX_ICD10_25

            Comment


            • #7
              Everyone's right here on something. Joro Kolev has the right syntax for ds. My memory of the syntax was closer to that for findname (Stata Journal) but even there the syntax should be

              Code:
                
               findname dx_icd10_*, type(numeric)


              Sorry about that. But the main guess was right. Your variables
              dx_icd10_15 on are byte, which is a numeric variable or storage type (not a format), and that explains the type mismatch with any syntax yet proposed.

              There are further possibilities. One is that those extra variables
              dx_icd10_15 on are missing in all observations, which could be why they are numeric in the first case. You can check that with

              Code:
              su dx_icd10_15-dx_icd10_25 


              and if you get a column of 0s then everything is missing and you can ignore those variables. If you get means shown, we need to know more. The describe output doesn't show a value label. My bet would be on everything being missing. Sick people can be sick for all sorts of reasons at all sorts of time but 15 variables populated with codes does sound a lot.

              But, but, but: I have no idea why type mismatch is reported if you are applying either syntax to the first 14 variables alone.

              As usual, we need to see the exact code you tried and example data for which it didn't work.


              Comment


              • #8
                Try making everything a string variable, :

                Code:
                 tostring dx_icd10_15-dx_icd10_25, replace
                and then what we spoke above.

                Comment


                • #9
                  Sarah is telling us that the problem arises even if she drops those variables.

                  Comment


                  • #10
                    Hello! I am still having trouble. I used the code below. The loop is ok for a while and then stops, saying "type mismatch" (see below). It seems to occur with observation 12990 but I can't figure out what I'm missing.

                    Thank you both again.

                    Sarah



                    tostring dx_icd10_15-dx_icd10_25, replace

                    gen diabetes = 0
                    foreach v of var dx* {
                    foreach s in E10. E11. E08. E09 {
                    replace diabetes = 1 if strpos(`v', "`s'")
                    }
                    list dx_icd10_1-dx_icd10_5 if diabetes

                    }






                    12990. | dx_icd10_1 |
                    | E10.9 |
                    |------------------------------------------------------------------------------------------------------------------------------|
                    | dx_name_2 |
                    | |
                    |------------------------------------------------------------------------------------------------------------------------------|
                    | dx_icd10_2 |
                    | |
                    |------------------------------------------------------------------------------------------------------------------------------|
                    | dx_name_3 |
                    | |
                    |------------------------------------------------------------------------------------------------------------------------------|
                    | dx_icd10_3 |
                    | |
                    |------------------------------------------------------------------------------------------------------------------------------|
                    | dx_name_4 |
                    | |
                    |------------------------------------------------------------------------------------------------------------------------------|
                    | dx_icd10_4 |
                    | |
                    |------------------------------------------------------------------------------------------------------------------------------|
                    | dx_name_5 |
                    | |
                    |------------------------------------------------------------------------------------------------------------------------------|
                    | dx_icd10_5 |
                    | |
                    +------------------------------------------------------------------------------------------------------------------------------+
                    type mismatch


                    If I run the code without the "list" line, I get this:

                    gen diabetes = 0

                    . foreach v of var dx* {
                    2. foreach s in E10. E11. E08. E09 {
                    3. replace diabetes = 1 if strpos(`v', "`s'")
                    4. }
                    5.
                    . }
                    (0 real changes made)
                    (0 real changes made)
                    (0 real changes made)
                    (0 real changes made)
                    (2,333 real changes made)
                    (3,111 real changes made)
                    (39 real changes made)
                    (17 real changes made)
                    (0 real changes made)
                    (0 real changes made)
                    (0 real changes made)
                    (0 real changes made)
                    (90 real changes made)
                    (354 real changes made)
                    (4 real changes made)
                    (2 real changes made)
                    (0 real changes made)
                    (0 real changes made)
                    (0 real changes made)
                    (0 real changes made)
                    (18 real changes made)
                    (123 real changes made)
                    (2 real changes made)
                    (3 real changes made)
                    (0 real changes made)
                    (0 real changes made)
                    (0 real changes made)
                    (0 real changes made)
                    (4 real changes made)
                    (67 real changes made)
                    (1 real change made)
                    (0 real changes made)
                    (0 real changes made)
                    (0 real changes made)
                    (0 real changes made)
                    (0 real changes made)
                    (0 real changes made)
                    (22 real changes made)
                    (0 real changes made)
                    (0 real changes made)
                    (0 real changes made)
                    (0 real changes made)
                    (0 real changes made)
                    (0 real changes made)
                    (0 real changes made)
                    (2 real changes made)
                    (0 real changes made)
                    (0 real changes made)
                    (0 real changes made)
                    (0 real changes made)
                    (0 real changes made)
                    (0 real changes made)
                    (0 real changes made)
                    (1 real change made)
                    (0 real changes made)
                    (0 real changes made)
                    (0 real changes made)
                    (0 real changes made)
                    (0 real changes made)
                    (0 real changes made)
                    (0 real changes made)
                    (0 real changes made)
                    (0 real changes made)
                    (0 real changes made)
                    (0 real changes made)
                    (0 real changes made)
                    (0 real changes made)
                    (0 real changes made)
                    (0 real changes made)
                    (0 real changes made)
                    (0 real changes made)
                    (0 real changes made)
                    (0 real changes made)
                    (0 real changes made)
                    (0 real changes made)
                    (0 real changes made)
                    (0 real changes made)
                    (0 real changes made)
                    (0 real changes made)
                    (0 real changes made)
                    (0 real changes made)
                    (0 real changes made)
                    (0 real changes made)
                    (0 real changes made)
                    (0 real changes made)
                    (0 real changes made)
                    (0 real changes made)
                    (0 real changes made)
                    (0 real changes made)
                    (0 real changes made)
                    (0 real changes made)
                    (0 real changes made)
                    (0 real changes made)
                    (0 real changes made)
                    (0 real changes made)
                    (0 real changes made)
                    (0 real changes made)
                    (0 real changes made)
                    (0 real changes made)
                    (0 real changes made)
                    (0 real changes made)
                    (0 real changes made)
                    (0 real changes made)
                    (0 real changes made)
                    (0 real changes made)
                    (0 real changes made)
                    (0 real changes made)
                    (0 real changes made)
                    (0 real changes made)
                    (0 real changes made)
                    (0 real changes made)
                    (0 real changes made)
                    type mismatch
                    r(109);

                    Comment


                    • #11
                      Your wildcard dx* is catching dx_name* too.

                      In #4 I used dx* because it was short enough to match the variables needed in the data example; But with your full dataset it includes more than you want.

                      I think you need
                      Code:
                       
                       dx_icd10_*
                      as a wildcard.

                      Comment


                      • #12
                        Thank you so much! That worked. I feel silly that I didn't catch that myself. At least I will know to look for that next time. I really appreciate the help!

                        Comment


                        • #13
                          For educational purposes, check whether the -egen- solution that I proposed (or rather, Nick's egen function) also works, that is type:

                          Code:
                           egen dummy = rany(dx_icd10_*), cond(strpos(@,"E10.") | strpos(@,"E11.") | strpos(@,"E08.") | strpos(@,"E09") | strpos(@,"E13."))
                          
                          compare dummy diabetes
                          and report back whether the two variables are the same.

                          I am usually not very appreciative of user contributed solutions, but I think this -egen, rany- function which Nick has written is pretty handy and important.

                          By the way this -egen, rany- is the string correspondent to the -egen, anymatch- , and I use the latter a lot.

                          Comment

                          Working...
                          X