  • Question about generating a new variable

    Hi all,

    How can I efficiently generate a new indicator variable wellchild = 1 if any of the variables dx1-dx29 contains Z0001, Z020, Z00129, or Z23? Please see the example below. (Is there an efficient loop or function to do this? My real data have more ids and variables than this example.)

    Best,

    Jack


    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str10 beneid str7(dx1 dx2 dx3 dx4 dx5 dx6 dx7 dx8 dx9 dx10 dx11 dx12 dx13 dx14 dx15 dx16 dx17 dx18 dx19 dx20 dx21 dx22 dx23 dx24 dx25 dx26 dx27 dx28 dx29)
    "1" ""        ""       ""    "" "" "" "" "" ""      "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""     "" ""     
    "2" "H6692"   ""       ""    "" "" "" "" "" ""      "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""     "" ""     
    "3" "Q539"    "Z00129" "Z23" "" "" "" "" "" "Z0001" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""     "" ""     
    "4" "E611"    "Z00129" "Z23" "" "" "" "" "" ""      "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""     "" ""     
    "5" "Q539"    ""       ""    "" "" "" "" "" ""      "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""     "" ""     
    "6" "S0500XA" ""       ""    "" "" "" "" "" ""      "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""     "" "Z0001"
    "7" "R05"     ""       ""    "" "" "" "" "" ""      "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "Z020" "" ""     
    "8" "J069"    "R05"    ""    "" "" "" "" "" ""      "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""     "" ""     
    "9" "J069"    "R509"   ""    "" "" "" "" "" ""      "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""     "" ""     
    "10" "J069"    "R05"    ""    "" "" "" "" "" ""      "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""     "" ""     
    end

  • #2
    Code:
    reshape long dx, i(beneid) j(dx_num)
    by beneid, sort: egen byte well_child = max(inlist(dx, "Z0001", "Z020", "Z00129", "Z23"))
    Note: As with nearly all data management and analyses in Stata, this is much easier in the long layout. If you have a compelling reason to return your data to wide layout after calculating this well child variable, you can do so with just -reshape wide-. But if you plan additional analysis from that point, I would advise against doing that. Whatever you want to do next will probably be easier in long layout as well (and perhaps impossible in wide layout).
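    For readers who do have to stay in wide layout, the original question about a loop can be answered with -foreach-. This is a minimal sketch of that alternative, not the approach recommended above. The three-variable toy dataset is an assumption made for brevity; with the posted data the varlist would be dx1-dx29.

```stata
* Small wide-layout stand-in for the posted data (only dx1-dx3, assumed for brevity)
clear
input str10 beneid str7(dx1 dx2 dx3)
"3" "Q539" "Z00129" "Z23"
"5" "Q539" ""       ""
"7" "R05"  ""       "Z020"
end

* Start everyone at 0, then switch to 1 whenever any dx variable matches
generate byte wellchild = 0
foreach v of varlist dx1-dx3 {
    replace wellchild = 1 if inlist(`v', "Z0001", "Z020", "Z00129", "Z23")
}

list beneid wellchild
```

    The loop visits each diagnosis variable once, so it scales to more dx variables by widening the varlist. The long-layout code remains the shorter route if further per-diagnosis work is planned.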



    • #3
      Originally posted by Clyde Schechter in #2
      Hi Clyde, thank you very much!

      Two additional questions. What does max() do here? And when I put the longer list of codes that I need into the inlist() function, like this: ("Z0001", "Z020", "Z00129", "Z23", "Z02.6", "Z02.71", "Z02.79", "Z02.81", "Z02.82"), Stata told me "expression too long". How can I solve this?

      Thanks,

      Jack Liang



      • #4
        First, the purpose of the max() function. The expression -inlist(whatever)- is a Boolean expression that evaluates to 1 if the first argument is found among the others and to 0 if it is not. After the reshape, each beneid has multiple observations, and the dx code in any one of them might or might not be among the listed Z codes. So within each group of observations defined by a single value of beneid, Stata evaluates the -inlist()- expression in every observation, giving a 0 or a 1 for each. The maximum of those 0's and 1's is 1 if there is any observation where dx is one of those Z codes, and 0 if that never happens. In more general terms, -by grouping_variable, sort: egen result = max(logical_expression)- sets result to indicate whether logical_expression is true for any of the observations belonging to the current value of the grouping variable.
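        To make the two steps visible, the sketch below splits the one-line command into a row-level flag and a group-level maximum. The two ids "A" and "B" and the helper variable hit are made up for illustration; they are not from the posted data.

```stata
* Toy long-layout data: two hypothetical ids, three diagnosis rows each
clear
input str10 beneid str7 dx
"A" "Q539"
"A" "Z23"
"A" ""
"B" "J069"
"B" "R05"
"B" ""
end

* Row-level 0/1 flag: is this row's dx one of the listed codes?
generate byte hit = inlist(dx, "Z0001", "Z020", "Z00129", "Z23")

* Group-level maximum of the flag: 1 if any row of the id matched, else 0
by beneid, sort: egen byte well_child = max(hit)

list, sepby(beneid)
```

        Here id "A" has one matching row ("Z23"), so all three of its rows get well_child = 1, while id "B" has none and gets 0 throughout.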

        As for the second problem, I don't understand why that happened. -inlist()- allows a total of 10 string arguments, including the first. With dx, "Z0001", "Z020", "Z00129", "Z23", "Z02.6", "Z02.71", "Z02.79", "Z02.81", "Z02.82", you are exactly at that limit, so that list should not produce the error. Did you actually have a longer list of codes than you show here? If so, you need to break the list up, with a maximum of 9 comparison values in each call: -inlist(dx, first 9 codes) | inlist(dx, next 9 codes)- (and possibly more of these, depending on how many codes there are).
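        A minimal sketch of the split, assuming it is run from a do-file (the /// line continuation does not work when typed interactively). "PLACE1" and "PLACE2" are hypothetical placeholders for whatever codes come after the first nine, and the toy data are invented. Note also that the posted example stores codes without periods, so whether a value such as "Z02.6" ever matches depends on how the real data are formatted.

```stata
* Toy long-layout data with hypothetical ids and one placeholder code
clear
input str10 beneid str7 dx
"1" "Z23"
"1" "R05"
"2" "PLACE2"
"2" ""
"3" "J069"
"3" ""
end

* Each inlist() call stays within 10 string arguments (dx plus at most 9 codes);
* the calls are combined with | (logical or) inside max()
by beneid, sort: egen byte well_child = max( ///
    inlist(dx, "Z0001", "Z020", "Z00129", "Z23", "Z02.6", "Z02.71", "Z02.79", "Z02.81", "Z02.82") ///
    | inlist(dx, "PLACE1", "PLACE2"))

list, sepby(beneid)
```

        Ids "1" and "2" each match one of the two lists and get well_child = 1; id "3" matches neither and gets 0.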
