
  • defining variable with a couple of parameters in Stata 13

    Hi everyone!

    I am using Stata 13.

    in my data set I have:

    variable "patientmrn" where each value appears twice

    each value is further defined by variable "ctdivolbodymgy", which holds either the higher or the lower of two numerical values

    like this:
    patientmrn ctdivolbodymgy
    11111 7
    11111 1.6
    2222 8.5
    2222 2.8
    I want to generate a new variable "scantype" where for every value for "patientmrn", the higher value for "ctdivolbodymgy" has value "ccta"

    and the lower value for "ctdivolbodymgy" has value "cacs", to create a data set that will look like this:
    patientmrn ctdivolbodymgy scantype
    11111 7 ccta
    11111 1.6 cacs
    2222 8.5 ccta
    2222 2.8 cacs
    I have tried sorting my data set using:
    sort patientmrn ctdivolbodymgy
    gen scantype=.
    replace scantype = "ccta" if max(ctdivolbodymgy), by(patientmrn)
    I keep getting error "invalid syntax r(198);"

    Any help would be greatly appreciated.

    Thank you, in advance, for your help and expertise.

  • #2
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input int patientmrn float ctdivolbodymgy
    11111   7
    11111 1.6
     2222 8.5
     2222 2.8
    end
    
    bys patientmrn (ctdivolbodymgy): assert _N==2 & ctdivolbodymgy[1]!= ctdivolbodymgy[2] & !missing(ctdivolbodymgy)
    by patientmrn: gen wanted= cond(_n==1, "cacs", "ccta")
    Result:

    Code:
    . l, sepby(patientmrn)
    
         +------------------------------+
         | patien~n   ctdivo~y   wanted |
         |------------------------------|
      1. |     2222        2.8     cacs |
      2. |     2222        8.5     ccta |
         |------------------------------|
      3. |    11111        1.6     cacs |
      4. |    11111          7     ccta |
         +------------------------------+



    • #3
      The problems in #1 start with various facts about the max() function in Stata. (Note that the max() function of egen and the max() function of Mata behave differently.)

      1. max() works row-wise (within observations) in your context, unless otherwise specified

      2. max() requires two or more comma-separated arguments which it compares to yield their maximum. (In my view this could be documented a little better.)

      Thus max() fed with a variable name (and nothing else) would not return the maximum over a variable (column-wise), or a subset thereof, for either reason.

      In context something like
      Code:
      bysort id (x) : gen wanted = max(x[1], x[2]) 
      could help here, although taking the maximum is redundant given the sorting (subject to small print about missing values).
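      To make the row-wise versus column-wise distinction concrete, here is a minimal sketch using the example data from #2. The variable name maxdose is just an illustrative choice, not something from the original posts. It relies on egen's max() function, which does work down a column and can be combined with by:.

```stata
clear
input int patientmrn float ctdivolbodymgy
11111   7
11111 1.6
 2222 8.5
 2222 2.8
end

* the max() function compares its own arguments, within an observation
display max(1.6, 7)

* egen's max() works over observations, here separately for each patient
bysort patientmrn: egen maxdose = max(ctdivolbodymgy)

* label the observation holding the group maximum (maxdose is a hypothetical name)
gen scantype = cond(ctdivolbodymgy == maxdose, "ccta", "cacs")

list, sepby(patientmrn)
```

      Note that tied values within a patient would both be labelled "ccta" this way, and a missing dose would be labelled "cacs", so checks like the assert in #2 remain a good idea.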


      3. You want your code to mean "use the value of one variable corresponding to the maximum of another variable". But max() in an if qualifier, when fed with legal arguments within (), could only mean "if the maximum is true", that is, not zero. That construct is not a kind of look-up operation.

      4. replace doesn't support a by() option. This is the easiest of these problems to avoid, as it's implied by the help.
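      Putting these points together, one way the attempt in #1 might be repaired is sketched below. It assumes, as the assert in #2 checks, that every patient has exactly two distinct, nonmissing values. It also starts from an empty string variable, because gen scantype=. creates a numeric variable that cannot hold "ccta".

```stata
clear
input int patientmrn float ctdivolbodymgy
11111   7
11111 1.6
 2222 8.5
 2222 2.8
end

* start with an empty string variable rather than numeric missing
gen str4 scantype = ""

* by: is a prefix, not an option; sorting within patient puts the higher value last
bysort patientmrn (ctdivolbodymgy): replace scantype = "ccta" if _n == _N

* the first observation within each patient is then the lower value
by patientmrn: replace scantype = "cacs" if _n == 1

list, sepby(patientmrn)
```

      This is longer than the one-line cond() solution in #2, but it stays close to the original gen/replace approach.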
