
  • array search loop using substr & r(109) mismatch error

    Hi Statalist

    I am new to Stata, but fortunately I come with some syntax experience from previous programs and work.

    My situation:
    I have a dataset that contains diagnostic codes. Each of the 6,000 records can have up to 100 diagnosis codes. Each diagnosis code is stored as an individual variable (not by me; it is how the dataset came), i.e. diag01, diag02, …. I am setting up a do-file in which a loop checks all of the diagnosis variables (diag01 – diag100) to see whether certain diagnosis codes have been recorded.

    Code:
    gen aAB100=0
    quietly forval i=1(1)100 {
    replace aAB100=1 if substr(Diag`i',1,5)=="AB100"
    }
    label variable aAB100 "Diagnosis"
    label define Diagnosis 1 "Yes" 0 "No"
    label values aAB100 Diagnosis

    Output:
    I am receiving an r(109) type mismatch error. I assume the error has to do with the string containing both letters and numbers, but I can't seem to get a straight answer from the manuals or discussion forums. Any advice on a better method would be appreciated. Interestingly, aAB100 is created with correct results, so it appears that the error occurs at the end of everything.

    Thank you
    Court

  • #2
    Some of your Diag* variables must be numeric to produce this error. Try this:
    Code:
    ds Diag*, has(type numeric)
    to find out which ones. Then, to make everything consistent, you should apply -tostring- to the numeric ones.
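
    A minimal sketch of that conversion, run on a small made-up dataset so it can be tried on its own. The variable names and values are hypothetical, and it assumes the numeric variables hold values that -tostring- can convert without loss of precision (otherwise it refuses unless the force option is given):

    ```stata
    * Toy data (hypothetical). NOTE: -clear- drops whatever is in memory.
    * Diag2 is numeric, as can happen when a column is empty or all digits.
    clear
    input str6 Diag1 Diag2 str6 Diag3
    "AB1001" 12345 ""
    "CD200"  .     "AB100"
    end

    * ds leaves the names of the numeric Diag* variables in r(varlist)
    ds Diag*, has(type numeric)
    local numvars `r(varlist)'

    * Convert only if something was found
    if "`numvars'" != "" {
        tostring `numvars', replace
        * tostring writes numeric missing as ".", so blank those out
        foreach v of local numvars {
            replace `v' = "" if `v' == "."
        }
    }

    * Errors out if any Diag* variable is still not a string
    confirm string variable Diag*
    ```

    On real data, drop the toy-data lines and run the rest with your own dataset in memory.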

    By the way, this whole thing would be easier if you had the data in long instead of wide layout.

    Code:
    gen long obs_no = _n
    reshape long Diag, i(obs_no) j(_j)
    by obs_no, sort: egen aAB100 = max(substr(Diag, 1, 5) == "AB100")
    Of course, this won't work either until you get all of the Diag variables into string storage type. Also, if there is some kind of id variable already in the data, you don't need to create the obs_no variable: just put the id variable in the -i()- option instead of obs_no.
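
    For illustration, here is a sketch of the long-layout approach on a small made-up dataset, with the flag computed within each original record and the data then returned to one row per record. The data values and the name seq for the -j()- variable are hypothetical:

    ```stata
    * Toy data (hypothetical). NOTE: -clear- drops whatever is in memory.
    clear
    input str6 Diag1 str6 Diag2 str6 Diag3
    "AB1001" "XY900" ""
    "CD200"  ""      "EF300"
    "GH400"  "AB100" ""
    end

    * One row per record per diagnosis slot
    gen long obs_no = _n
    reshape long Diag, i(obs_no) j(seq)

    * Within each record: 1 if any code starts with AB100, else 0
    by obs_no, sort: egen byte aAB100 = max(substr(Diag, 1, 5) == "AB100")

    * Back to one row per record, if the wide layout is still needed
    reshape wide Diag, i(obs_no) j(seq)
    list obs_no aAB100
    ```

    Records 1 and 3 are flagged because one of their codes begins with AB100; record 2 is not.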
    Last edited by Clyde Schechter; 22 Feb 2018, 19:44. Reason: Correct code error.



    • #3
      Clyde, thank you very much for taking the time to respond to my post.

      It turns out that a number of my Diag variables are numeric, which I will now change.

      Also, thank you for the extra code; this is my next task. Procedural codes were provided in long format, so this code will be great for that.
