  • loop to count number of observations

    Hi

    I am working on a data set in which I need to count the number of times a patient has been diagnosed with a particular type of inflammatory bowel disease, coded as type 1, 2 or 3. I have reshaped my data to wide format, so each row holds the patient ID followed by columns ibd_type1, ibd_type2, ibd_type3 and so on, which represent diagnoses given on different dates.
    I am trying to write a loop to count how many times a patient has been coded as 1 across these columns, how many times a 2 diagnosis has been given, and so on.

    The loop I have previously used to count operation codes doesn't seem to work here. I get the error message 'type mismatch r(109)' even though all of the variables are float. I have also tried converting them all to string, without any luck.

    Here is the data set example and the loop I have been trying to use.


    Code:
    foreach a in ibd_type1 ibd_type2 ibd_type3 ibd_type4 ibd_type5 ibd_type6 ibd_type7 ibd_type8 ibd_type9 ibd_type10 ibd_type11 ibd_type12 ibd_type13 ibd_type14 ibd_type15 ibd_type16 ibd_type17 ibd_type18 {
        rename `a' xx`a'
        replace xx`a' = subinstr(xx`a', ".", "", .)
    }

    gen ulcerative_colitis = 0
    forvalues x = 1/18 {
        replace ulcerative_colitis = ulcerative_colitis + 1 if xxibd_type`x' == 1
    }

    patid obsdate1 ibd_type1 medcodeid1 term1 obsdate2 ibd_type2 medcodeid2 term2 obsdate3 ibd_type3
    123 05dec2005 1 107644019 Ulcerative colitis . .
    456 14jan2005 1 07644019 Ulcerative colitis 17jan2005 1 2532953017 Exacerbation of ulcerative colitis .
    789 02dec2013 3 41137017 Inflammatory bowel disease . .
    842 15may2017 3 41137017 Inflammatory bowel disease 1dec2018 2 56765016 Crohn's disease .


    Any help with this problem would be much appreciated!
    Thanks,
    Jennifer


  • #2
    This problem seems to require counts over variables (not observations):

    Code:
    forval j = 1/3 {
         egen count`j' = anycount(ibd_type*), values(`j')
    }
    I don't fully understand what you are trying to do otherwise. The first loop seems to be trying to replace numeric missing values with empty strings, but that doesn't make sense to Stata. First of all, subinstr() is being asked to work on a numeric variable, which is why type mismatch is the error message: subinstr() is for string arguments only. The goal of replacing missing numeric values with empty strings doesn't make sense either, as a numeric variable cannot hold an empty string, but you need not go there. Fortunately the code above will count values of 1, 2 and 3 respectively and ignore missings, which is what I think you want to do.
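    As a quick check, here is a minimal, self-contained run of the egen loop on a cut-down copy of the data example. It keeps only patid and ibd_type1 to ibd_type3, and assumes the blank cells in the listing are numeric missing (.); with the real data, ibd_type* would pick up all 18 variables without any change to the loop.

```stata
* cut-down copy of the data example (assumption: blanks are numeric missing)
clear
input patid ibd_type1 ibd_type2 ibd_type3
123 1 . .
456 1 1 .
789 3 . .
842 3 2 .
end

* count, for each patient, how many ibd_type* variables equal 1, 2 and 3;
* missing values match none of the codes, so they are simply not counted
forval j = 1/3 {
    egen count`j' = anycount(ibd_type*), values(`j')
}

list patid count1 count2 count3
```

    Patient 456 should show count1 = 2, and patient 842 one diagnosis each of types 2 and 3.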

    The second loop is similar to my loop above in intent. Naturally you may prefer a name like ulcerative_colitis to count1, but after my loop that could be a single rename command.

    Code:
    rename (count*) (ulcerative_colitis something_else another_condition)
    The tricks that are easy to miss include the fact that egen is a kind of ragbag of useful functions, and the use of a wildcard such as ibd_type*, which makes typing out the full list of variable names unnecessary.
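    For comparison, the second loop from the question also works once the rename/subinstr() loop is dropped, because the ibd_type* variables can stay numeric. A minimal sketch on the same cut-down data (assumption: only ibd_type1 to ibd_type3, blanks are numeric missing); with the full data the loop would run over 1/18.

```stata
* cut-down copy of the data example (assumption: blanks are numeric missing)
clear
input patid ibd_type1 ibd_type2 ibd_type3
123 1 . .
456 1 1 .
789 3 . .
842 3 2 .
end

* no rename or subinstr() step: a missing value (.) is never equal to 1,
* so observations with missing codes are skipped by the if condition
generate ulcerative_colitis = 0
forvalues x = 1/3 {
    replace ulcerative_colitis = ulcerative_colitis + 1 if ibd_type`x' == 1
}

list patid ulcerative_colitis
```

    This gives the same result as count1 from anycount, but needs one loop per code, which is why the egen route scales better to codes 2 and 3.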

    See also https://journals.sagepub.com/doi/pdf...867X0900900107 for a tutorial review of working rowwise.
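    A different route, not discussed above, is to avoid rowwise work altogether by going back to a long layout (one observation per patient and diagnosis), which may be closer to how the data started out. A minimal sketch on the same cut-down data; the index name diagnosis is hypothetical, and blanks are assumed to be numeric missing.

```stata
* cut-down copy of the data example (assumption: blanks are numeric missing)
clear
input patid ibd_type1 ibd_type2 ibd_type3
123 1 . .
456 1 1 .
789 3 . .
842 3 2 .
end

* one observation per patient and diagnosis (diagnosis is a hypothetical name)
reshape long ibd_type, i(patid) j(diagnosis)

* within each patient, total() sums the 1/0 results of the comparison;
* a missing ibd_type equals none of the codes, so it contributes 0
forvalues j = 1/3 {
    bysort patid: egen count`j' = total(ibd_type == `j')
}

* keep one row per patient with the three counts
bysort patid: keep if _n == 1
keep patid count1 count2 count3
list
```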

