  • Question about using if flag + tab together

    Hi all,

    I am trying to create a Table 1 using Medicare claims to compare demographics among people who do and do not have an outcome, acute MI. I am using a file in which there are multiple claims per person. Each person is identified by an ilinkid.

    I created a flag using:

    egen byte flag = tag(ilinkid)

    I also created two categories of people with and without acute MI. A person who has acute MI is in category hasMI=1, and a person who doesn't is in hasMI=0.

    I was wondering if there was any way to use tab along with the demographic category and the flag. For example, I tried: "tab race if hasMI==1, if flag" which does not work. I want my percentages to be based on # people in the cohort, not on the # claims.

    I appreciate any help!

  • #2
    First, -tab race if hasMI == 1, if flag- is not valid Stata syntax. Syntactically it would need to be -tab race if hasMI == 1 & flag-.
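
    As a minimal sketch of that syntax point, the lines below build a tiny made-up data set (the variable names follow the question; the values are invented for illustration) and put both conditions into a single -if- qualifier joined by &.

    ```stata
    * Toy data: names follow the thread, values are made up
    clear
    input ilinkid race hasMI
    1 1 1
    1 1 1
    2 2 1
    3 1 0
    3 1 0
    end

    egen byte flag = tag(ilinkid)

    * One -if- qualifier, conditions combined with & (flag is true when nonzero)
    tab race if hasMI == 1 & flag
    ```

    This runs only because race and hasMI happen to be constant within each person in the toy data.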

    But that syntactically correct command might not get you what you actually want. And, in fact, I urge you to take a different approach towards creating a person-level tabulation in this data.

    The problem is that in data like this, different observations about the same person may contain different values of some variables. Age will definitely change over time. So will things like smoking status, body mass index, lab values of any kind, etc. Diagnoses come and go. Geographic location of residence and of treatment change. Lots of things change. When you use -egen flag = tag(ilinkid)-, Stata assigns flag = 1 to one randomly selected observation per person, and 0 to the others. So you are truly generating a table of random numbers in this way. (And, it won't even come out the same each time you do it.) Only if you are going to tabulate only variables that are completely consistent over all observations of a given person can the resulting table be sensible.

    The safer approach to this kind of situation is to first identify rules determining which values of the variables that appear in the many observations of a person are the ones you want to capture in your table. You might want those recorded in the first claim. Or maybe for some variables you would want the average or median. You might want the highest value for some. And so on. Then when you have settled on that, the -collapse- command will reduce your data set to one observation per person with the values you designate.* Then you can do whatever -tab- commands you need, with no need to create a flag variable.
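
    A rough sketch of that workflow, under assumed rules: age and race are taken from the earliest claim, and a person counts as having MI if any claim shows it. The variable claimdate and all values are hypothetical, and the sketch relies on -collapse (first)- picking the first observation of each person in the current sort order.

    ```stata
    * Toy claims data: claimdate and all values are invented for illustration
    clear
    input ilinkid claimdate age race hasMI
    1 100 70 1 0
    1 200 71 1 1
    2 150 66 2 0
    3 120 80 1 0
    3 300 81 1 0
    end

    * Sort so that "first" means the earliest claim for each person
    sort ilinkid claimdate

    * One row per person: age and race from the first claim,
    * hasMI = 1 if any of the person's claims has hasMI == 1
    collapse (first) age race (max) hasMI, by(ilinkid)

    * Percentages are now based on people, not claims
    tab race hasMI, column
    ```

    Which statistic to use for each variable is a substantive decision; the choices here are only placeholders.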

    *In some situations you might want to select a value that is calculated from all the available values and isn't among the statistics built into -collapse-. In that case, using -gen-, -replace- and -egen- commands you can calculate that and make sure you place the result in every observation of the same person. Then just pick -(first)- or -(last)- for that variable in your -collapse- command.
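
    One possible illustration of the footnote uses the most common value of a variable, which -egen- offers as mode() but -collapse- does not. The variable region and its values are hypothetical, and breaking ties with minmode is just one arbitrary rule.

    ```stata
    * Toy data: region and its values are invented for illustration
    clear
    input ilinkid region
    1 3
    1 3
    1 5
    2 2
    3 4
    3 1
    end

    * Most frequent region per person, copied into every observation of
    * that person; minmode takes the smallest value when there is a tie
    bysort ilinkid: egen region_mode = mode(region), minmode

    * region_mode is constant within person, so (first) is safe here
    collapse (first) region_mode, by(ilinkid)
    list
    ```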


    • #3
      I wrote the first version of egen, tag() in 1999 -- although even then I was just coding up a device common in Stataland, which I probably did not invent -- and so can strongly endorse Clyde's advice.

      But you don't need personal testimony. The key to understanding when tag() is a good idea is already in the help.

      When all observations in a group have the same value for a summary variable
      calculated for the group, it will be sufficient to use just one value for many purposes.

      That might well be strengthened to start

      If (and only if) ...
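
      A small sketch of how one might check that condition before relying on tag(). The data are made up, and nclaims is a hypothetical per-person summary (the number of claims) that is constant within person by construction.

      ```stata
      * Toy data: values are invented for illustration
      clear
      input ilinkid race age
      1 1 70
      1 1 71
      2 2 66
      3 1 80
      3 1 81
      end

      * race is constant within person, so this assertion passes
      bysort ilinkid (race): assert race[1] == race[_N]

      * age varies within person, so this one fails;
      * -capture- lets the do-file continue and stores the error in _rc
      capture noisily bysort ilinkid (age): assert age[1] == age[_N]
      display "return code for the age check: " _rc

      * A group summary is the same in every observation of the group
      bysort ilinkid: gen nclaims = _N

      egen byte flag = tag(ilinkid)
      tab race if flag
      summarize nclaims if flag
      ```

      Here tabulating race or summarizing nclaims on the tagged observations is sensible; doing the same with age would not be.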


      • #4
        Thank you both! This is extremely helpful advice for a novice Stata user.
