  • Panel Data: How to create a flag indicating if a condition is true at time t and all future times?

    Hello, all,

    I've got some panel data for a randomized trial, and I'm trying to link the data to the diagnoses in Medicare claims files. I have flags for whether a person meets the diagnostic criteria for a number of chronic conditions in the current year. I would like to create a flag indicating that someone has had a disease that turns on when they get the disease in one year, and then stays on, regardless of whether they currently have the disease.

    For example, consider dementia, which is often not coded in medical claims. Say I'm willing to assume that once someone has been flagged as having dementia in 2001, they probably still have it even if the 2003 medical claims don't say they do. However, I don't want their line for the year 2000 to say that they do.

    I currently have the first 3 columns, and I'm trying to create the 4th.
    Name      Year  Current Dementia  Dementia Now or Prior
    Mrs Vang  2000  0                 0
    Mrs Vang  2001  1                 1
    Mrs Vang  2002  0                 1
    Mrs Vang  2003  0                 1
    I believe that this will accomplish my goal, and I'd like to ask if people think this is correct:

    tsset ID year
    gen dementiaprior = currentdementia
    replace dementiaprior = L.dementiaprior if dementiaprior == 0 & L.dementiaprior == 1

    This appears to work in the test data I've set up (which is in Stata 11 format), and from the tsset documentation, it appears that the xt suite of commands (e.g. xtgee and xtreg) will work after I've tsset the data. Am I right? If I am, can anyone think of a more parsimonious way to do this?

    Thanks for any assistance you can provide.
    Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

    When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.

  • #2
    Well, if your data have no missing years for any person, and no missing values for the diagnosis, then that code will work. But missing years or missing values will break it. Given that you are working from a large claims data set, it is a fair bet that the data are messy and these problems will arise. This code is a little more robust:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte id int year byte(diabetes dementia)
    1 2000 0 1
    1 2001 1 0
    1 2002 0 0
    1 2003 1 0
    1 2004 1 0
    1 2005 1 0
    2 2000 0 1
    2 2001 0 0
    2 2002 0 1
    2 2003 0 0
    2 2004 0 0
    2 2005 1 0
    3 2000 0 0
    3 2001 0 0
    3 2002 0 0
    3 2003 0 0
    3 2004 0 1
    3 2005 0 0
    4 2000 1 0
    4 2001 0 1
    4 2002 0 0
    4 2003 0 0
    4 2004 0 0
    4 2005 0 0
    end
    
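    foreach v of varlist diabetes dementia {
        * Stop if the flag takes any non-missing value other than 0 or 1
        assert inlist(`v', 0, 1) if !missing(`v')
        * Running count of years flagged, up to and including this one
        by id (year), sort: gen `v'_now_or_prior = sum(`v')
        * Cap the count at 1 to turn it into a 0/1 indicator
        replace `v'_now_or_prior = 1 if `v'_now_or_prior > 1
    }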
    list, noobs sepby(id) abbr(24)

    The idea is to calculate, within each person, a running total of the number of times the disease is recorded up to and including the current year. If that total is greater than 1, it is replaced by 1, which leaves a 0/1 flag.
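
    As a minimal sketch of why the running total is the safer choice, here is a made-up person with a skipped year (2002) and a missing diagnosis (2004). The data and the variable names ever_lag and ever_sum are hypothetical, not taken from the original post. The last command also folds the sum-and-cap steps into a single expression, which is one way to make the code more compact.

    Code:
    clear
    input byte id int year byte dementia
    1 2000 0
    1 2001 1
    1 2003 0
    1 2004 .
    1 2005 0
    end

    * Lag-based approach from #1: L. looks at year - 1, not the previous row
    tsset id year
    gen ever_lag = dementia
    replace ever_lag = L.ever_lag if ever_lag == 0 & L.ever_lag == 1

    * Running-total approach in one line: the flag is 1 once the sum is positive
    by id (year), sort: gen byte ever_sum = sum(dementia) > 0

    list, noobs sepby(id)

    In this toy example ever_lag stays 0 in 2003 and 2005 and is missing in 2004, because L.ever_lag is missing whenever the previous calendar year is absent or was itself missing. ever_sum is 1 from 2001 onward, because sum() treats missing values as zero and depends only on the sort order within id.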

    For the future, please post example data using the -dataex- command, as I have done. You can get it installed by running -ssc install dataex-. -help dataex- provides simple instructions for using it. Thank you.



    • #3
      Thanks, Craig, that is more robust indeed. I don't know assert yet, but it looks like learning it will be beneficial.

      It turns out that in this case, I have neither missing years nor missing Dx values. Someone better than I created the main analytic file, and I'm linking from the Medicare beneficiary summary files, which always have 1s and 0s for the Dx flags (i.e. the people who created those files already treated missing properly).