  • Creating an indicator variable in Stata long format

    Hi all,

    I'm having difficulty creating indicator variables in a dataset in long format in Stata. The data contain an ID, a date of treatment, and a treatment code, for example (all made-up data):

    ID Date Treatment

    1 1Jan1990 D

    1 1Feb1991 D

    1 1Mar1992 T

    1 1Feb1993 F

    1 1Mar1994 D

    2 1Apr1990 D

    2 1Feb1992 D

    2 1Feb1995 D

    Before I convert to wide format I want to create a new indicator variable that is 1 if the patient has EVER had a treatment "T" and 0 if the patient has never received a treatment "T". In the example above I would want the indicator variable to show the following:


    ID Date Treatment Indicator

    1 1Jan1990 D 1

    1 1Feb1991 D 1

    1 1Mar1992 T 1

    1 1Feb1993 F 1

    1 1Mar1994 D 1

    2 1Apr1990 D 0

    2 1Feb1992 D 0

    2 1Feb1995 D 0

    The difficulty I am having is that treatment "T" isn't necessarily the last treatment code for a given individual, and each individual can have a different number of treatments.

    Similarly, I would also like to create a variable recording the most recent treatment a patient had received as of exactly 1 year after their first treatment. For the made-up dataset below:

    ID Date Treatment

    1 1Jan1990 D

    1 1Feb1991 P

    1 1Mar1992 T

    1 1Feb1993 F

    1 1Mar1994 H

    2 1Jan1990 H

    2 1Feb1990 P

    2 1Feb1995 H

    3 1Jan1993 H

    3 1Dec1993 T

    I want to create an indicator variable:

    ID Date Treatment Indicator of treatment at 1yr after first treatment

    1 1Jan1990 H H

    1 1Feb1991 P H

    1 1Mar1992 T H

    1 1Feb1993 F H

    1 1Mar1994 H H

    2 1Jan1990 H P

    2 1Feb1990 P P

    2 1Feb1995 H P

    3 1Jan1993 H T

    3 1Dec1993 T T

    Does anyone know if there is a way of achieving this in Stata? And if so where I can learn more about doing this?

    Kind regards,

    B

  • #2
    Hi Bern,

    To create the indicator you probably want something like:

    Code:
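    bysort ID: gen temp = 1 if Treatment=="T"
    bysort ID: egen indicator  = min(temp)
    * min() over 1 and missing leaves never-treated IDs missing, so set those to 0
    replace indicator = 0 if missing(indicator)
    drop temp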
    Best,
    Rhys



    • #3
      My solution is similar in spirit to that of Rhys Williams for the first question.

      I note that your data appear to be monthly data that happen to be presented using daily dates.
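
      As a minimal sketch of that point, assuming the dates arrive as strings such as "1Jan1990" in a variable I am calling sdate (a hypothetical name, not from the original post), they can be read as daily dates and then mapped to monthly dates:

      ```stata
      clear
      input float Id str9 sdate str1 Treatment
      1 "1Jan1990" "D"
      1 "1Feb1991" "D"
      2 "1Apr1990" "D"
      end
      * parse the string as a daily date (day-month-year order)
      gen ddate = daily(sdate, "DMY")
      format ddate %td
      * convert the daily date to a monthly date
      gen mdate = mofd(ddate)
      format mdate %tm
      list, sepby(Id)
      ```

      Here 1Jan1990 becomes monthly date 360 (1990m1), which is the kind of value used for Date in the example that follows.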

      Data examples posted as code using dataex are more helpful than examples presented in any other way.

      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input float(Id Date Month Year) str1 Treatment
      1 360 1 1990 "D"
      1 373 2 1991 "D"
      1 386 3 1992 "T"
      1 397 2 1993 "F"
      1 410 3 1994 "D"
      2 363 4 1990 "D"
      2 385 2 1992 "D"
      2 421 2 1995 "D"
      end
      format %tm Date
      "Any" problems can be answered using indicator variables. Rhys phrased his as the minimum over 1 and missing; I prefer to go for the maximum over 0 and 1. Note that you don't need to create the indicator, as Rhys did; you can use a true-or-false expression on the fly.
      Code:
      . egen everT = max(Treatment == "T"), by(Id)
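
      A small self-contained sketch of the same idea, under the assumption that only Id and Treatment matter here. The variable names allD and tag are mine, added for illustration: the "all" counterpart of an "any" question is the minimum of a true-or-false expression.

      ```stata
      clear
      input float Id str1 Treatment
      1 "D"
      1 "D"
      1 "T"
      1 "F"
      1 "D"
      2 "D"
      2 "D"
      2 "D"
      end
      * "any": 1 if at least one observation for the identifier is "T"
      egen everT = max(Treatment == "T"), by(Id)
      * "all": 1 only if every observation for the identifier is "D"
      egen allD = min(Treatment == "D"), by(Id)
      * show one row per identifier to check the result
      egen tag = tag(Id)
      list Id everT allD if tag
      ```

      With these data, Id 1 gets everT 1 and allD 0, while Id 2 gets everT 0 and allD 1.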
      The second question I find harder. I split it into four steps: (1) find the first date; (2) find the last date within 12 months of the first date (which might be the first date itself); (3) take the treatment on that date; (4) copy that treatment to all observations for the same identifier.

      Code:
      . egen first = min(Date), by(Id)
      
      . egen refdate = max(cond((Date - first) <= 12, Date, .)), by(Id)
      
      . gen last1 = Treatment if Date == refdate
      (6 missing values generated)
      
      . bysort Id (last1) : replace last1 = last1[_N] if missing(last1)
      (6 real changes made)
      
      . sort Id Date
      
      
      . format first refdate %tm
      
      . list, sepby(Id)
      
           +--------------------------------------------------------------------------+
           | Id     Date   Month   Year   Treatm~t   everT    first   refdate   last1 |
           |--------------------------------------------------------------------------|
        1. |  1   1990m1       1   1990          D       1   1990m1    1990m1       D |
        2. |  1   1991m2       2   1991          D       1   1990m1    1990m1       D |
        3. |  1   1992m3       3   1992          T       1   1990m1    1990m1       D |
        4. |  1   1993m2       2   1993          F       1   1990m1    1990m1       D |
        5. |  1   1994m3       3   1994          D       1   1990m1    1990m1       D |
           |--------------------------------------------------------------------------|
        6. |  2   1990m4       4   1990          D       0   1990m4    1990m4       D |
        7. |  2   1992m2       2   1992          D       0   1990m4    1990m4       D |
        8. |  2   1995m2       2   1995          D       0   1990m4    1990m4       D |
           +--------------------------------------------------------------------------+
      .
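
      The same four steps can be sketched on the second example from the question, this time with daily dates. This is only a sketch under stated assumptions: sdate and cutoff are names I made up, and "1 year after" is taken to mean the same calendar day in the following year. A first date of 29 February would make mdy() return missing and would need separate handling.

      ```stata
      clear
      input byte ID str9 sdate str1 Treatment
      1 "1Jan1990" "D"
      1 "1Feb1991" "P"
      1 "1Mar1992" "T"
      1 "1Feb1993" "F"
      1 "1Mar1994" "H"
      2 "1Jan1990" "H"
      2 "1Feb1990" "P"
      2 "1Feb1995" "H"
      3 "1Jan1993" "H"
      3 "1Dec1993" "T"
      end
      gen Date = daily(sdate, "DMY")
      * (1) first date for each identifier
      egen first = min(Date), by(ID)
      * assumed cutoff: same day and month, one year later
      gen cutoff = mdy(month(first), day(first), year(first) + 1)
      * (2) last date on or before the cutoff
      egen refdate = max(cond(Date <= cutoff, Date, .)), by(ID)
      format Date first cutoff refdate %td
      * (3) treatment on that date
      gen last1 = Treatment if Date == refdate
      * (4) copy to all observations; empty strings sort first within ID
      bysort ID (last1): replace last1 = last1[_N] if missing(last1)
      sort ID Date
      list ID Date Treatment last1, sepby(ID)
      ```

      On these data the sketch gives D for ID 1, P for ID 2 and T for ID 3. The D for ID 1 follows the input listing, where the first treatment is D; the desired-output table in the question shows H on that row instead.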

      As for further reading, the idea that egen is your friend won't be picked up in all introductions to Stata, but

      https://www.stata-journal.com/articl...article=dm0055

      is an attempt (perhaps futile) to cover a lot of ground in a little space without assuming too much.

      "Any" and "all" questions are handled at https://www.stata.com/support/faqs/d...ble-recording/
      Last edited by Nick Cox; 07 Apr 2021, 07:48.



      • #4
        Thanks Nick Cox, I've learnt something new with the true/false expression on the fly!

        Thanks,
        Rhys
