  • tagging observations in a longitudinal dataset

    Hi

    I'm working with a longitudinal dataset that follows a set of children from date of diagnosis onward. I just realized that several subjects are lacking data during the first few months post-diagnosis. I want to create a variable 'D12' that splits the children into two groups: D12==1 if their data start any time before the thirteenth month after diagnosis, and D12==2 if their data only start after the twelfth month (these subjects will hence not be included in the analysis).

    This variable will have to be based on duration (months):

    Code:
    gen D12 = duration
    replace D12 = 1 if duration<12.1 
    replace D12 = 2 if
    - How do I specify category 2 above? This group has subjects with data on variables only from the thirteenth month onward. Category 1 subjects can have data pre- and post the twelfth month.

    Thanks

    /Amal

  • #2
    You didn't get a quick answer. You would increase your chances of a useful answer if you include Stata code in code delimiters, Stata output, and sample data using dataex - see FAQ on asking questions. Also, simplify what you post to the minimum needed to demonstrate the problem.

    We can't really answer your question without a little more understanding of your data structure. We would need to understand how duration is calculated. I don't see how duration alone would necessarily let you make the distinction between early and late.

    One thing that might help is setting this up properly as a panel data set. Do you have an identifier for each child and a separate variable that indicates months? If you have month and child identifiers, then you can use lags and if qualifiers to make almost any permutation of inclusion you want. Alternatively, something like:
    bysort child: egen early=mean(x) if month<13

    early will then be non-missing only in observations with month < 13, and only for children who have a non-missing value of x before month 13.
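
    If the flag should appear on every observation of a child rather than only on the early rows, one possible variation is sketched below. It is only an illustration: child, month and x are the placeholder names from the line above, and the four rows of data are made up.

    ```stata
    * Hypothetical variables: child (identifier), month, x (any measurement).
    clear
    input child month x
    1 0 21
    1 14 23
    2 13 26
    2 15 27
    end

    * The expression is 1 or 0 per observation; max() within child spreads
    * the flag to every row of a child with a non-missing x before month 13.
    bysort child: egen byte early = max(month < 13 & !missing(x))

    list, sepby(child)
    ```

    In this made-up data, child 1 gets early = 1 on both rows and child 2 gets early = 0.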

    • #3
      Thanks for the feedback but yes I did include the little code I had above using code delimiters. I don't have much code for this particular problem as I'm not sure how to write the correct Stata syntax!

      This is a longitudinal dataset, with each subject identified by an ID variable. The dataset is in the 'long format'. Duration is created by subtracting date of diagnosis from date of clinic visit. All subjects have a duration value:

      Code:
      ID duration X1 X2 D12 
      1   0       21  5  1
      1   1       22  6  1
      1   2       23  7  1
      1   3       24  8  1
      1   4       26  9  1
      1   5       27  9  1
      2   0       21  5  1
      2   1       22  6  1
      2   2       23  7  1
      2   3       24  8  1
      2   4       23  7  1
      2   5       24  8  1
      3   9       26  9  2
      3   10      27  9  2
      Above is just an example dataset: subjects 1 and 2 have data from duration 0 onward and will thus be included in the analysis (let's just say the analysis will only include those subjects with data from time zero onwards). They are then assigned a value of 1 in the new variable D12. Subject 3, on the other hand, only has data from the 9th month and will be excluded from the analysis, and is thus assigned a value of 2 on the D12 variable (i.e. missing data at time point zero).

      Hope this is more clear!

      Thanks

      /Amal

      • #4
        "(let's just say the analysis will only include those subjects with data from time zero onwards)"
        I'll interpret this as meaning that the first observation available has duration = 0. It isn't clear to me if that's what you mean. What, for example, would happen if you had a subject with data at duration 0, but then the next observation for that subject isn't until duration 10? Anyway, the following code will set D12 = 1 for all observations for a given ID if the earliest observation has duration = 0.

        Code:
        by ID (duration), sort: gen byte D12 = cond(duration[1] == 0, 1, 2)
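
        If the rule is instead the twelve-month cutoff from the original question, the same approach might be sketched as follows. This assumes duration is measured in months and reuses the 12.1 threshold from the first post; the example rows are made up for illustration.

        ```stata
        * Made-up example data in long format, one row per clinic visit.
        clear
        input ID duration X1 X2
        1 0 21 5
        1 1 22 6
        2 0 21 5
        2 14 23 7
        3 9 26 9
        3 10 27 9
        4 13 25 8
        4 15 26 9
        end

        * Sorting by duration within ID makes duration[1] the earliest visit.
        * D12 = 1 if that visit falls within the first 12 months, else 2.
        by ID (duration), sort: gen byte D12 = cond(duration[1] < 12.1, 1, 2)

        list, sepby(ID)
        ```

        Under this rule, IDs 1, 2 and 3 are assigned D12 = 1 and ID 4 is assigned D12 = 2. Missing values of duration sort last, so an ID with no recorded duration at all would also end up with D12 = 2.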
