  • calculate change in a variable between 2 time points using all possible time intervals (not just adjacent time points)

    I have a longitudinal dataset with variables collected irregularly over time. Each individual has 1 to 10 visits, reflecting follow-up from 0 to 14 years. For each individual, I would like to calculate the change in a variable over all possible 1-year time intervals; for example, I would like to capture all possible 1-year changes in body weight. While I can easily program changes between adjacent visits using subscripting (i.e., weight[_n+1] - weight[_n]), I suspect that I will need two loops and matrices to capture data for all possible 1-year intervals (not just adjacent ones) for each individual. Unfortunately, I am a beginning programmer. Does anyone have any advice for tackling this problem?
    Thank you.

  • #2
    I'd recommend the following:

    1. Search previous postings on this forum for the phrase "Welcome to Statalist," which will link you to a few posts with advice for newcomers about how to ask a question in a way more likely to get answered, as well as with some suggestions regarding local etiquette.

    2. You need to give us a sense of how your data is structured to get a good answer. Also, I can think of several interpretations of "all possible 1-year changes." Give us example data for a few individuals of how your data looks now (i.e., the results of a -list- command) and how you would like it to look with the new "all possible differences" variable(s). You'll see some good examples in previous postings in this forum where people have done that.


    Regards, Mike



    • #3
      Look up commands like tsfill and ipolate if you need to estimate values for years in which there was no visit.
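
      A minimal, untested sketch of how tsfill and ipolate might be combined. It assumes long-format data with hypothetical variable names id, mdate (a numeric monthly date), and weight, and at most one visit per person per month. Interpolated weights are estimates, not measurements.

      ```stata
      * Assumed (hypothetical) variables: id, mdate (numeric monthly date), weight
      xtset id mdate        // declare the panel structure; needs unique id-mdate pairs
      tsfill                // add a row for every missing month within each person's span

      * Linear interpolation of weight between observed visits, person by person
      bysort id (mdate): ipolate weight mdate, generate(weight_ip)

      * Change relative to the (possibly interpolated) value 12 months earlier
      generate change_12m = weight_ip - L12.weight_ip
      ```

      Whether interpolated values are acceptable for the analysis is a substantive decision; this only shows the mechanics.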
      -------------------------------------------
      Richard Williams, Notre Dame Dept of Sociology
      StataNow Version: 19.5 MP (2 processor)

      EMAIL: [email protected]
      WWW: https://www3.nd.edu/~rwilliam



      • #4
        Please take Mike Lacy's points seriously (both of them), and act accordingly.
        From the little information that you have given us, I would suggest the following: (a) ensure your data are in long format; (b) xtset or tsset your data; (c) calculate changes using Stata's lag or lead operators. [Using (b) and (c) will ensure that gaps (missing values of your outcome variable) are treated appropriately when calculating changes over time.] The command at (c) may look something like bysort person_id (time_variable): generate change = outcome - L.outcome. You can read up about long versus wide data formats, and the other things that I refer to, by using appropriate search commands and then following the relevant links to the manual entries, or the relevant help files.
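
        As a rough illustration of steps (b) and (c), assuming long-format data with hypothetical variable names id, mdate (a numeric monthly date), and weight, the lag operators could be used along these lines. This is a sketch, not tested on the poster's data.

        ```stata
        * Assumed (hypothetical) variables: id, mdate (numeric monthly date), weight
        xtset id mdate

        * L. refers to the previous time unit (here, the previous month), not the previous visit,
        * so the result is missing unless there was a visit in that month
        generate change_1m = weight - L.weight

        * L12. refers to the value exactly 12 months earlier; missing if there was no visit then
        generate change_12m = weight - L12.weight
        ```

        Because the operators work on the time variable rather than on row order, a gap in the visits yields a missing change instead of a difference across the gap.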



        • #5
          Thank you very much for the replies and advice. I apologize that I wasn't clear in the original post. I've attached some sample data below--indicating that I have weight measurements on different individuals over multiple points in time. I want to capture all possible 1-year weight change intervals (not just between successive visits) for non-missing data--e.g. for id=3, I am interested in weight at vdate June2009 - weight at vdate June2008, which reflects obs 4 and 2. I may be missing something, but I don't believe that the lag function will achieve this goal. Currently, I have tsset my data and am using 2 loops to generate variables for all possible 1-year weight interval combinations for each person, then deleting the duplicate variables and collapsing the non-missing values into a new variable reflecting all 1-yr intervals per individual, but it isn't very elegant or efficient.
          [Attached image: example.data.jpg]



          • #6
            Maybe, just maybe, I get what you want to do. Your use of the term 1-yr interval to me would normally mean two observations that are exactly (or nearly exactly) one year apart. But based on your example, perhaps you mean any intervals that are 1 year or less? I'm also inferring from the color in your browser snapshot that vdate is not really a date variable: it's a string variable that is human-readable as a monthly date. So if I have this right, this should work (but is not tested):

            Code:
            gen int mdate = monthly(vdate, "MY")  // create a Stata monthly date variable
            format mdate %tm // optional, only if you want to read these dates yourself; it's the underlying numbers that matter
            
            // Now make a copy of the data with different names for the month and weight variables
            preserve
            tempfile data_copy
            rename mdate mdate2
            rename weight weight2
            keep id mdate2 weight2
            sort id
            save `data_copy'
            
            
            // Join the original to the copy
            restore
            sort id
            joinby id using `data_copy'  // must match the tempfile name declared above
            keep if inrange(mdate2-mdate, 1, 12) // only interested if dates are within 12 months
            /*
            If I have misunderstood you and you only want pairings that are exactly 12 months apart
            just change the last line of code to
            keep if mdate2-mdate == 12
            */
            gen delta_weight = weight2-weight
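
            To see what the pairing approach produces, here is a small self-contained check on made-up data (one person, three visits six months apart). The values and the "Mon YYYY" string format for vdate are assumptions for illustration only; the real vdate strings may need a different mask.

            ```stata
            clear
            input id str8 vdate weight
            3 "Jun 2008" 70
            3 "Dec 2008" 72
            3 "Jun 2009" 75
            end

            gen int mdate = monthly(vdate, "MY")   // numeric monthly date
            format mdate %tm

            * Save a renamed copy of the data, then pair every visit with every other visit of the same id
            tempfile data_copy
            preserve
            rename (mdate weight) (mdate2 weight2)
            keep id mdate2 weight2
            save `data_copy'
            restore
            joinby id using `data_copy'

            keep if inrange(mdate2 - mdate, 1, 12)  // later visit is 1 to 12 months after the earlier one
            gen delta_weight = weight2 - weight
            list id mdate mdate2 delta_weight, sepby(id)
            ```

            With these three visits, joinby forms nine pairs and the keep statement retains three of them: Jun 2008 to Dec 2008, Jun 2008 to Jun 2009, and Dec 2008 to Jun 2009. Because joinby forms all within-id pairs, the same code applies unchanged whatever the number of visits per person.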



            • #7
              Thank you, Clyde, for the tips. I'll need to think a bit more about how to apply your code in my case (I have up to 10 obs per person).

