  • Need help re-arranging data

    Hi, I am new to Stata and I need help fairly urgently because of a deadline.
    I have a list of lab results taken at irregular intervals over 5 years, and I want to turn it into 3-monthly follow-up data. Please help me work out how to do this in Stata.
    ID  RESULT VALUE  RESULT DATE
    1   4             7/14/2010
    1   6             6/09/2011
    1   6             9/09/2011
    1   9             7/04/2012
    2   3             3/11/2012
    2   2             6/12/2012
    2   6             10/12/2013
    Thanks.


  • #2
    Well, it appears from your example that the actual follow-up occurs at irregular intervals. So if you start dividing time into three month intervals, what do you want to do with intervals that contain more than one result, or none at all? Also, how do you want to define the three-month periods? Do we start at a certain date, maybe January 1, 2010? Or do we start each ID's first period at that ID's first date and move forward three months at a time from there? Or something else?

    In the future, when showing data examples, please use the -dataex- command to do so. If you are running version 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.


    • #3
      Hi, I am sorry, I am really new to Stata. I have already divided the data into time periods by following some examples.
      It looks like this:
      Patient ID  Result at 3mo  Result at 6mo  Result at 9mo
      1           x
      1                          y
      1                                         z
      1
      and I was hoping it would look like this:
      Patient ID  Result at 3mo  Result at 6mo  Result at 9mo
      1           x              y              z
      Thanks. I hope this makes sense.


      • #4
        The start period depends on each individual first date and then move forward 3 months. And also there are multiple entries for each time intervals and I want to keep only observation in each time period so if you can help me with that too it will be great!!

        Thanks


          • #6
            "And also there are multiple entries for each time intervals and I want to keep only observation in each time period so if you can help me with that too it will be great!!"
            I don't understand this. Do you mean you want to keep only one observation in each time period? If so, which one?


            • #7
              I really dont have any preference whichever appears first or whichever STATA picks up first. But honestly, can you help me re arrange the data first if there is a code for it can you please let me know!!

              Thanks Clyde Schechter


              • #8
                It isn't clear to me that you actually have a Stata data set in hand, since the column headings in your tableaux in #1 and #3 are not legal Stata variable names. I'm going to assume that you do have one and that it looks like what the following code creates:
                Code:
                * Example generated by -dataex-. To install: ssc install dataex
                clear
                input byte(id resultvalue) float date
                1 4 18457
                1 6 18787
                1 6 18879
                1 9 19178
                2 3 19063
                2 2 19156
                2 6 19643
                end
                format %td date
                If your data set does not actually look like that, then you need to first get it to look like that, i.e. with a real Stata numerical date variable.
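If the date is currently stored as text in the month/day/year form shown in #1, a minimal sketch of the conversion might look like the following. The variable name result_date and the three sample rows are assumptions made for illustration, not taken from the actual data set.

```stata
* Hypothetical starting point: the date is a string such as "7/14/2010"
clear
input byte(id resultvalue) str10 result_date
1 4 "7/14/2010"
1 6 "6/09/2011"
2 3 "3/11/2012"
end
* daily() converts a string to a Stata daily date; "MDY" states the order of its parts
gen date = daily(result_date, "MDY")
* %td only changes how the number is displayed, not the value stored
format %td date
list, sepby(id)
```

Any string that daily() cannot interpret is returned as missing, so counting missing values of date afterwards is a quick sanity check.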

                From that starting point, you can get what you want with:
                Code:
                by id (date), sort: gen follow_num = 3*floor((date-date[1])/(365/4))
                collapse (first) resultvalue, by(id follow_num)
                rename resultvalue result__mos
                reshape wide result_@_mos, i(id) j(follow_num)
                "I really dont have any preference whichever appears first or whichever STATA picks up first."
                Seriously? If you don't have any preference about your data, why bother with data analysis at all? I really consider that attitude irresponsible. The code shown above picks the chronologically first.

                Finally, I will just add that you are likely to come to regret re-organizing your data in this way. While there are a few things that are best done with the wide layout you have asked for, the vast majority of data management and analysis commands in Stata work better (or only at all) with the data in the long layout you are starting with. Have you thought through where you are going with this?




                • #9
                  I'm sorry if I sounded irresponsible. What I meant was that if there are multiple observations within a 3-month period, I don't really have a preference among them; picking one at random is fine.
                  And my dates do look like this. In fact, I don't know if it helps, but I also created a new variable, followup_time, which holds the number of days since the start date, because I was hoping to use it to separate the 3-month periods.
                  Also, if it's not too much to ask, can you explain to me exactly what this code does, so that I understand it better and can do it myself next time?

                  Thank you so much for your help!!


                  • #10
                    Code:
                    by id (date), sort: gen follow_num = 3*floor((date-date[1])/(365/4))
                    creates a variable whose value is an integer multiple of 3, representing the follow-up period: 0 is the period from the first date up to 3 months, 3 is from 3 to 6 months, 6 is from 6 to 9 months, etc. It is calculated by taking the number of days between the observation's date and the same id's first date, dividing that by the number of days in 3 months (365/4 = 91.25), rounding the quotient down to an integer with floor(), and multiplying by 3 so that the value is expressed in months.
                    Code:
                    collapse (first) resultvalue, by(id follow_num)
                    aggregates up the data to a single observation in each time period for each id. This one chooses the first it encounters, but because the preceding command sorted the data in chronological order within id, that means the earliest.
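If a randomly chosen result per period is acceptable, as #9 suggests, one possible alternative to -collapse (first)- is sketched below. The sample values, the seed, and the helper variable u are all made up for illustration.

```stata
* Made-up example: id 1 has three results inside its first 3-month period
clear
input byte(id resultvalue) float date
1 4 18457
1 5 18470
1 6 18500
2 3 19063
2 2 19156
end
format %td date
by id (date), sort: gen follow_num = 3*floor((date-date[1])/(365/4))
* Fix the seed so that the random choice can be reproduced later
set seed 2024
gen double u = runiform()
* Within each id and period, keep the observation with the smallest random draw
bysort id follow_num (u): keep if _n == 1
drop u
* Confirm that id and follow_num now uniquely identify the observations
isid id follow_num
```

Unlike -collapse-, this keeps every other variable in the retained observation.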
                    Code:
                    rename resultvalue result__mos
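                    This changes the name of the variable resultvalue to result__mos, which is the stub result_@_mos with the @ removed. That is the name the next command expects to find in the long data, and it lets that command produce the kind of variable names you are looking for.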
                    Code:
                    reshape wide result_@_mos, i(id) j(follow_num)
                    This rearranges the data from long to wide layout, creating the new time-period-specific result value variables, numbered appropriately.
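To see the four commands working together, the sketch below runs them on the seven example observations from #8. Nothing is added beyond that example except the -list- and -describe- calls used for inspection.

```stata
* The example data from #8
clear
input byte(id resultvalue) float date
1 4 18457
1 6 18787
1 6 18879
1 9 19178
2 3 19063
2 2 19156
2 6 19643
end
format %td date
* Assign each result to a 3-month period counted from that id's first date
by id (date), sort: gen follow_num = 3*floor((date-date[1])/(365/4))
list id date follow_num, sepby(id)
* One observation per id and period, then long to wide
collapse (first) resultvalue, by(id follow_num)
rename resultvalue result__mos
reshape wide result_@_mos, i(id) j(follow_num)
describe
list
```

With these data the periods work out to 0, 9, 12 and 21 for id 1 and 0, 3 and 18 for id 2, so the wide data set contains result_0_mos, result_3_mos, result_9_mos, result_12_mos, result_18_mos and result_21_mos, with missing values wherever an id has no result in a period.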

                    I suggest you step back from what you are doing and invest some time in reading the Getting Started [GS] and User's Guide [U] segments of the PDF documentation that comes with your Stata. They will introduce you to the most basic Stata commands that are used in data management and analysis. They are the "bread and butter" commands. You won't remember every detail, but with this exposure under your belt, you will be able to solve most day-to-day data management problems in Stata, perhaps referring to the -help- files or the manual chapters on specific commands to clarify some details.


                    • #11
                      Clyde Schechter
                      Thank you so much for helping out.
                      When I tried to run the code, it showed me this error:

                      "Your data are currently long. You are performing a reshape wide. You specified i(mrn07)
                      and j(follow_num). There are observations within i(mrn07) with the same value of
                      j(follow_num). In the long data, variables i() and j() together must uniquely identify the
                      observations."

                      Can you please tell me the best way to deal with this error?


                      • #12
                        I have to say that it is hard for me to imagine how this could have happened. I wonder if you made a mistake when you did the -collapse- command, since it should leave behind a data set with only one observation per combination of your id variable and the follow_num variable. Did you perhaps get that one wrong and use some variable other than mrn07 as your id variable in that command?

                        If that is not the source of your current difficulty, please use the -dataex- command to provide an example data set that illustrates this problem and I will try to troubleshoot it. (See #2 for information about -dataex-.)
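One way to check whether the id variable and follow_num uniquely identify the observations, before and after -collapse-, is sketched here. The sample data are made up, and id stands in for whatever the real identifier is (mrn07 in the error message).

```stata
* Made-up example: id 1 has two results in its first 3-month period
clear
input byte(id resultvalue) float date
1 4 18457
1 6 18470
1 6 18879
2 3 19063
end
format %td date
by id (date), sort: gen follow_num = 3*floor((date-date[1])/(365/4))
* Before -collapse-: reports one surplus observation for id 1, period 0
duplicates report id follow_num
collapse (first) resultvalue, by(id follow_num)
* After -collapse-: -isid- stops with an error if the pair is still not unique
isid id follow_num
```

If -isid- passes here but -reshape- still complains, the variable named in i() is probably not the one that was used in by().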


                        • #13
                          I will try that again.
                          Is it possible that it didn't work because some patients have two different start dates?
                          If so, how do I get Stata to keep the ones with the latest start dates?


                          • #14
                            Clyde Schechter It worked this time, but it just shows a table of ID and the 3-monthly follow-up results; all the other variables, such as age and gender, are no longer in the table.
                            Why did that happen?


                            • #15
                              The -collapse- command eliminates all variables that are not mentioned in it. As you said nothing about any other variables in your original question, I did not tailor the code to do anything with them.

                              For gender, the simplest solution is to modify the -collapse- command. Since gender doesn't change over time you can just pick the first value; it will be the same as all the others:

                              Code:
                              collapse (first) gender result_value, by(id_variable follow_num)
                              Age is not so simple because it is going to change over time, so it does not lend itself nicely to a one-observation-per-id framework. (This may be another reason why you should reconsider doing this transformation in the first place.) So you need to decide whether you want to pick some particular value of age, e.g. the first, or the average, or the oldest, or.... Alternatively, you could create new variables for age at 0 months, age at 3 months, age at 6 months, etc. by including it in the variable list of the -collapse- command (for example, taking the first age within each period) and then adding it to the -reshape- command alongside result_@_mos. It all depends on how you plan to use the information.
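A minimal sketch of the period-specific age option follows. It assumes variables named gender and age, and takes the first age recorded in each period; both the names and the sample values are made up for illustration.

```stata
* Made-up example with gender (constant within id) and age (changes over time)
clear
input byte(id resultvalue gender age) float date
1 4 1 50 18457
1 6 1 51 18787
1 6 1 51 18879
2 3 0 62 19063
2 2 0 62 19156
end
format %td date
by id (date), sort: gen follow_num = 3*floor((date-date[1])/(365/4))
* Keep the first gender, result and age found in each id and period
collapse (first) gender resultvalue age, by(id follow_num)
rename resultvalue result__mos
rename age age__mos
* gender is not a stub, so -reshape- carries it along as a single constant column
reshape wide result_@_mos age_@_mos, i(id) j(follow_num)
list
```

Swapping (first) for (mean) or (max) in front of age would give the average or the oldest age in each period instead.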
