Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • specify moment at which to change value after converting (tri)annual-subject to month-subject observations

    Dear Statalisters,

    My goal is to convert that subject-triannual data set to one with subject-month observations, and specify the month at which one string variable (named "strvar" below) should change value, according to the var called "exact_time".

    I have a data set with four records per subject (subject-year observations, aka multiple-record-per-subject data set), information was recorded every three years for each subject as follows:
    Subject time strvar exact_time
    1 1992 str_z April.1.1992
    1 1995 str_u November.30.1995
    1 1998 str_a January.1.1999
    2 1989 str_b January.15.1989
    2 1992 str_z June.15.1992
    2 1995 str_c August.30.1995
    "strvar" changes its value every three years. The variable "exact_time" records the exact (month.day.year) moment at which each the variable "strvar" changes its value. Once "strvar" varies, it keeps the same value for the following months, until the moment indicated by the next value of "exact_time"

    I want Stata to change the value of "strvar" according to the variable "exact_time". For instance, subject 1 changed value of "strvar" in April 1, 1992, hence, I want Stata to assign the new value of "strvar" in April 1992. The value of "strvar" for subject 1 should remain the same until "exact_time" changes value (November.30.1995), hence, starting in November 1995, subject one should adopt the new value of "strvar". In 1998, "strvar" of subject one changed value once again, this time at the beginning of next year (January.1.1999), hence, "strvar" will adopt a new value starting in January.1999, until subject one's last observation (December 2002). As follows:
    Subject new_time strvar
    1 January.1992 str_x
    1 February.1992 str_x
    1 March.1992 str_x
    1 April.1992 str_z
    1 May.1992 str_z
    1 June.1992 str_z
    1 ... str_z
    1 ... str_z
    1 October.1995 str_z
    1 November.1995 str_u
    1 December.1995 str_u
    1 . str_u
    1 . str_u
    1 November.1998 str_u
    1 December.1998 str_u
    1 January.1999 str_a
    1 February.1999 str_a
    1 ... str_a
    1 ... str_a
    1 December 2002 str_a
    2 January.1989 str_z
    2 February.1989 str_b
    2 March.1989 str_b
    2 ... str_b
    2 December.2002 str_c
    I believe this can be achieved in in two steps, the second of which I need your support with:

    1. Expand each tri-annual observation 36 times, so as to have monthly-subject observations, i.e., generate var "new_time". I guess this can be achieved through:
    Code:
     expandcl 36, generate(new_time) cluster(subject)
    2. Instruct Stata to change the value "strvar" according to the date specified by "exact_time", which I have no idea how to do, and for which I would appreciate your support.

    Thank you in advance!
    Last edited by Victor Cruz; 23 Nov 2015, 09:16.
    Victor Cruz

  • #2
    You'll need to provide a little more information. In your desired result, you show months January, 1992 through March, 1992 associated with strvar == str_x. But str_x appears out of the blue, it appears nowhere in the first table. Please clarify that, and more generally, how to handle the situation where new_time is earlier than the earliest exact_time for that subject.

    Comment


    • #3
      Cross-posted here http://stackoverflow.com/questions/3...ual-subject-to

      Please note our policy on cross-posting, which is that you should tell us about it. http://www.statalist.org/forums/help#crossposting

      Comment


      • #4
        Originally posted by Clyde Schechter View Post
        You'll need to provide a little more information. In your desired result, you show months January, 1992 through March, 1992 associated with strvar == str_x. But str_x appears out of the blue, it appears nowhere in the first table. Please clarify that, and more generally, how to handle the situation where new_time is earlier than the earliest exact_time for that subject.
        thanks for the suggestion, Clyde (if I may).

        The original data set has information on one variable called "strvar" for 32 subjects, between 1986 - 2015 recorded every three years (triannual-subject observations), and my goal is to end up with a data set with month-subject observations for the period January.1990-December.2014. The variable "exact_time" indicates the exact date in which "strvar" changed value, and this date will indicate in the new data set (with month-subject observations) the month in which "strvar"will change value.

        Not all subjects changed value of "strvar" in the same year, but all of them changed every three years; e.g., some subjects changed value in years 1986, 1989, 1992 and so on every three years; others in 1987, 1990, 1993 and so on every three years; and others in 1988, 1991, 1994 and so on every three years until 2015.The first value of "strvar" (for January 1990) will be the last one recorded before 1990.
        For instance, the latest value "strvar" before 1990 for subject one is "str_x" (corresponding to 1989, not included in the table below), hence, this variable in the new data set should have for subject one the value "str_x" from January.1990 until March.1992 and the value "str_z" starting in April.1992, because the value of "strvar" for this subject changed as expressed by "exact_time". The value of "strvar" for subject one will remain constant for the next three years, and will change value (to "str_u") in November.1995, as expressed by the subsequent value of "exact_time" (November.30.1995).
        Subject time strvar exact_time
        1 1992 str_z April.1.1992
        1 1995 str_u November.30.1995
        1 1998 str_a January.1.1999
        2 1989 str_b January.15.1989
        2 1992 str_z June.15.1992
        2 1995 str_c August.30.1995
        for the second step that I propose in my post above, maybe
        Code:
        carryforward
        and its
        Code:
        dynamic_condition()
        option could do the work? if so, I would appreciate suggestions.

        Cross-posted here
        Victor Cruz

        Comment


        • #5
          I am not familiar with the -carryforward- command, so I can't comment on its applicability here.

          In light of your clarifications, I have taken the liberty of adding some made-up observations to your table so that both subjects 1 and 2 have entries in 1986 and 1989. With the data so expanded, the following code works for your example data.

          Code:
          clear*
          
          // PREPARE TWO DATA FILES
          input byte subject int time str5 strvar int exact_time
          1 1986 "str_w"  9648
          1 1989 "str_x" 10651
          1 1992 "str_z" 11779
          1 1995 "str_u" 13117
          1 1998 "str_a" 14245
          2 1986 "str_m"  9802
          2 1989 "str_b" 10607
          2 1992 "str_z" 11854
          2 1995 "str_c" 13025
          end
          format %tdMonth.dd.CCYY exact_time
          list, noobs clean
          tempfile file1
          save `file1'
          
          
          clear
          input byte subject str5 strvar float new_time
          1 "str_x" 384
          1 "str_x" 385
          1 "str_x" 386
          1 "str_z" 387
          1 "str_z" 388
          1 "str_z" 389
          1 "str_z"   .
          1 "str_z"   .
          1 "str_z" 429
          1 "str_u" 430
          1 "str_u" 431
          1 "str_u"   .
          1 "str_u"   .
          1 "str_u" 466
          1 "str_u" 467
          1 "str_a" 468
          1 "str_a" 469
          1 "str_a"   .
          1 "str_a"   .
          1 "str_a" 515
          2 "str_z" 348
          2 "str_b" 349
          2 "str_b" 350
          2 "str_b"   .
          2 "str_c" 515
          end
          format %tmMonth.CCYY new_time
          list, noobs clean
          tempfile file2
          save `file2'
          
          // COMBINE THEM TO PRODUCE THE DESIRED RESULT
          use `file1', clear
          // CRATE VARIABLES exact_month AND next_month TO 
          // DEFINE THE RANGE OF MONTHS [exact_month, next_month) 
          // OVER WHICH strvar APPLIES.
          gen int exact_month = mofd(exact_time)
          keep subject strvar exact_month
          by subject (exact_month), sort: gen int next_month = exact_month[_n+1]
          format %tmMonth.CCYY exact_month next_month 
          rename strvar correct_strvar
          // BRING IN THE MONTHLY DATA FILE
          joinby subject using `file2'
          // AND KEEP ONLY THOSE OBSERVATIONS WHERE
          // new_time IS IN THE RANGE OF APPLICABILITY OF strvar
          keep if inrange(new_time, exact_month, next_month-1)
          // GET RID OF THE ORIGINAL strvar AND USE INSTEAD
          // WITH THE CORRECT ONE
          drop strvar
          rename correct_strvar strvar
          // CLEAN UP AND REORDER THE DATA
          // FOR EASE OF READABILITY
          drop exact_month next_month
          sort subject new_time
          list, noobs clean
          Notes: 1. This solution starts with the table of exact times and the table of monthly data in separate data files (which in the above code I have made temporary files, but you can use permanent files by just removing the `' from the code.
          2. Any viable solution, I think, will require that the dates all be appropriate Stata numeric dates, not human-readable strings (though they can be display-formatted to be human readable as well. If your data does not already have them in that format, use the daily() and monthly() functions to convert them.

          Comment


          • #6
            thank you very much for this, Mr. Schechter.

            I would still appreciate suggestions with regard to how to prepare the 2nd data set that you propose (the one with subject-month observations), given that I only have the first data set (with triannual-subject observations). I should have clarified that the cells containing "..." in table 2 refer not to missing values, but instead represent the subject-months during which "strvar" did not change.
            What I am aiming at is to expand the original triannual-subject data set to a data set with month-subject obs. for all subjects (32 in total) between January.1990 until December.2014. Below a portion of the data set that I would like to obtain for subject 1, from January 1991 until January 1996, during which "strvar" changes value twice (April 1992 and November 1995, as indicated by "exact_time" from table 1).
            subject Year Month strvar

            1 1991 January str_x

            1 1991 February str_x
            1 1991 March str_x
            1 1991 April str_x

            1 1991 May str_x
            1 1991 June str_x
            1 1991 July str_x
            1 1991 August str_x
            1 1991 September str_x
            1 1991 October str_x
            1 1991 November str_x
            1 1991 December str_x
            1 1992 January str_x
            1 1992 February str_x
            1 1992 March str_x
            1 1992 April str_z
            1 1992 May str_z
            1 1992 June str_z
            1 1992 July str_z
            1 1992 August str_z
            1 1992 September str_z
            1 1992 October str_z
            1 1992 November str_z
            1 1992 December str_z
            1 1993 January str_z
            1 1993 February str_z
            1 1993 March str_z
            1 1993 April str_z
            1 1993 May str_z
            1 1993 June str_z
            1 1993 July str_z
            1 1993 August str_z
            1 1993 September str_z
            1 1993 October str_z
            1 1993 November str_z
            1 1993 December str_z
            1 1994 January str_z
            1 1994 February str_z
            1 1994 March str_z
            1 1994 April str_z
            1 1994 May str_z
            1 1994 June str_z
            1 1994 July str_z
            1 1994 August str_z
            1 1994 September str_z
            1 1994 October str_z
            1 1994 November str_z
            1 1994 December str_z
            1 1995 January str_z
            1 1995 February str_z
            1 1995 March str_z
            1 1995 April str_z
            1 1995 May str_z
            1 1995 June str_z
            1 1995 July str_z
            1 1995 August str_z
            1 1995 September str_z
            1 1995 October str_z
            1 1995 November str_u

            1 1995 December str_u
            1 1996 January str_u
            Thank you in advance!
            Victor Cruz

            Comment


            • #7
              So the whole thing beginning to end is:
              Code:
              clear*
              
              // READ IN FILE1 & SAVE IT
              input byte subject int time str5 strvar int exact_time
              1 1986 "str_w"  9648
              1 1989 "str_x" 10651
              1 1992 "str_z" 11779
              1 1995 "str_u" 13117
              1 1998 "str_a" 14245
              2 1986 "str_m"  9802
              2 1989 "str_b" 10607
              2 1992 "str_z" 11854
              2 1995 "str_c" 13025
              end
              format %tdMonth.dd.CCYY exact_time
              list, noobs clean
              tempfile file1
              save `file1'
              
              // CREATE FILE2 WITH MONTHLY OBSERVATIONS 1/1990-12/2014
              sort subject time
              keep if inrange(time, 1990, 2014)
              // CHANGE TIME TO MONTHLY TIMES
              replace time = mofd(mdy(1, 1, time))
              format time %tmMonth.CCYY
              sort subject time
              expand 3 in 1 // CREATE TWO "BOUNDARY OBSERVATIONS"
              replace time = tm(1990m1) in -2 
              replace time = tm(2014m12) in -1 
              duplicates drop subject time, force // ELIMINATE ANY REDUNDANCIES
              tsset subject time
              // NOW FILL IN ALL MISSING MONTHS
              tsfill, full
              rename time new_time
              tempfile file2
              save `file2'
              
              // NOW USE THE SAME CODE AS PREVIOUSLY TO COMBINE THEM
              use `file1', clear
              gen int exact_month = mofd(exact_time)
              keep subject strvar exact_month
              by subject (exact_month), sort: gen int next_month = exact_month[_n+1]
              format %tmMonth.CCYY exact_month next_month 
              rename strvar correct_strvar
              list, noobs clean
              joinby subject using `file2'
              list, noobs clean
              keep if inrange(new_time, exact_month, next_month-1)
              drop strvar
              rename correct_strvar strvar
              drop exact_month next_month exact_time
              sort subject new_time
              list, noobs clean
              Presumably, your file1 already exists as a real file, so you don't have to create that part, and you can replace `file1' by the name of that file throughout.

              By the way, if your real name is Victor Cruz (since the norm in this community is to use your real first and last name as your user name), why do you sign off as Victor D?

              Comment


              • #8
                Thank you very much for this detailed and fast reply Prof. Clyde Schechter! It did resolve my issue!
                Signature updated, too.
                Victor Cruz

                Comment

                Working...
                X