
  • Calculate Difference between Last and First Observation in Time Series Data

    Hello Statalist members,

    I want to calculate the difference between the last and first observations in my time series data. I am using Stata 16 for Mac. I have 25 years of data with one observation for each year for 2167 individuals, which are nested within 55 groups. I ultimately want to create percent change over time, but first I wanted to create a variable that calculates the 2014 observation minus the 1990 observation.

    I first tried the following code but it was definitely not the correct syntax.

    gen = ((var if year==2014) - (var if year==1990))


    Then I tried to use xtset id year and the time series operators, but I believe this gives only the difference between each observation and the previous one, i.e. 1991-1990, 1992-1991, 1993-1992, and so on.

    gen diff = D.var
    label variable diff "Difference: 2014-1990"

    gen change = D.var / L.var
    label variable change "Change: (2014-1990)/1990"

    Please share how to create a new variable that is the difference between the last and first observations.

    Thank you for your time and insights.
    Be well and safe.

  • #2

    Code:
    bysort id (year) : gen change = var[_N] - var[1]
    will work if the first and last values are both non-missing. Otherwise, segregating the missings first is recommended:

    Code:
    * flag missings so they form their own by-group within each id
    gen missingvar = missing(var)
    * within each id, last minus first non-missing value
    bysort missingvar id (year) : gen change = var[_N] - var[1]
    * missings sort last, so change[1] is the non-missing result (if any); spread it
    bysort id (change) : replace change = change[1]
    Here is a (sketch of a) systematic way to get first and last non-missing values for panels in your set-up. See more at https://www.stata-journal.com/articl...article=dm0055 (Section 9 and also Section 10).

    Code:
    egen when_first = min(cond(!missing(var), year, .)), by(id)
    egen when_last = max(cond(!missing(var), year, .)), by(id)
    egen first = mean(cond(year == when_first, var, .)), by(id)
    egen last = mean(cond(year == when_last, var, .)), by(id)
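
    For completeness (my addition, not part of the original post), the difference and percent change the question asked for then follow directly from these variables:

    Code:
    gen diff = last - first
    gen pct_change = 100 * (last - first) / first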



    • #3
      Dear Nick,
      Thank you for your quick and helpful response. It worked great!
      Just wondering if there is a way to do this using the time series operators? But my guess is that you would have provided that code if it was possible.

      Wishing you and yours healthy and safe,
      Brennan



      • #4
        I can't see how time series operators can possibly help here, but I am open to all smart ideas as always.
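
        One narrow special case is worth noting (a sketch of mine, not from the thread): with strictly annual data and both endpoint years observed, a fixed 24-year lag does reproduce the 2014-1990 difference. It fails as soon as either endpoint is missing, which is why the by-group solutions above are more general.

        Code:
        xtset id year
        * works only if var is non-missing in both 1990 and 2014 for that id
        gen diff2014 = var - L24.var if year == 2014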



        • #5
          Hello,

          I'm wondering how to create a variable that calculates the difference between the last available occurrence and the first available occurrence, while ignoring consumers that have only one available occurrence. At the moment that single occurrence serves as both first and last, so the difference is computed as the value minus itself, giving zero instead of missing.

          This is the code I previously attempted to use:

          generate first_everythingeffort = .

          generate last_everythingeffort = .

          bysort consumerid (interviewdate): replace first_everythingeffort = cond(first_everythingeffort[_n-1]!=.,first_everythingeffort[_n-1],everythingeffort)
          bysort consumerid (interviewdate): replace last_everythingeffort = cond(everythingeffort!=.,everythingeffort,last_everythingeffort[_n-1])
          bysort consumerid: generate diff_everythingeffort = last_everythingeffort[_N]-first_everythingeffort[_N]

          Any help is greatly appreciated.

          Thank you!



          • #6
            Your code is very confusing: you create variables first_everythingeffort and last_everythingeffort as all missing values, and then perform various calculations with those missing values. The end result can be nothing but missing values!

            You do not show example data, and it is difficult to imagine what your data actually look like. I have constructed a toy data set that might resemble yours to illustrate the approach. It has three variables: a consumerid, a sequence number to provide a first-to-last ordering, and a variable called effort, which is just a random number here. The code after the data shows how to do it:
            Code:
            * Example generated by -dataex-. For more info, type help dataex
            clear
            input int consumerid byte seq float effort
             1 1     .
             1 2 22693
             1 3     .
             1 4     .
             1 5 22769
             2 1     .
             2 2 22655
             2 3 22663
             2 4     .
             2 5 22815
             3 1 22697
             3 2     .
             3 3     .
             3 4     .
             3 5     .
             4 1     .
             4 2     .
             4 3 22649
             4 4 22793
             4 5     .
             5 1     .
             5 2 22748
             5 3 22737
             5 4     .
             5 5 22703
             6 1 22729
             6 2 22735
             6 3     .
             6 4     .
             6 5     .
             7 1 22825
             7 2     .
             7 3 22686
             7 4     .
             7 5     .
             8 1     .
             8 2 22698
             8 3 22796
             8 4     .
             8 5 22812
             9 1     .
             9 2     .
             9 3 22664
             9 4     .
             9 5 22661
            10 1 22748
            10 2 22717
            10 3     .
            10 4     .
            10 5 22728
            end
            
            
            gen byte no_effort = missing(effort)
            by consumerid no_effort (seq), sort: ///
                gen diff_effort = effort[_N] - effort[1] ///
                if !no_effort & _N > 1
            by consumerid (diff_effort seq), sort: ///
                replace diff_effort = diff_effort[1]
            sort consumerid seq
            In the future, when asking for help with code always show example data, and please use the -dataex- command to do so, as I have here. If you are running version 17, 16, or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it.

            -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

            When asking for help with code, always show example data. When showing example data, always use -dataex-.




            • #7
              Hi Clyde,

              Thank you so much for your help! I am relatively new to Stata and greatly appreciate the insight. I've used dataex to provide an example of the data I'm working with. The code I used gave the difference between the last available occurrence and the first available occurrence; however, it also calculated a difference for consumers who had only one assessment (assess_N) with a value for overallhealth, subtracting that value from itself and returning 0 instead of missing. Likewise, if a consumer had several assessments but provided a response for only one of them, the code again subtracted that single value from itself, giving 0. I want the code to ignore these two cases and set them to missing.

              Code:
              * Example generated by -dataex-. For more info, type help dataex
              clear
              input str8 consumerid float(assess_N assess_order) int(interviewdate ffy) byte overallhealth float(first_ovhealth last_ovhealth diff_ovhealth change_ovhealth2) byte no_ovhealth float diff_ovhealthT
              "1037932"  1 1 22075 2020 . . .  . . 1  .
              "1090024"  5 1 22032 2020 3 3 3  0 2 0  0
              "1090024"  5 3 22182 2021 3 3 3  0 2 0  0
              "1090024"  5 2 22182 2021 3 3 3  0 2 0  0
              "1090024"  5 4 22347 2021 3 3 3  0 2 0  0
              "1090024"  5 5     . 2022 . 3 3  0 2 1  0
              "1100014"  3 1 22088 2020 . . .  0 2 1  0
              "1100014"  3 2 22249 2021 2 2 2  0 2 0  0
              "1100014"  3 3 22249 2021 2 2 2  0 2 0  0
              "1328106"  2 1 22046 2020 . . .  . . 1  .
              "1328106"  2 2     . 2021 . . .  . . 1  .
              "1452864"  6 1 21865 2020 2 2 2  0 2 0  0
              "1452864"  6 2 22012 2020 2 2 2  0 2 0  0
              "1452864"  6 4 22195 2021 3 2 3  0 2 0  0
              "1452864"  6 3 22195 2021 3 2 3  0 2 0  0
              "1452864"  6 5 22475 2021 2 2 2  0 2 0  0
              "1452864"  6 6 22566 2022 2 2 2  0 2 0  0
              "1539754"  1 1 21900 2020 1 1 1  0 2 0  .
              "1539936"  5 1 22025 2020 4 4 4  0 2 0  0
              "1539936"  5 2 22209 2021 4 4 4  0 2 0  0
              "1539936"  5 3 22209 2021 4 4 4  0 2 0  0
              "1539936"  5 4 22398 2021 2 4 2  0 2 0  0
              "1539936"  5 5 22459 2021 4 4 4  0 2 0  0
              "1542742"  5 1 21952 2020 4 4 4 -2 1 0 -2
              "1542742"  5 2 22176 2020 3 4 3 -2 1 0 -2
              "1542742"  5 3 22358 2021 2 4 2 -2 1 0 -2
              "1542742"  5 4 22595 2021 . 4 2 -2 1 1 -2
              "1542742"  5 5 22624 2022 2 4 2 -2 1 0 -2
              "1554548"  4 1 22090 2020 3 3 3 -1 1 0 -1
              "1554548"  4 3 22279 2021 2 3 2 -1 1 0 -1
              "1554548"  4 2 22279 2021 2 3 2 -1 1 0 -1
              "1554548"  4 4     . 2022 . 3 2 -1 1 1 -1
              "1555114"  3 1 22279 2021 4 4 4 -1 1 0 -1
              "1555114"  3 2 22396 2021 . 4 4 -1 1 1 -1
              "1555114"  3 3 22581 2022 3 4 3 -1 1 0 -1
              "1598490"  1 1     . 2021 . . .  . . 1  .
              "1604060"  2 1 22399 2021 . . .  . . 1  .
              "1604060"  2 2     . 2022 . . .  . . 1  .
              "a281946a" 2 2     . 2022 . . .  . . 1  .
              "a281946a" 2 1     . 2020 . . .  . . 1  .
              "a576034a" 1 1     . 2020 . . .  . . 1  .
              "a576034j" 1 1     . 2021 . . .  . . 1  .
              "a579232l" 1 1 22428 2021 3 3 3  . . 0  .
              "a580028a" 1 1     . 2019 . . .  . . 1  .
              "a890753a" 1 1 22049 2020 5 5 5  . . 0  .
              "a890753p" 3 1 22362 2021 5 5 5  0 2 0  0
              "a890753p" 3 2 22474 2021 5 5 5  0 2 0  0
              "a890753p" 3 3 22594 2022 5 5 5  0 2 0  0
              "b212734r" 4 1 21876 2020 4 4 4  0 2 0  0
              "b212734r" 4 2 22048 2020 4 4 4  0 2 0  0
              "b212734r" 4 4 22286 2021 4 4 4  0 2 0  0
              "b212734r" 4 3 22286 2021 4 4 4  0 2 0  0
              "b267147d" 1 1 21797 2019 . . .  . . 1  .
              "b409846d" 2 1 21944 2020 . . .  . . 1  .
              "b409846d" 2 2     . 2020 . . .  . . 1  .
              "b698286a" 4 1 22042 2020 5 5 5 -1 1 0 -1
              "b698286a" 4 2 22334 2021 4 5 4 -1 1 0 -1
              "b698286a" 4 3 22362 2021 4 5 4 -1 1 0 -1
              "b698286a" 4 4 22530 2022 4 5 4 -1 1 0 -1
              "b843928h" 5 2     . 2021 . . .  . . 1  .
              "b843928h" 5 1     . 2021 . . .  . . 1  .
              "b843928h" 5 3     . 2022 . . .  . . 1  .
              "b843928h" 5 4     . 2020 . . .  . . 1  .
              "b843928h" 5 5     . 2021 . . .  . . 1  .
              "c292209k" 2 1 22322 2021 3 3 3 -1 1 0 -1
              "c292209k" 2 2 22599 2021 2 3 2 -1 1 0 -1
              "c409187g" 1 1 22614 2022 3 3 3  . . 0  .
              "c413499z" 1 1     . 2019 . . .  . . 1  .
              "c418998d" 3 3     . 2022 . . .  . . 1  .
              "c418998d" 3 1     . 2021 . . .  . . 1  .
              "c418998d" 3 2     . 2021 . . .  . . 1  .
              "c428523f" 1 1 21978 2020 . . .  . . 1  .
              "c810877a" 1 1 21963 2020 2 2 2  . . 0  .
              "d155383j" 4 1 22026 2020 . . .  . . 1  .
              "d155383j" 4 2 22194 2020 . . .  . . 1  .
              "d155383j" 4 3 22356 2021 . . .  . . 1  .
              "d155383j" 4 4     . 2019 . . .  . . 1  .
              "d411213c" 4 1 21950 2020 2 2 2  0 2 0  0
              "d411213c" 4 2 22047 2020 5 2 5  0 2 0  0
              "d411213c" 4 3 22559 2021 . 2 5  0 2 1  0
              "d411213c" 4 4 22560 2022 2 2 2  0 2 0  0
              "e187576m" 1 1 22473 2021 4 4 4  . . 0  .
              "e674186s" 3 1 21963 2020 1 1 1  0 2 0  0
              "e674186s" 3 2 22341 2021 1 1 1  0 2 0  0
              "e674186s" 3 3 22525 2021 . 1 1  0 2 1  0
              "e800689y" 1 1     . 2020 . . .  . . 1  .
              "e902343i" 2 1     . 2020 . . .  . . 1  .
              "e902343i" 2 2     . 2021 . . .  . . 1  .
              "f875154g" 1 1     . 2019 . . .  . . 1  .
              "g164414c" 2 1 22351 2021 4 4 4  . . 0  .
              "g164414c" 2 2 22556 2021 . 4 4  . . 1  .
              "g355857i" 5 1 21938 2020 4 4 4 -2 1 0 -2
              "g355857i" 5 2 22180 2020 4 4 4 -2 1 0 -2
              "g355857i" 5 3 22354 2021 2 4 2 -2 1 0 -2
              "g355857i" 5 4 22551 2021 2 4 2 -2 1 0 -2
              "g355857i" 5 5     . 2022 . 4 2 -2 1 1 -2
              "g487986i" 2 1 21826 2020 . . .  . . 1  .
              "g487986i" 2 2 21857 2020 . . .  . . 1  .
              "g764560e" 1 1     . 2019 . . .  . . 1  .
              "g847888b" 1 1     . 2022 . . .  . . 1  .
              end
              format %tdnn/dd/CCYY interviewdate
              label values change_ovhealth2 change_ovhealth2
              label def change_ovhealth2 1 "Improved", modify
              label def change_ovhealth2 2 "Maintained", modify
              label var consumerid "consumerid" 
              label var interviewdate "interviewdate" 
              label var ffy "ffy" 
              label var overallhealth "overallhealth"
              Again, thank you so much!



              • #8
                Thanks for the example data. It looks like you successfully adapted my code to it, and it does what you wanted. Seeing how the data is organized, it's really quite parallel to my demonstration data set, and I can't think of any way to tweak the code I proposed to make it better.



                • #9
                  Great, thank you so much for verifying!

