
  • Calculate Difference between Last and First Observation in Time Series Data

    Hello Statalist members,

    I want to calculate the difference between the last and first observations in my time series data. I am using Stata 16 for Mac. I have 25 years of data with one observation for each year for 2167 individuals, which are nested within 55 groups. I ultimately want to create percent change over time, but first I wanted to create a variable that calculates the 2014 observation minus the 1990 observation.

    I first tried the following code but it was definitely not the correct syntax.

    gen = ((var if year==2014) - (var if year==1990))


    Then I tried to use xtset id year and the time series operators, but I believe this gives only the difference between each observation and the previous one, i.e. 1991-1990, 1992-1991, 1993-1992, and so on.

    gen diff = D.var
    label variable diff "Difference: 2014-1990"

    gen change = D.var / L.var
    label variable change "Change: (2014-1990)/1990"

    Please share how to create a new variable that is the difference between the last and first observations.

    Thank you for your time and insights.
    Be well and safe.

  • #2

    Code:
    bysort id (year) : gen change = var[_N] - var[1]
    will work if the first and last values are both non-missing. Otherwise, segregating the missings first is recommended:

    Code:
    * flag missings so they form their own by-group within each id
    gen missingvar = missing(var)
    * within each id, last minus first non-missing value
    bysort missingvar id (year) : gen change = var[_N] - var[1]
    * missings sort last, so change[1] is the non-missing result (if any); spread it
    bysort id (change) : replace change = change[1]
    Here is a (sketch of a) systematic way to get first and last non-missing values for panels in your set-up. See more at https://www.stata-journal.com/articl...article=dm0055 (Section 9 and also Section 10).

    Code:
    egen when_first = min(cond(!missing(var), year, .)), by(id)
    egen when_last = max(cond(!missing(var), year, .)), by(id)
    egen first = mean(cond(year == when_first, var, .)), by(id)
    egen last = mean(cond(year == when_last, var, .)), by(id)
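
    For completeness (my addition, not part of the original post), the difference and percent change the question asked for then follow directly from these variables:

    Code:
    gen diff = last - first
    gen pct_change = 100 * (last - first) / first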



    • #3
      Dear Nick,
      Thank you for your quick and helpful response. It worked great!
      Just wondering if there is a way to do this using the time series operators? But my guess is that you would have provided that code if it was possible.

      Wishing you and yours healthy and safe,
      Brennan



      • #4
        I can't see how time series operators can possibly help here, but I am open to all smart ideas as always.
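
        One narrow special case is worth noting (a sketch of mine, not from the thread): with strictly annual data and both endpoint years observed, a fixed 24-year lag does reproduce the 2014-1990 difference. It fails as soon as either endpoint is missing, which is why the by-group solutions above are more general.

        Code:
        xtset id year
        * works only if var is non-missing in both 1990 and 2014 for that id
        gen diff2014 = var - L24.var if year == 2014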



        • #5
          Hello,

          I'm wondering how to create a variable that calculates the difference between the last available occurrence and the first available occurrence, while ignoring consumers that have only one available occurrence. At the moment that single occurrence serves as both first and last, so the difference is computed as the value minus itself, giving zero instead of missing.

          This is the code I previously attempted to use:

          generate first_everythingeffort = .

          generate last_everythingeffort = .

          bysort consumerid (interviewdate): replace first_everythingeffort = cond(first_everythingeffort[_n-1]!=.,first_everythingeffort[_n-1],everythingeffort)
          bysort consumerid (interviewdate): replace last_everythingeffort = cond(everythingeffort!=.,everythingeffort,last_everythingeffort[_n-1])
          bysort consumerid: generate diff_everythingeffort = last_everythingeffort[_N]-first_everythingeffort[_N]

          Any help is greatly appreciated.

          Thank you!



          • #6
            Your code is very confusing: you create variables first_everythingeffort and last_everythingeffort as all missing values, and then perform various calculations with those missing values. The end result can be nothing but missing values!

            You do not show example data, and it is difficult to imagine what your data actually look like. I have constructed a toy data set that might resemble yours to illustrate the approach. It has three variables: a consumerid, a sequence number to provide a first-to-last ordering, and a variable called effort, which is just a random number here. The code after the data shows how to do it:
            Code:
            * Example generated by -dataex-. For more info, type help dataex
            clear
            input int consumerid byte seq float effort
             1 1     .
             1 2 22693
             1 3     .
             1 4     .
             1 5 22769
             2 1     .
             2 2 22655
             2 3 22663
             2 4     .
             2 5 22815
             3 1 22697
             3 2     .
             3 3     .
             3 4     .
             3 5     .
             4 1     .
             4 2     .
             4 3 22649
             4 4 22793
             4 5     .
             5 1     .
             5 2 22748
             5 3 22737
             5 4     .
             5 5 22703
             6 1 22729
             6 2 22735
             6 3     .
             6 4     .
             6 5     .
             7 1 22825
             7 2     .
             7 3 22686
             7 4     .
             7 5     .
             8 1     .
             8 2 22698
             8 3 22796
             8 4     .
             8 5 22812
             9 1     .
             9 2     .
             9 3 22664
             9 4     .
             9 5 22661
            10 1 22748
            10 2 22717
            10 3     .
            10 4     .
            10 5 22728
            end
            
            
            gen byte no_effort = missing(effort)
            by consumerid no_effort (seq), sort: ///
                gen diff_effort = effort[_N] - effort[1] ///
                if !no_effort & _N > 1
            by consumerid (diff_effort seq), sort: ///
                replace diff_effort = diff_effort[1]
            sort consumerid seq
            In the future, when asking for help with code always show example data, and please use the -dataex- command to do so, as I have here. If you are running version 17, 16, or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it.

            -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

            When asking for help with code, always show example data. When showing example data, always use -dataex-.




            • #7
              Hi Clyde,

              Thank you so much for your help! I am relatively new to Stata and greatly appreciate the insight. I've used dataex to provide an example of the data I'm working with. The code I used gave the difference between the last available occurrence and the first available occurrence; however, it also calculated a difference for consumers who had only one assessment (assess_N) with a value for overallhealth, subtracting that value from itself and returning 0 instead of missing. Likewise, if a consumer had several assessments but provided a response for only one of them, the code again subtracted that single value from itself, giving 0. I want the code to ignore these two cases and set them to missing.

              Code:
              * Example generated by -dataex-. For more info, type help dataex
              clear
              input str8 consumerid float(assess_N assess_order) int(interviewdate ffy) byte overallhealth float(first_ovhealth last_ovhealth diff_ovhealth change_ovhealth2) byte no_ovhealth float diff_ovhealthT
              "1037932"  1 1 22075 2020 . . .  . . 1  .
              "1090024"  5 1 22032 2020 3 3 3  0 2 0  0
              "1090024"  5 3 22182 2021 3 3 3  0 2 0  0
              "1090024"  5 2 22182 2021 3 3 3  0 2 0  0
              "1090024"  5 4 22347 2021 3 3 3  0 2 0  0
              "1090024"  5 5     . 2022 . 3 3  0 2 1  0
              "1100014"  3 1 22088 2020 . . .  0 2 1  0
              "1100014"  3 2 22249 2021 2 2 2  0 2 0  0
              "1100014"  3 3 22249 2021 2 2 2  0 2 0  0
              "1328106"  2 1 22046 2020 . . .  . . 1  .
              "1328106"  2 2     . 2021 . . .  . . 1  .
              "1452864"  6 1 21865 2020 2 2 2  0 2 0  0
              "1452864"  6 2 22012 2020 2 2 2  0 2 0  0
              "1452864"  6 4 22195 2021 3 2 3  0 2 0  0
              "1452864"  6 3 22195 2021 3 2 3  0 2 0  0
              "1452864"  6 5 22475 2021 2 2 2  0 2 0  0
              "1452864"  6 6 22566 2022 2 2 2  0 2 0  0
              "1539754"  1 1 21900 2020 1 1 1  0 2 0  .
              "1539936"  5 1 22025 2020 4 4 4  0 2 0  0
              "1539936"  5 2 22209 2021 4 4 4  0 2 0  0
              "1539936"  5 3 22209 2021 4 4 4  0 2 0  0
              "1539936"  5 4 22398 2021 2 4 2  0 2 0  0
              "1539936"  5 5 22459 2021 4 4 4  0 2 0  0
              "1542742"  5 1 21952 2020 4 4 4 -2 1 0 -2
              "1542742"  5 2 22176 2020 3 4 3 -2 1 0 -2
              "1542742"  5 3 22358 2021 2 4 2 -2 1 0 -2
              "1542742"  5 4 22595 2021 . 4 2 -2 1 1 -2
              "1542742"  5 5 22624 2022 2 4 2 -2 1 0 -2
              "1554548"  4 1 22090 2020 3 3 3 -1 1 0 -1
              "1554548"  4 3 22279 2021 2 3 2 -1 1 0 -1
              "1554548"  4 2 22279 2021 2 3 2 -1 1 0 -1
              "1554548"  4 4     . 2022 . 3 2 -1 1 1 -1
              "1555114"  3 1 22279 2021 4 4 4 -1 1 0 -1
              "1555114"  3 2 22396 2021 . 4 4 -1 1 1 -1
              "1555114"  3 3 22581 2022 3 4 3 -1 1 0 -1
              "1598490"  1 1     . 2021 . . .  . . 1  .
              "1604060"  2 1 22399 2021 . . .  . . 1  .
              "1604060"  2 2     . 2022 . . .  . . 1  .
              "a281946a" 2 2     . 2022 . . .  . . 1  .
              "a281946a" 2 1     . 2020 . . .  . . 1  .
              "a576034a" 1 1     . 2020 . . .  . . 1  .
              "a576034j" 1 1     . 2021 . . .  . . 1  .
              "a579232l" 1 1 22428 2021 3 3 3  . . 0  .
              "a580028a" 1 1     . 2019 . . .  . . 1  .
              "a890753a" 1 1 22049 2020 5 5 5  . . 0  .
              "a890753p" 3 1 22362 2021 5 5 5  0 2 0  0
              "a890753p" 3 2 22474 2021 5 5 5  0 2 0  0
              "a890753p" 3 3 22594 2022 5 5 5  0 2 0  0
              "b212734r" 4 1 21876 2020 4 4 4  0 2 0  0
              "b212734r" 4 2 22048 2020 4 4 4  0 2 0  0
              "b212734r" 4 4 22286 2021 4 4 4  0 2 0  0
              "b212734r" 4 3 22286 2021 4 4 4  0 2 0  0
              "b267147d" 1 1 21797 2019 . . .  . . 1  .
              "b409846d" 2 1 21944 2020 . . .  . . 1  .
              "b409846d" 2 2     . 2020 . . .  . . 1  .
              "b698286a" 4 1 22042 2020 5 5 5 -1 1 0 -1
              "b698286a" 4 2 22334 2021 4 5 4 -1 1 0 -1
              "b698286a" 4 3 22362 2021 4 5 4 -1 1 0 -1
              "b698286a" 4 4 22530 2022 4 5 4 -1 1 0 -1
              "b843928h" 5 2     . 2021 . . .  . . 1  .
              "b843928h" 5 1     . 2021 . . .  . . 1  .
              "b843928h" 5 3     . 2022 . . .  . . 1  .
              "b843928h" 5 4     . 2020 . . .  . . 1  .
              "b843928h" 5 5     . 2021 . . .  . . 1  .
              "c292209k" 2 1 22322 2021 3 3 3 -1 1 0 -1
              "c292209k" 2 2 22599 2021 2 3 2 -1 1 0 -1
              "c409187g" 1 1 22614 2022 3 3 3  . . 0  .
              "c413499z" 1 1     . 2019 . . .  . . 1  .
              "c418998d" 3 3     . 2022 . . .  . . 1  .
              "c418998d" 3 1     . 2021 . . .  . . 1  .
              "c418998d" 3 2     . 2021 . . .  . . 1  .
              "c428523f" 1 1 21978 2020 . . .  . . 1  .
              "c810877a" 1 1 21963 2020 2 2 2  . . 0  .
              "d155383j" 4 1 22026 2020 . . .  . . 1  .
              "d155383j" 4 2 22194 2020 . . .  . . 1  .
              "d155383j" 4 3 22356 2021 . . .  . . 1  .
              "d155383j" 4 4     . 2019 . . .  . . 1  .
              "d411213c" 4 1 21950 2020 2 2 2  0 2 0  0
              "d411213c" 4 2 22047 2020 5 2 5  0 2 0  0
              "d411213c" 4 3 22559 2021 . 2 5  0 2 1  0
              "d411213c" 4 4 22560 2022 2 2 2  0 2 0  0
              "e187576m" 1 1 22473 2021 4 4 4  . . 0  .
              "e674186s" 3 1 21963 2020 1 1 1  0 2 0  0
              "e674186s" 3 2 22341 2021 1 1 1  0 2 0  0
              "e674186s" 3 3 22525 2021 . 1 1  0 2 1  0
              "e800689y" 1 1     . 2020 . . .  . . 1  .
              "e902343i" 2 1     . 2020 . . .  . . 1  .
              "e902343i" 2 2     . 2021 . . .  . . 1  .
              "f875154g" 1 1     . 2019 . . .  . . 1  .
              "g164414c" 2 1 22351 2021 4 4 4  . . 0  .
              "g164414c" 2 2 22556 2021 . 4 4  . . 1  .
              "g355857i" 5 1 21938 2020 4 4 4 -2 1 0 -2
              "g355857i" 5 2 22180 2020 4 4 4 -2 1 0 -2
              "g355857i" 5 3 22354 2021 2 4 2 -2 1 0 -2
              "g355857i" 5 4 22551 2021 2 4 2 -2 1 0 -2
              "g355857i" 5 5     . 2022 . 4 2 -2 1 1 -2
              "g487986i" 2 1 21826 2020 . . .  . . 1  .
              "g487986i" 2 2 21857 2020 . . .  . . 1  .
              "g764560e" 1 1     . 2019 . . .  . . 1  .
              "g847888b" 1 1     . 2022 . . .  . . 1  .
              end
              format %tdnn/dd/CCYY interviewdate
              label values change_ovhealth2 change_ovhealth2
              label def change_ovhealth2 1 "Improved", modify
              label def change_ovhealth2 2 "Maintained", modify
              label var consumerid "consumerid" 
              label var interviewdate "interviewdate" 
              label var ffy "ffy" 
              label var overallhealth "overallhealth"
              Again, thank you so much!



              • #8
                Thanks for the example data. It looks like you successfully adapted my code to it, and it does what you wanted. Seeing how the data is organized, it's really quite parallel to my demonstration data set, and I can't think of any way to tweak the code I proposed to make it better.



                • #9
                  Great, thank you so much for verifying!

