Strange lag operator behavior in gen vs. replace

Noah Blake Smith

Join Date: Feb 2023

Posts: 9
#1

09 Mar 2023, 13:48

In the example below, the time-series lag operator behaves as I would expect for -gen id2-, but not for -replace id-. Why isn't id equal to id2 here? I am using the latest version of Stata/SE 17.

Code:

clear
input id t
1 1
1 2
1 3
2 1
2 2
2 3
end
tsset id t
gen id2 = L.id
replace id = L.id

Output:
id   t   id2
 .   1     .
 .   2     1
 .   3     1
 .   4     .
 .   5     2
 .   6     2

Last edited by Noah Blake Smith; 09 Mar 2023, 13:50.
Tags: data, lag, panel, panel data, Time Series
Leonardo Guizzetti

Join Date: Jul 2016

Posts: 2400
#2

09 Mar 2023, 14:24

The issue is that -replace- (evidently) modifies the data in place. When the lag operator is used, the very first observation within each -id- group becomes missing because nothing comes before it in terms of time -t-. For the next time period, the previous value is by then already missing, so that observation is replaced with missing too, and so on. But this is exactly how I would expect a lag operator to work. It also doesn't make sense (in real applications) to modify your identifier variable as you are doing here, so I wouldn't expect this to bite in real work.
Clyde Schechter

Join Date: Apr 2014

Posts: 30060
#3

09 Mar 2023, 14:30

When you use -gen x = something- or -replace x = something-, Stata does this observation by observation in the data. That is, the -gen- or -replace- is first carried out in the first observation, then in the second, then in the third, etc. In particular, when it is doing the nth observation, the data in observations 1 through n-1 have already been created/changed.

So in your particular case, when you do -replace id = L.id-, Stata starts in the first observation. L.id[1] is necessarily missing because there is no lag for the first observation. So id[1] is now replaced by a missing value. On to the second observation: id[2] is now to be replaced by id[1]. But id[1] no longer holds its original value: it has already been changed to missing, so id[2] is set to missing as well. And so on through all of the observations. By contrast, with -gen id2 = L.id- you do not run into this problem because the source of your replacement, id, is not being changed by the -gen id2 = ...- command, so the original values are simply shifted down (except for the first in each id group, where the lag is, of course, missing).
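If the goal really is to overwrite a variable with its own lag, one way to sidestep the sequential update is to compute the lag into a separate variable first and copy it back afterwards. The following is only a sketch under assumptions: the variables x and lag_x and their values are made up for illustration and are not part of the original example, and x is used instead of id so that the panel identifier is left alone.

```stata
clear
input id t x
1 1 10
1 2 20
1 3 30
2 1 40
2 2 50
2 3 60
end
tsset id t

* Step 1: take a snapshot of the lag in a new (hypothetical) variable.
* x itself is not modified while this command runs.
generate double lag_x = L.x

* Step 2: overwrite x from the snapshot, so no already-changed
* value of x is ever read during the replace.
replace x = lag_x
drop lag_x

list, sepby(id)
```

With this two-step approach x should end up holding the shifted original values, with a missing value only in the first observation of each id group.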

Added: Crossed with #2.

By the way, this isn't really a difference between -gen- and -replace-. It's a difference between an assignment that draws on the same variable being assigned and one that doesn't. Of course, a -gen- command can't draw on the same variable being assigned because it doesn't exist yet, so we only observe the phenomenon with -replace-, not -gen-. But the real cause of the difference is the "self reference" in the value to be assigned.
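The same self-reference is what makes the common "carry forward" idiom work, which may help show that the behavior is by design rather than a quirk of the lag operator. A minimal sketch, assuming a made-up variable x with gaps and using an explicit subscript instead of a time-series operator:

```stata
clear
input t x
1 5
2 .
3 .
4 8
5 .
end
sort t

* Self-referencing replace: observations are processed in order, so
* x[_n-1] may already have been filled in by the time it is read.
* That is why a value propagates through a run of consecutive gaps.
replace x = x[_n-1] if missing(x)

list
```

Here the gaps at t = 2 and t = 3 both pick up the value from t = 1 precisely because the assignment reads a variable it is in the middle of changing.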

Last edited by Clyde Schechter; 09 Mar 2023, 14:33.
