Identifying a change in a specific variable in panel-data

Carsten Waeltner

Join Date: Jan 2022

Posts: 7
#1

01 Jun 2022, 04:55

Dear Statalists,

I'm currently working on a dataset that can be considered a quasi-panel dataset.
I am interested in individuals (identifier: "pid") and two other variables, "vac" and "wave". "wave" records the time of observation; an individual appears in a wave only if he or she chose to take part in that interview.

I would like to identify the individuals who changed their answer to the "vac" variable over time.
For example, individual 1 changed the answer to "vac" in wave 6: in wave 5 the answer was "y", and in wave 6 it was "n". I would like to keep those two observations (waves 5 and 6) in the dataset and remove the earlier ones, from waves 1 to 4.
Individual 2 is a bit more complex, as he changed his answer more than once. In that case I would like to keep only the last change in "vac".
For this individual I would like to keep the observations from wave 5 and wave 6, as he answered "n" in wave 5 and "m" in wave 6.

If there is a way to generate a new variable that lets me keep only those last two observations, the one where "vac" changed and the one immediately before it, I would highly appreciate your help in coding it correctly.

Thank you all very much in advance and I hope my explanation given above can help to find a solution to my problem.
Greetings
Carsten

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input byte pid str1 vac byte wave
 1 "y" 1
 1 "y" 2
 1 "y" 3
 1 "y" 4
 1 "y" 5
 1 "n" 6
 2 "n" 1
 2 "y" 3
 2 "n" 5
 2 "m" 6
 2 "m" 7
 3 "y" 4
 3 "y" 5
 3 "n" 6
 3 "m" 7
 4 "y" 5
 4 "y" 6
 4 "y" 8
 4 "n" 9
 5 "m" 1
 5 "n" 3
 5 "m" 5
 6 "n" 2
 6 "n" 3
 6 "y" 4
 7 "m" 1
 8 "m" 2
 9 "n" 3
10 "y" 4
11 "y" 1
12 "y" 1
13 "y" 1
13 "n" 2
13 "n" 6
13 "y" 7
14 "n" 3
14 "n" 5
14 "y" 6
14 "m" 8
15 "n" 1
16 "m" 2
17 "n" 3
18 "m" 4
19 "n" 6
20 "m" 9
end
Clyde Schechter

Join Date: Apr 2014

Posts: 30097
#2

01 Jun 2022, 11:33

Code:

by pid (wave), sort: gen byte change = vac != vac[_n-1] & _n > 1
by pid (wave): keep if inlist(1, change[_n+1], change)
by pid (wave): keep if _n >= _N-1

Note: You do not state whether a "change" requires that the two waves be consecutive. In the examples you give, they are. There are, however, several instances in your data where the response to vac differs between a wave and the next available wave, but there is a gap between the waves. The above code resolves this unclarity by assuming that you do not care whether the waves in question are consecutive. (I resolved it in this way only because it is simpler to code, not because I have any reason to think that is the better choice.)
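
If consecutive waves were required instead, one possible variant, offered only as a sketch, is to add a condition on wave to the first command. The small dataset below is made up for illustration, and the extra condition wave == wave[_n-1] + 1 is an assumption about what "consecutive" would mean; it is not part of the code above.

```stata
* Hypothetical toy data: pid 1 changes between adjacent waves, pid 5 only across gaps
clear
input byte pid str1 vac byte wave
1 "y" 4
1 "y" 5
1 "n" 6
5 "m" 1
5 "n" 3
5 "m" 5
end

* Flag a change only when the previous observation is exactly one wave earlier
by pid (wave), sort: gen byte change = vac != vac[_n-1] & wave == wave[_n-1] + 1 & _n > 1

* Keep each flagged observation together with the one just before it
by pid (wave): keep if inlist(1, change[_n+1], change)

* Of those, keep only the last pair within each pid
by pid (wave): keep if _n >= _N-1

list, sepby(pid)
```

In this toy data, pid 1 keeps waves 5 and 6, while pid 5 drops out entirely because none of its answers changed between adjacent waves.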
Carsten Waeltner

Join Date: Jan 2022

Posts: 7
#3

01 Jun 2022, 13:05

Dear Clyde,

Thank you so much for your help.
Indeed, I did not give any details about whether the waves must be consecutive. That is because I do not care about the waves being consecutive, as you correctly assumed.
Nevertheless, I should have stated this, since you indicated that different code would be needed otherwise.

Again, thank you for your fast help and the code you provided.
I tested the code a few minutes ago and it worked perfectly.

Greetings
Carsten Waeltner
Carsten Waeltner

Join Date: Jan 2022

Posts: 7
#4

16 Jun 2022, 05:44

Hello everyone,

Unfortunately, I ran into another problem with the dataset.
I would like to create a variable that takes the value 0 for the first observation by pid IF vacc==2 in the first observation AND vacc==3 in the second observation.
The code I am using to achieve this is: by pid: gen byte ny = 0 if vacc[_n]==2 & vacc[_n+1]==3
This is where my problem starts. I searched the internet for a solution but did not find one, which is why I am asking here again for help.

The problem is the following: I need to assign a 1 to the second observation IF vacc==2 in the first observation AND vacc==3 in the second observation.
I do not see how to assign this value of 1 to the second observation. The code above only generates values for the first observation of the two.

I managed to create a workaround by simply reversing the order of the wave variable.
Then I can use the same code that assigned the 0s to the first observation, but this time it assigns a 1 to what was originally the last observation.
The code works as follows:

First:
gen waverev=.
replace waverev=1 if wave==10
replace waverev=2 if wave==9
replace waverev=3 if wave==8
replace waverev=4 if wave==7
replace waverev=5 if wave==6
replace waverev=6 if wave==5
replace waverev=7 if wave==4
replace waverev=8 if wave==3
replace waverev=9 if wave==2
replace waverev=10 if wave==1

Second:
sort pid waverev

Third:
by pid: replace ny =1 if vacc[_n]==2 & vacc[_n+1]==3

This "solution" gets the job done, but I was wondering whether there is a more elegant version. I also tried to find an expression to address the second observation in my panel, but the only one I found was [_n==2], and that would give a 1 to EVERY second observation with vacc==3, which is exactly what I do not want.
The 1 should appear in the second observation only if vacc==2 is true for the first observation AND vacc==3 is true for the second observation.

Thanks in advance for your help.

Greetings
Carsten

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input byte pid str1 vac byte(wave change) long vacc byte ny
 1 "y" 5 0 3 .
 1 "n" 6 1 2 .
 2 "n" 5 1 2 .
 2 "m" 6 1 1 .
 3 "n" 6 1 2 .
 3 "m" 7 1 1 .
 4 "y" 8 0 3 .
 4 "n" 9 1 2 .
 5 "n" 3 1 2 .
 5 "m" 5 1 1 .
 6 "n" 3 0 2 0
 6 "y" 4 1 3 .
13 "n" 6 0 2 0
13 "y" 7 1 3 .
14 "y" 6 1 3 .
14 "m" 8 1 1 .
end
label values vacc vacc
label def vacc 1 "m", modify
label def vacc 2 "n", modify
label def vacc 3 "y", modify
Clyde Schechter

Join Date: Apr 2014

Posts: 30097
#5

16 Jun 2022, 12:36

I don't really follow what you are trying to do here. First, in the example data you show here (unlike the example data you showed earlier in the thread) each pid has exactly two observations. Is that true throughout the data you are currently working on? Also, the data appear to be sorted by wave within pid--is that also true throughout, or just in the example?

I understand "I would like to create a variable which takes the value 0 for the first observation by pid IF in the first observation vacc==2 AND in the second observation vacc==3." But what do you want the value of this variable to be in the first observation if that condition is not met?

And for the second observation, it seems you want the new variable to be 1 under precisely the same condition as when the first observation gets ny = 0. Do I have that right? And if so, what do you want the new variable to be in the second observation if ny != 0?

Finally, I will just mention that all of that code to create the variable waverev can be simplified to a single line:

Code:

gen waverev = 11 - wave

That said, I doubt that a good solution to your problem will actually need that variable anyway.
Carsten Waeltner

Join Date: Jan 2022

Posts: 7
#6

17 Jun 2022, 10:52

Dear Clyde,

I came up with a solution to my problem.
The code is: bysort pid (wave): replace ny = 1 if vacc[_n-1]==2 & vacc[_n]==3

I did not post the original data. The dataset posted earlier this week had already been cleaned up with the help of your code from earlier in the thread.
It has two observations per person and will now be used for conditional logistic regression.

Thank you for your help and greetings
Carsten
Clyde Schechter

Join Date: Apr 2014

Posts: 30097
#7

17 Jun 2022, 11:41

I'm glad you solved your problem. Just one concern: it is usually not a good idea in Stata to mark a dichotomous variable with the values 1 and missing. It is usually better to code it as 1 = yes and 0 = no. So consider changing your code to:

Code:

bysort pid (wave) : replace ny = (vacc[_n-1]==2 & vacc[_n]==3)

I note, also, that this code does not appear to do what you asked for in #4, but apparently it does what you really want, so good!
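
To see what the 0/1 version produces, here is a minimal sketch on a made-up extract with two observations per pid. It assumes the numeric coding from #4 (1 = m, 2 = n, 3 = y) and uses gen rather than replace because ny does not yet exist in this toy data.

```stata
* Hypothetical extract: two observations per pid, vacc coded 1 = m, 2 = n, 3 = y
clear
input byte pid byte wave byte vacc
 1 5 3
 1 6 2
 6 3 2
 6 4 3
13 6 2
13 7 3
end

* ny = 1 when this observation is "y" and the previous one within pid was "n";
* ny = 0 otherwise (in the first observation vacc[_n-1] is missing, so the test is false)
bysort pid (wave): gen byte ny = (vacc[_n-1] == 2 & vacc[_n] == 3)

list, sepby(pid)
```

Here ny should be 1 only in the second observation of pid 6 and pid 13, the two n-to-y pairs, and 0 everywhere else, including both observations of pid 1.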