Creating a lagged variable in spell data

Hend She

25 Oct 2022, 07:49

Dear all,

I am using spell data to capture the event of moving abroad (return).

I provide an example dataset below containing some of the variables of interest, such as spell number, spell begin, and spell end. I merged in life satisfaction from the panel data in wide format as the "ls*" variables, so it now exists for every survey year from 1984 to 2018. It is measured on a scale from 0 to 10, where 0 is the minimum and 10 the maximum current life satisfaction reported in each survey year. The example data include only ls1984 and ls1985, but the full data have one such variable for every year until 2018. actualreturn is a binary indicator for attrition due to moving abroad; spelltyp==2 means that the respondent participated in the survey directly before moving abroad.

Code:

gen attrition = spelltyp if spelltyp[_n-1] == 2 & pid == pid[_n-1]

I'd like to test the effect of current life satisfaction (ls), measured in the year before departure, on the probability of actual return. I assume the beginning of the "moving abroad" spell reflects the year of emigration. Thus, I need to generate a variable for life satisfaction one year before emigration, something like lagged_ls = ls(begin-1), so that I can test the effect of this lagged_ls on actualreturn.

-------

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float(id actualreturn) long spellnr int(begin end) float attrition byte(ls1984 ls1985)
12127 1 3 1988 1988 3  5  0
12128 1 3 1988 1988 3  1  0
12136 1 3 2000 2000 3  8  8
12137 1 3 2000 2000 3  9  8
12138 1 3 1989 1989 3 10  8
12139 1 3 1985 1985 3 10  .
12145 1 3 2008 2008 3  7  8
12146 1 3 2008 2008 3  7  8
12153 1 2 2000 2000 3 10 10
12164 1 3 1994 1994 3  8  7
end
label values attrition spelltyp
label def spelltyp 3 "[3] Living Abroad", modify
label values ls1984 plh0182
label values ls1985 plh0182
label def plh0182 1 "[1] 1 Zufrieden: Skala 0-Niedrig bis 10-Hoch", modify
label def plh0182 5 "[5] 5 Zufrieden: Skala 0-Niedrig bis 10-Hoch", modify
label def plh0182 7 "[7] 7 Zufrieden: Skala 0-Niedrig bis 10-Hoch", modify
label def plh0182 8 "[8] 8 Zufrieden: Skala 0-Niedrig bis 10-Hoch", modify
label def plh0182 9 "[9] 9 Zufrieden: Skala 0-Niedrig bis 10-Hoch", modify
label def plh0182 10 "[10] 10 Zufrieden: Skala 0-Niedrig bis 10-Hoch", modify
label def plh0182 0 "[0] 0 Zufrieden: Skala 0-Niedrig bis 10-Hoch", modify

Clyde Schechter

25 Oct 2022, 10:49

It was a mistake to -merge- in the ls variables in wide layout. You now have a data set that is both long and wide in year, which is about as difficult to work with as you can possibly get. You should have combined all the ls data files into a long data set and then -merge-d that with your spell data. While it is possible to solve your problem with the current data organization, keeping it this way will only lead to more difficulties as you proceed further in working with it. Better to fix it now.
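For illustration only, here is a minimal sketch of what working with the wide layout could look like. It is not code from this thread: the toy data and the variable name lagged_ls are assumptions. It loops over the ls* variables and copies the value whose year equals begin-1.

```stata
* Sketch under assumptions: toy data, hypothetical variable name lagged_ls
clear
input float id int(begin end) byte(ls1984 ls1985)
1 1985 1985 10 .
2 1986 1988  5 0
3 2000 2000  8 8
end

gen byte lagged_ls = .
foreach v of varlist ls* {
    * recover the year from the variable name, e.g. ls1984 -> 1984
    local y = real(substr("`v'", 3, .))
    * copy that year's score into spells that begin one year later
    replace lagged_ls = `v' if begin - 1 == `y'
}
list id begin lagged_ls, noobs
```

This works for a single lookup, but every further step (other lags, graphs by year, models) would need a similar loop, which is one reason the long layout is easier to live with.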

Your example data is also not really suitable for illustrating the solution to your problem because:

1. None of the spells lasts more than one year. It is fairly simple to write code that works for this case, but unless all spells in your full dataset also last one year, that simple code will produce incorrect results. Working out code that handles multi-year spells is a tad harder and requires an example that exhibits that behavior.
2. None of your ids has more than one spell.
3. While I understand not showing all of the numerous ls* variables, the ones you chose to show, 1984 and 1985, will match only one of the spells in the example, because all the other spells begin in 1988 or later.
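Whether a full dataset has multi-year spells or several spells per id can be checked directly. The following is a minimal sketch on toy data; the values and the names n_spells, one_per_id, n_multiyear, and n_multispell_ids are assumptions made for illustration.

```stata
* Sketch under assumptions: toy spell data, hypothetical helper names
clear
input float id int(begin end)
1 1985 1987
1 1988 1992
2 2000 2000
end

* How many spells last more than one calendar year?
count if end > begin
scalar n_multiyear = r(N)

* How many spells does each id have?
bysort id (begin): gen int n_spells = _N

* How many ids have more than one spell?
egen byte one_per_id = tag(id)
count if one_per_id & n_spells > 1
scalar n_multispell_ids = r(N)

display "multi-year spells: " n_multiyear "; ids with several spells: " n_multispell_ids
```

If both counts are zero in the real data, simpler code may be adequate; otherwise the example data should reflect those cases.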

To overcome these limitations, I have extended and modified your example data.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear*
input float(id actualreturn) long spellnr int(begin end) float attrition byte(ls1984 ls1985)
12127 1 3 1986 1988 3  5  0
12128 1 3 1988 1988 3  1  0
12136 1 3 2000 2000 3  8  8
12137 1 3 2000 2000 3  9  8
12138 1 3 1989 1989 3 10  8
12139 1 3 1985 1987 3 10  .
12139 1 3 1988 1992 3 10  .
12145 1 3 2008 2008 3  7  8
12146 1 3 2008 2008 3  7  8
12153 1 2 2000 2000 3 10 10
12164 1 3 1994 1994 3  8  7
end
label values attrition spelltyp
label def spelltyp 3 "[3] Living Abroad", modify
label values ls1984 plh0182
label values ls1985 plh0182
label def plh0182 1 "[1] 1 Zufrieden: Skala 0-Niedrig bis 10-Hoch", modify
label def plh0182 5 "[5] 5 Zufrieden: Skala 0-Niedrig bis 10-Hoch", modify
label def plh0182 7 "[7] 7 Zufrieden: Skala 0-Niedrig bis 10-Hoch", modify
label def plh0182 8 "[8] 8 Zufrieden: Skala 0-Niedrig bis 10-Hoch", modify
label def plh0182 9 "[9] 9 Zufrieden: Skala 0-Niedrig bis 10-Hoch", modify
label def plh0182 10 "[10] 10 Zufrieden: Skala 0-Niedrig bis 10-Hoch", modify
label def plh0182 0 "[0] 0 Zufrieden: Skala 0-Niedrig bis 10-Hoch", modify

//  REMOVE THE ls* VARIABLES AND RE-ORGANIZE THEM INTO A LONG DATA SET
frame put id ls*, into(ls_data)
drop ls*
frame ls_data {
    gen long obs_no = _n
    reshape long ls, i(obs_no) j(year)
    drop obs_no
    duplicates drop
    isid id year, sort
}

gen link_year = begin-1    // year before the spell begins
frlink m:1 id link_year, frame(ls_data id year)
frget ls, from(ls_data)
drop ls_data               // the link variable created by -frlink-
frame drop ls_data         // the frame itself

This code uses -frames-, so it requires version 16 or later.
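For Stata versions before 16, roughly the same linkage can be sketched with -reshape-, a temporary file, and -merge- instead of frames. This is an assumption-laden alternative, not the code posted above: the toy data and the tempfile name ls_long are made up, and it assumes the ls* values are identical across all rows of the same id.

```stata
* Sketch under assumptions: toy data, hypothetical tempfile name ls_long
clear
input float id int(begin end) byte(ls1984 ls1985)
1 1985 1985 10 .
2 1986 1988  5 0
3 2000 2000  8 8
end

* 1. Build a long id-year file of ls from the wide variables
preserve
keep id ls*
duplicates drop
isid id                      // fails if an id has conflicting ls rows
reshape long ls, i(id) j(year)
rename year link_year
tempfile ls_long
save `ls_long'
restore

* 2. Link each spell to the year before it begins
drop ls*
gen int link_year = begin - 1
merge m:1 id link_year using `ls_long', keep(master match) nogenerate
list id begin link_year ls, noobs
```

The keep(master match) option retains every spell, including those whose link_year has no ls record, and discards id-year rows that no spell links to.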

Last edited by Clyde Schechter; 25 Oct 2022, 10:51.

Hend She

28 Jun 2023, 11:05

Dear Clyde, I am very grateful for your code and remarks, and I have worked with them. I have since removed the wide-format step from my code, so the life satisfaction data are now kept in long format and no correction of a wide layout is needed. I would like to make sure that my current approach does this properly. I built on the link_year variable you generated above, but I am still uncertain whether the steps in my code are correct.

And yes, it is true that no spell lasts more than one year: the analysis is not a Cox hazard model but focuses on the effects of return intentions and life satisfaction in the year before respondents move abroad (before actualreturn=1). Could we go through it again assuming no wide data exist?

For example, is the approach below correct, or have I made a mistake? (This is based on my real data, not example data this time.)

Code:

                                            //link_year refers to the year before the start of the emigration spell
gen link_year = begin-1                        //refers to the year before actual emigration happens(i.e., before moving outside Germany), begin here refers to the emigration spell
tab link_year if attrition==3
kdensity link_year if attrition==3
rename syear syr                            //syr is the original variable that refers to the survey year as provided originally in the survey data "Befragungsjahr"
rename link_year syear                        //link year is now called syear which corresponds to the year BEFORE a spell begins (t-1), i.e., before survey respondents leave Germany

kdensity syear if attrition==3              // attrition=[3] refers to  (Living Abroad)

**Merging with long format data, to include the lagged variables of both life satisfaction (ls) & return intention (int)
*The dataset below (ls) contains the following variables, and the (int) dataset has a similar structure

/*---------------------------------------------------------------------------------
Variable      Storage   Display    Value
    name         type    format    label      Variable label
------------------------------------------------------------------------------------
pid             long    %12.0g                Unveraenderliche Personennummer
syear           long    %12.0g                Erhebungsjahr (SurveyYear)                //syear or survey year (unchanged, as offered in the survey data) captures observations from (1984 until 2018)
ls              byte    %47.0g     plh0182    Lebenszufriedenheit gegenwaertig            //current life satisfaction
-----------------------------------------------------------------------------------*/

*Here I retained the relevant variables in two datasets, life satisfaction (ls) & return intentions (int) respectively
drop _merge
merge 1:1 pid syear using  "C:\Documents\G\Return\Working files\ls.dta"
drop _merge
merge 1:1 pid syear using  "C:\Documents\G\Return\Working files\int.dta"
drop _merge


******Graph of the mean of lagged life sat. pre-departure (by year of emigration)*************************************************************

*Graph_4: Kernel density estimate of panel attrition cases due to moving abroad by year of emigration

bysort syear: egen average_ls = mean(ls) if actualreturn==1
twoway (connected average_ls syear)                                            //recall syear here is the lagged year before emigration outside of the country

Last edited by Hend She; 28 Jun 2023, 11:40.

Clyde Schechter

28 Jun 2023, 14:03

The linkage to get the preceding year's life satisfaction score appears to be correct. But as you are not showing the data, I can't vouch for anything other than that.
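Since the real data are not shown, a few self-checks around the merge may help confirm the linkage. The sketch below uses made-up toy files whose variable names (pid, syear, ls, begin) mirror the post; the values and the tempfile name ls_long are assumptions.

```stata
* Sketch under assumptions: toy files standing in for the real ls and spell data
clear
input long pid int syear byte ls
1 1998 7
1 1999 6
2 2005 9
end
isid pid syear               // the long ls file must be unique by person-year
tempfile ls_long
save `ls_long'

clear
input long pid int begin
1 2000
2 2006
3 2010
end
gen int syear = begin - 1    // year before the emigration spell begins
isid pid syear               // merge 1:1 needs unique keys in the master too
merge 1:1 pid syear using `ls_long'
tab _merge

* using-only rows are survey years that are not the year before any spell
drop if _merge == 2
assert syear == begin - 1
list pid begin syear ls _merge, noobs
```

Rows with _merge == 1 are spells with no survey record in the preceding year, so their ls stays missing; inspecting how many there are is a useful sanity check before modeling.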