In a large observational epidemiological follow-up study using record linkage (N = 105 434 persons, of whom 22 374 were treated eating disorder (ED) patients), we are looking into educational attainment, i.e. the highest level of education successfully completed, in the two groups: the patients (N ≈ 22 000) and the matched controls (N ≈ 82 000). Cf. the dataex file attached. Using several sources of information, we have identified and recoded the data points where the ISCED level changes.
The present data file consists of ~1.6 million observations, and we would like to keep only the relevant observations, i.e. by person: the first and last observation, plus all observations where the ISCED level changes.
We are unsure as to how to do this – I might have a ‘short circuit between the headphones’, but normally I am able to think.
Is it ‘keep’, ‘drop’, or ‘cond()’ we should use, or should we use the ‘if’ qualifier during the analyses?
Happy New Year to all Statalisters
Søren Nielsen
Below you will find a sample of a .do file as well as the dataex.dta which might be helpful.
changeISCED.do
//purpose - to register changes in ISCED-level//
//AUTHOR Søren Nielsen//
//date 27-12-2022//
use "C:\Users\Søren\Documents\Projekter\ANDKREG197 0-2010\Delprojekter\Uddannelse\work\dataex.dta"
bysort id (HF10year): generate chISCED = ISCEDlevel - ISCEDlevel[_n-1]
//purpose - to find and locate changes in ISCEDlevel//
recode chISCED (1/7 = 1), gen(chptISCED)
// purpose - to identify the points in time where ISCED changes from one level to another level//
//as only observations where ISCED changes level are of interest, all other observations might be dropped or omitted in the analyses. Observations of interest are the following: the first [t==1] and the last [t=_N] observation of each person, as well as the observations where ISCED level changes [chptISCED==1].//
//question - how is this accomplished in a 'safe' way//
//keep if chptISCED== . |chptISCED==1|t=_N is 'not allowed'//
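For illustration only, here is a minimal and untested sketch of one way the intended selection might be written. It assumes the variable names from the example data (id, HF10year, ISCEDlevel); the flag name keepobs is hypothetical. Two things differ from the attempted line: equality is tested with ==, not =, and _n and _N only refer to a person's own observations when used under the by: prefix. The flag also marks any change in ISCEDlevel, whereas the recode above only marks increases of 1 to 7.

```stata
* Sketch under stated assumptions: id, HF10year and ISCEDlevel as in the dataex example.
* keepobs is a hypothetical helper variable, dropped again at the end.

* Within each person, sorted by year, flag the first row, the last row,
* and every row whose ISCEDlevel differs from the previous row.
bysort id (HF10year): generate byte keepobs = ///
    _n == 1 | _n == _N | ISCEDlevel != ISCEDlevel[_n-1]

* Inspect the flag before dropping anything.
tabulate keepobs

keep if keepobs
drop keepobs
```

Building the flag first and tabulating it before the keep is a cautious choice on a 1.6 million row file; working on a copy of the data (or inside preserve/restore) would be safer still. On the two example persons this should leave 9 observations: t = 1, 4, 7, 10, 18 for id 6 and t = 1, 5, 12, 19 for id 7.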
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input str1 id str2 stratum float(exp sex) str1 ISCED byte ISCEDlevel float(HF10year t HF10age chISCED chptISCED)
"6" "19" 0 2 "1" 1 1997 1 17.67 . .
"6" "19" 0 2 "1" 1 1998 2 18.67 0 0
"6" "19" 0 2 "1" 1 1999 3 19.67 0 0
"6" "19" 0 2 "2" 2 2000 4 20.67 1 1
"6" "19" 0 2 "2" 2 2001 5 21.67 0 0
"6" "19" 0 2 "2" 2 2002 6 22.67 0 0
"6" "19" 0 2 "5" 5 2003 7 23.67 3 1
"6" "19" 0 2 "5" 5 2004 8 24.67 0 0
"6" "19" 0 2 "5" 5 2005 9 25.67 0 0
"6" "19" 0 2 "7" 7 2006 10 26.67 2 1
"6" "19" 0 2 "7" 7 2007 11 27.67 0 0
"6" "19" 0 2 "7" 7 2008 12 28.67 0 0
"6" "19" 0 2 "7" 7 2009 13 29.67 0 0
"6" "19" 0 2 "7" 7 2010 14 30.67 0 0
"6" "19" 0 2 "7" 7 2011 15 31.67 0 0
"6" "19" 0 2 "7" 7 2012 16 32.67 0 0
"6" "19" 0 2 "7" 7 2013 17 33.67 0 0
"6" "19" 0 2 "7" 7 2014 18 34.67 0 0
"7" "99" 1 2 "1" 1 1995 1 16.5 . .
"7" "99" 1 2 "1" 1 1996 2 17.5 0 0
"7" "99" 1 2 "1" 1 1997 3 18.5 0 0
"7" "99" 1 2 "1" 1 1998 4 19.5 0 0
"7" "99" 1 2 "2" 2 1999 5 20.5 1 1
"7" "99" 1 2 "2" 2 2000 6 21.5 0 0
"7" "99" 1 2 "2" 2 2001 7 22.5 0 0
"7" "99" 1 2 "2" 2 2002 8 23.5 0 0
"7" "99" 1 2 "2" 2 2003 9 24.5 0 0
"7" "99" 1 2 "2" 2 2004 10 25.5 0 0
"7" "99" 1 2 "2" 2 2005 11 26.5 0 0
"7" "99" 1 2 "5" 5 2007 12 28.5 3 1
"7" "99" 1 2 "5" 5 2008 13 29.5 0 0
"7" "99" 1 2 "5" 5 2009 14 30.5 0 0
"7" "99" 1 2 "5" 5 2010 15 31.5 0 0
"7" "99" 1 2 "5" 5 2011 16 32.5 0 0
"7" "99" 1 2 "5" 5 2012 17 33.5 0 0
"7" "99" 1 2 "5" 5 2013 18 34.5 0 0
"7" "99" 1 2 "5" 5 2014 19 35.5 0 0
end
Listed 37 out of 37 observations
This looks a bit messier than I expected. The listing should show two example persons. The variables are: "id", the person's id; "stratum", the stratum (quintuplet number); "exp", exposure (0 control, 1 ED patient); "sex" (1 male, 2 female).
Sorry for any inconvenience.
I hope that somebody will be able to advise on how to proceed.
Sincerely Søren Nielsen