Hello great people,
I am investigating teenage pregnancies before and after the pandemic, specifically from 2017 to 2022, as part of my initial analysis. I am using the Kenya 2022 Demographic and Health Survey (DHS), individual woman recode, which is cross-sectional in nature. As DiD can't be conducted on a single cross-section, I am attempting to transform my data into at least repeated cross-sections. My primary unit of analysis or cluster, as I figured from the DHS report, is residence (rural/urban, v025). I have run the commands below to count teenage pregnancies for each year from 2017 to 2022. Since rural areas, from my graphical analyses, seem to be driving the change, I was advised to use residence, i.e. rural vs urban, as the treatment and control groups in the DiD estimations.
Do these commands lead me in the right direction? How do I set up the DiD estimation commands if I take 2020 to be the year of Covid and therefore the intervention year, 2017-2019 to be pre-intervention, and 2021-2022 to be post-intervention? Later, I would also like to assess these at a monthly aggregated level.
PS: I have installed dataex to share an example of the data produced by the commands below, but I'm not sure why I can't post it here successfully. Any suggestions? In the meantime, I have shared a snip of the relevant variables after my reshape.
clear all
cd "/Users/melynoluoch/Documents/MasterThesis/DHS2022/KEIR8BDT"
capture log close
log using MasterThesis, replace
use KEIR8BFL.DTA, clear
*YEARS FOR ANALYSIS: recode to 0 any year outside 2017 to 2022 in the variables relating to year of pregnancy outcome (p2_01 to p2_20); missing values also become 0 because Stata treats missing as larger than any number
foreach var of varlist p2_01-p2_20 {
replace `var' = 0 if `var' < 2017 | `var' > 2022
}
*TEENAGE PREGNANCY VARIABLES: v201 "Total children ever born", v213 "Currently pregnant", v245 "Pregnancy losses"
/*keeping only variables required for analysis of 2017 to 2022 trend analysis of teenage pregnancies, and v010 (respondent's year of birth), v025 (urban vs rural), v101 (region), v190(wealth), v106(highest educ level), v151(sex of hh head), v152(age of household head) */
keep v201 v213 v245 v013 p2_01-p2_20 v010 v025 v101 v190 v106 v151 v152
*TRANSFORMING THE DATA for ease of analysis
*Sort the dataset by the age-group variable
sort v013
*Creating a new identifier variable named "id"
gen id = _n
/*Creating teenager-dummy variables from years 2017-2022. v010 is the year of birth variable. For example, an individual woman is a teenager in 2017 (between age 15-19) if they were born between 1998 and 2002*/
gen Teenage_2017=1 if v010 <=2002 & v010 >=1998
gen Teenage_2018=1 if v010 <=2003 & v010 >=1999
gen Teenage_2019=1 if v010 <=2004 & v010 >=2000
gen Teenage_2020=1 if v010 <=2005 & v010 >=2001
gen Teenage_2021=1 if v010 <=2006 & v010 >=2002
gen Teenage_2022=1 if v010 <=2007 & v010 >=2003
*creating new variables for years 2017 to 2022 with the prefix "year_"
gen year_2017=0
gen year_2018=0
gen year_2019=0
gen year_2020=0
gen year_2021=0
gen year_2022=0
*counting, for each year, the pregnancy outcomes to women who were teenagers in that year, from the "year of pregnancy outcome" variables p2_01 to p2_20
foreach n of varlist p2_01-p2_20 {
replace year_2017= year_2017 + (`n'==2017) if Teenage_2017==1
replace year_2018= year_2018 + (`n'==2018) if Teenage_2018==1
replace year_2019= year_2019 + (`n'==2019) if Teenage_2019==1
replace year_2020= year_2020 + (`n'==2020) if Teenage_2020==1
replace year_2021= year_2021 + (`n'==2021) if Teenage_2021==1
replace year_2022= year_2022 + (`n'==2022) if Teenage_2022==1
}
save "Output/BeforeReshape", replace
* reshape the count variables year_2017 through year_2022 from wide to long format, creating a new variable named outcome_year that holds the year
reshape long year_, i(id) j(outcome_year)
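A minimal sketch of how the estimation might be set up on the reshaped data, not a settled specification. It rests on several assumptions that would need checking: that v025 is coded 1 = urban and 2 = rural, that 2020 is dropped as a transition year, and that a linear probability model is acceptable. The variable names teen, rural, post, and teen_preg are hypothetical and introduced only for illustration. Because the keep command above does not retain the survey weight or sampling-cluster variables, standard errors here are clustered on the woman identifier id, since each woman contributes several years.

```stata
* Sketch only: run after the reshape, on data with one row per woman per outcome_year

* Flag woman-years in which the woman was a teenager (Teenage_YYYY == 1)
gen byte teen = 0
forvalues y = 2017/2022 {
    replace teen = 1 if outcome_year == `y' & Teenage_`y' == 1
}
keep if teen == 1

* Assumption: treat 2020 as the intervention year and leave it out
drop if outcome_year == 2020

* Assumption: v025 is coded 1 = urban, 2 = rural (check with: tab v025)
gen byte rural = (v025 == 2)

* Post-intervention indicator: 2021-2022 versus 2017-2019
gen byte post = (outcome_year >= 2021)

* Hypothetical outcome: any pregnancy outcome recorded in that year
gen byte teen_preg = (year_ > 0)

* The coefficient on 1.rural#1.post is the DiD estimate
regress teen_preg i.rural##i.post, vce(cluster id)
```

Replacing i.post with i.outcome_year (and keeping 2020) would give year-by-year rural/urban gaps, which is one way to look at whether the pre-2020 trends are similar.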