Difference-in-differences (Teenage pregnancies before and after the onset of Covid-19 in Kenya)

Melyn Oluoch

Join Date: Aug 2023

Posts: 19
#1

23 Mar 2024, 11:10

Hello great people,

I am investigating teenage pregnancies before and after the pandemic, specifically from 2017 to 2022, as part of my initial analysis. I am using the Kenya 2022 Demographic and Health Survey (DHS), individual woman recode, which is cross-sectional. Since DiD can't be estimated from a single cross-section, I am attempting to transform my data into something like repeated cross-sections. My primary unit of analysis, or cluster, as I gathered from the DHS report, is residence (rural/urban, v025). I have run the commands below to count teenage pregnancies for each year from 2017 to 2022. Since rural areas, from my graphical analyses, seem to be driving the change, I was advised to use residence, i.e. rural/urban, as the treatment vs control groups in the DiD estimations. Do these commands lead me in the right direction? How do I set up the DiD estimation commands if I take 2020 to be the year of Covid and therefore the intervention year, 2017-2019 to be pre-intervention and 2021-2022 to be post-intervention? Later, I would also like to assess these at a monthly aggregated level.

PS: I have successfully installed dataex to help in sharing the output from the below commands but I'm not sure why I can't post the dataset here successfully. Any suggestions? I have anyway shared a snip of the relevant variables after my reshape.

clear all

cd "/Users/melynoluoch/Documents/MasterThesis/DHS2022/KEIR8BDT"

capture log close
log using MasterThesis, replace

use KEIR8BFL.DTA, clear

*YEARS FOR ANALYSIS: keep only years 2017 to 2022 for the variables relating to year of pregnancy outcome (p2_01 to p2_20)
*Note: missing values count as larger than 2022 in Stata, so they are also set to 0 here
foreach var of varlist p2_01-p2_20 {
    replace `var' = 0 if `var' < 2017 | `var' > 2022
}

*TEENAGE PREGNANCY VARIABLES: v201 "Total children ever born", v213 "Currently pregnant", v245 "Pregnancy losses"
/*Keeping only the variables required for the 2017 to 2022 trend analysis of teenage pregnancies, plus v010 (respondent's year of birth), v013 (age group), v025 (urban vs rural), v101 (region), v190 (wealth), v106 (highest educ level), v151 (sex of hh head), v152 (age of household head).
p2_01-p2_20 are all kept because the counting loop further down uses the full range. */

keep v201 v213 v245 v013 p2_01-p2_20 v010 v025 v101 v190 v106 v151 v152

*TRANSFORMING THE DATA for ease of analysis
*Sort the dataset by the age-group variable
sort v013

*Creating a new identifier variable named "id"
gen id = _n

/*Creating teenager-dummy variables from years 2017-2022. v010 is the year of birth variable. For example, an individual woman is a teenager in 2017 (between age 15-19) if they were born between 1998 and 2002*/

gen Teenage_2017=1 if v010 <=2002 & v010 >=1998
gen Teenage_2018=1 if v010 <=2003 & v010 >=1999
gen Teenage_2019=1 if v010 <=2004 & v010 >=2000
gen Teenage_2020=1 if v010 <=2005 & v010 >=2001
gen Teenage_2021=1 if v010 <=2006 & v010 >=2002
gen Teenage_2022=1 if v010 <=2007 & v010 >=2003

*creating new variables for years 2017 to 2022 with the prefix "year_"
gen year_2017=0
gen year_2018=0
gen year_2019=0
gen year_2020=0
gen year_2021=0
gen year_2022=0

*Counting, for each year, the pregnancy outcomes of women who were teenagers in that year, using the "year of pregnancy outcome" variables p2_01 to p2_20
foreach n of varlist p2_01-p2_20 {
    replace year_2017 = year_2017 + (`n'==2017) if Teenage_2017==1
    replace year_2018 = year_2018 + (`n'==2018) if Teenage_2018==1
    replace year_2019 = year_2019 + (`n'==2019) if Teenage_2019==1
    replace year_2020 = year_2020 + (`n'==2020) if Teenage_2020==1
    replace year_2021 = year_2021 + (`n'==2021) if Teenage_2021==1
    replace year_2022 = year_2022 + (`n'==2022) if Teenage_2022==1
}

save "Output/BeforeReshape", replace

* Reshape the count variables year_2017 through year_2022 from wide to long format, creating a new variable named outcome_year
reshape long year_, i(id) j(outcome_year)
George Ford

Join Date: Aug 2014

Posts: 3182
#2

23 Mar 2024, 12:10

You have no control group. Covid is a treatment the whole world received. You can test the difference between urban and rural, and it will look like a DiD model, but it's not a 2x2 DiD model since you lack a non-Covid group.
Melyn Oluoch

Join Date: Aug 2023

Posts: 19
#3

23 Mar 2024, 16:40

Yes, I have no valid control group if we view Covid as the intervention. In my case, I was advised to dig deeper into the literature to assess whether any policy or intervention (due to Covid) kept teenage pregnancy numbers elevated in rural areas, especially in 2021, and if so, to estimate a DiD model. Such a policy would have to have occurred in rural but not in urban areas for DiD to be a suitable estimation method.
And from your assessment, what kind of DiD model do you mean?
Andrew Musau

Join Date: Oct 2014

Posts: 10262
#4

24 Mar 2024, 08:04

You may want to consider Covid interventions rather than Covid itself as the treatment. If there were lockdowns or curfews implemented in urban areas, with potentially stricter enforcement compared to rural areas, restricting freedom of movement, and if there were similar trends in teenage pregnancies prior to the Covid interventions, then you can specify a DID model. I recall a similar scenario where lockdowns were concentrated in high population density urban areas; see https://www.reuters.com/world/middle...ve-2021-03-26/.
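One rough way to look at whether urban and rural trends were similar before 2020 is sketched below. It assumes the long data produced in post #1 (id, outcome_year, v025, year_); the indicator name rural, the choice of 2019 as the reference year, and clustering on id are assumptions made for illustration, not something taken from the thread.

```stata
* Sketch only: assumes the long data from post #1 (id, outcome_year, v025, year_)
gen byte rural = (v025 == 2)     // hypothetical indicator; v025: 1 = urban, 2 = rural

* Year-by-residence interactions, with 2019 as the omitted reference year
regress year_ i.rural##ib2019.outcome_year, vce(cluster id)

* Joint test that the 2017 and 2018 rural-urban gaps equal the 2019 gap
test 1.rural#2017.outcome_year 1.rural#2018.outcome_year
```

Because year_ is 0 for women who were not teenagers in a given year, it may be preferable to restrict each year to the women who were aged 15-19 in that year before estimating anything.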
George Ford

Join Date: Aug 2014

Posts: 3182
#5

24 Mar 2024, 08:14

You have data on teen pregnancy in urban/rural both before and after Covid. Thus, you have something that looks like a diff-in-diff: (Yu1 - Yu0) - (Yr1 - Yr0). But this variant just tells you how urban and rural differed after Covid, not what the effect of Covid was.

In causal DID, the treated and the control have specific roles -- one is treated and one is control and you want to know the difference between them. But urban and rural have no specific role to play in the analysis--you can calculate a difference, but it doesn't tell you about a treatment effect.
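For reference, the descriptive contrast described above can be computed with an interaction regression. This is a minimal sketch, assuming the long data from post #1 (id, outcome_year, v025, year_), with 2017-2019 as "before", 2021-2022 as "after", and 2020 left out; the variable name post and clustering on id are assumptions. As the post explains, the interaction is only a difference in differences between urban and rural, not an estimate of the effect of Covid.

```stata
* Sketch only: assumes the long data from post #1 (id, outcome_year, v025, year_)
* post is a hypothetical name: 0 for 2017-2019, 1 for 2021-2022, missing for 2020
gen byte post = (outcome_year >= 2021) if outcome_year != 2020

* With v025 coded 1 = urban, 2 = rural, the coefficient on 2.v025#1.post is
* (Yr1 - Yr0) - (Yu1 - Yu0), i.e. the contrast above with the sign flipped
regress year_ i.v025##i.post, vce(cluster id)
```

Observations from 2020 drop out automatically because post is missing for them.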
Melyn Oluoch

Join Date: Aug 2023

Posts: 19
#6

26 Mar 2024, 08:12

Thanks Andrew Musau. I considered assessing lockdowns and am still considering this, given the information in the link you've shared. However, I'm still wondering whether you could check through my commands to see if I'm structuring my data in the right way, with residence (urban/rural) as the unit of analysis.
George Ford thanks as well. Could you also share feedback on my transformation of the data?
George Ford

Join Date: Aug 2014

Posts: 3182
#7

26 Mar 2024, 09:11

Use dataex to post some data so we can see what you have: enough to actually understand it and maybe estimate a simple model.

Melyn Oluoch

Join Date: Aug 2023
Posts: 19
#8

26 Mar 2024, 16:10

These are part of my variables after my reshape:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float id int(outcome_year v010) byte(v013 v025 v101) float year_
 1 2017 2003 1 2 20 0
 1 2018 2003 1 2 20 0
 1 2019 2003 1 2 20 0
 1 2020 2003 1 2 20 0
 1 2021 2003 1 2 20 0
 1 2022 2003 1 2 20 0
 2 2017 2006 1 2 36 0
 2 2018 2006 1 2 36 0
 2 2019 2006 1 2 36 0
 2 2020 2006 1 2 36 0
 2 2021 2006 1 2 36 0
 2 2022 2006 1 2 36 0
 3 2017 2004 1 2 33 0
 3 2018 2004 1 2 33 0
 3 2019 2004 1 2 33 0
 3 2020 2004 1 2 33 0
 3 2021 2004 1 2 33 0
 3 2022 2004 1 2 33 0
 4 2017 2003 1 2  4 0
 4 2018 2003 1 2  4 0
 4 2019 2003 1 2  4 1
 4 2020 2003 1 2  4 0
 4 2021 2003 1 2  4 1
 4 2022 2003 1 2  4 0
 5 2017 2005 1 2 25 0
 5 2018 2005 1 2 25 0
 5 2019 2005 1 2 25 0
 5 2020 2005 1 2 25 0
 5 2021 2005 1 2 25 0
 5 2022 2005 1 2 25 0
 6 2017 2002 1 2 37 0
 6 2018 2002 1 2 37 0
 6 2019 2002 1 2 37 0
 6 2020 2002 1 2 37 1
 6 2021 2002 1 2 37 0
 6 2022 2002 1 2 37 0
 7 2017 2006 1 2 42 0
 7 2018 2006 1 2 42 0
 7 2019 2006 1 2 42 0
 7 2020 2006 1 2 42 0
 7 2021 2006 1 2 42 0
 7 2022 2006 1 2 42 0
 8 2017 2003 1 1  5 0
 8 2018 2003 1 1  5 0
 8 2019 2003 1 1  5 0
 8 2020 2003 1 1  5 0
 8 2021 2003 1 1  5 0
 8 2022 2003 1 1  5 0
 9 2017 2002 1 1  1 0
 9 2018 2002 1 1  1 0
 9 2019 2002 1 1  1 0
 9 2020 2002 1 1  1 0
 9 2021 2002 1 1  1 0
 9 2022 2002 1 1  1 0
10 2017 2007 1 1 43 0
10 2018 2007 1 1 43 0
10 2019 2007 1 1 43 0
10 2020 2007 1 1 43 0
10 2021 2007 1 1 43 0
10 2022 2007 1 1 43 0
11 2017 2005 1 1 24 0
11 2018 2005 1 1 24 0
11 2019 2005 1 1 24 0
11 2020 2005 1 1 24 0
11 2021 2005 1 1 24 0
11 2022 2005 1 1 24 0
12 2017 2006 1 1 39 0
12 2018 2006 1 1 39 0
12 2019 2006 1 1 39 0
12 2020 2006 1 1 39 0
12 2021 2006 1 1 39 0
12 2022 2006 1 1 39 0
13 2017 2002 1 2 33 0
13 2018 2002 1 2 33 0
13 2019 2002 1 2 33 0
13 2020 2002 1 2 33 0
13 2021 2002 1 2 33 1
13 2022 2002 1 2 33 0
14 2017 2006 1 2 20 0
14 2018 2006 1 2 20 0
14 2019 2006 1 2 20 0
14 2020 2006 1 2 20 0
14 2021 2006 1 2 20 0
14 2022 2006 1 2 20 0
15 2017 2004 1 2 40 0
15 2018 2004 1 2 40 0
15 2019 2004 1 2 40 0
15 2020 2004 1 2 40 0
15 2021 2004 1 2 40 1
15 2022 2004 1 2 40 0
16 2017 2007 1 2 18 0
16 2018 2007 1 2 18 0
16 2019 2007 1 2 18 0
16 2020 2007 1 2 18 0
16 2021 2007 1 2 18 0
16 2022 2007 1 2 18 0
17 2017 2003 1 2 39 0
17 2018 2003 1 2 39 0
17 2019 2003 1 2 39 0
17 2020 2003 1 2 39 0
end
label values v013 V013
label def V013 1 "15-19", modify
label values v025 V025
label def V025 1 "urban", modify
label def V025 2 "rural", modify
label values v101 V101
label def V101 1 "mombasa", modify
label def V101 4 "tana river", modify
label def V101 5 "lamu", modify
label def V101 18 "nyandarua", modify
label def V101 20 "kirinyaga", modify
label def V101 24 "west pokot", modify
label def V101 25 "samburu", modify
label def V101 33 "narok", modify
label def V101 36 "bomet", modify
label def V101 37 "kakamega", modify
label def V101 39 "bungoma", modify
label def V101 40 "busia", modify
label def V101 42 "kisumu", modify
label def V101 43 "homa bay", modify


George Ford

Join Date: Aug 2014
Posts: 3182
#9

26 Mar 2024, 18:20

Make sure this is what you want.

Code:

foreach n of varlist p2_01-p2_20 {
replace year_2017= year_2017 + (`n'==2017) if Teenage_2017==1
replace year_2018= year_2018 + (`n'==2018) if Teenage_2018==1
replace year_2019= year_2019 + (`n'==2019) if Teenage_2019==1
replace year_2020= year_2020 + (`n'==2020) if Teenage_2020==1
replace year_2021= year_2021 + (`n'==2021) if Teenage_2021==1
replace year_2022= year_2022 + (`n'==2022) if Teenage_2022==1
}

I think all the year_* variables are based only on p2_20, but I can't really tell from the data you posted.


Melyn Oluoch

Join Date: Aug 2023

Posts: 19
#10

27 Mar 2024, 01:30

George Ford Yes, the year_* variables are all based on the p2_* variables, which record the year of pregnancy outcome. I did not include them in the dataex since they are quite bulky. Also, I'm using the IR KDHS 2022.

George Ford

Join Date: Aug 2014
Posts: 3182

#11

27 Mar 2024, 08:33

When you see your code repeating itself, it's usually worthwhile to set up a loop.

Code:

forv yr = 2017/2022 {
    gen Teenage_`yr'=1 if v010 <=`yr'-15 & v010 >=`yr'-19
}

forv yr = 2017/2022 {
    gen year_`yr'=0
}

foreach n of varlist p2_01-p2_20 {
    forv yr = 2017/2022 {
        replace year_`yr'= year_`yr' + (`n'==`yr') if Teenage_`yr'==1
    }
}


Melyn Oluoch

Join Date: Aug 2023

Posts: 19
#12

02 Apr 2024, 15:21

Thanks George Ford for the valuable input. With the following command, I am trying to produce a twoway scatter graph with a connecting line for teenage pregnancies against the years 2017 to 2022. Additionally, I intend to have a vertical line at year 2020 to mark the year of Covid. The commands below produce neither a graph nor an error. What could be wrong?

Code:

twoway scatter TeenagePregancies outcome_year, ///
    connect(line) msym(O) msize(small) lcolor(gray) mcolor(gray%50) ///
    ytitle("Number of teenage pregnancies") xtitle("Year of pregnancy outcome") title("Teenage pregnancies by year") legend(off) graphregion(color(white)) ///
    xline (2020, lw(thin) lp(dash)) ///
Andrew Musau

Join Date: Oct 2014

Posts: 10262
#13

02 Apr 2024, 15:41

If you are doing this for the whole country, you need to aggregate the values. Here is one way that sums up the values.

Code:

preserve
collapse (sum) TeenagePregnancies, by(outcome_year)
twoway scatter TeenagePregnancies outcome_year, ///
    connect(line) msym(Oh) msize(small) lcolor(gray) mcolor(gray%50) ///
    ytitle("") xtitle("") ///
    title("Teenage pregnancies by year") legend(off) graphregion(color(white)) ///
    xline(2020, lw(thin) lp(dash))
restore

Pedantry note: It is obvious that you have year of pregnancy on the x-axis, so I would exclude the xtitle. The title and ytitle convey the same information. Pick one. Also note that my variable "TeenagePregnancies" is spelled differently from yours.

Last edited by Andrew Musau; 02 Apr 2024, 15:45.

Melyn Oluoch

Join Date: Aug 2023
Posts: 19

#14

02 Apr 2024, 15:50

Hey great people!

Kindly assist me with the problem below. I am stuck on the month-by-month analysis of teenage births pre- and post-COVID-19.

Given b1_01 to b1_20 as month of birth, b2_01 to b2_20 as year of birth, and v010 as the respondent's year of birth, I intend to create a month-by-month count of teenage births so that I can draw a twoway scatter graph and, later on, estimate an event study model similar to the ones attached. In the twoway scatter, for example, I would like to show the count of teenage births on the y-axis against the months from Jan 2017 to July 2022, labelling only January and July of each year, so that my 1st labelled point is Jan 2017, the 2nd is July 2017, the 3rd is Jan 2018, etc. The other months should still appear as the connecting dots.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input byte(b1_01 b1_02 b1_03 b1_04 b1_05) int(b2_01 b2_02 b2_03 b2_04 b2_05 v010)
 8 12 10  6  . 2021 2013 2010 2009    . 1987
12  .  .  .  . 2005    .    .    .    . 1983
10  .  .  .  . 2010    .    .    .    . 1989
 4 10 11  5  9 2019 2016 2009 2007 2002 1982
 8  2  .  .  . 2016 2012    .    .    . 1992
 6  6 11  .  . 2017 2015 2012    .    . 1993
 7  7  7  .  . 2013 2008 2008    .    . 1985
 7  5  .  .  . 2012 2009    .    .    . 1990
 7  .  .  .  . 2018    .    .    .    . 1992
 5 11  .  .  . 2006 1999    .    .    . 1982
12 12  .  .  . 2021 2019    .    .    . 1987
 8  .  .  .  . 2019    .    .    .    . 1997
 8  2  8  .  . 2014 2008 2006    .    . 1983
 .  .  .  .  .    .    .    .    .    . 2006
 3 11 11  .  . 2014 2010 1998    .    . 1981
 4  .  .  .  . 2019    .    .    .    . 1993
 6 11  .  .  . 2014 2007    .    .    . 1983
10  8  3  .  . 2015 2014 2013    .    . 1989
11  8  8  .  . 2013 2003 1996    .    . 1978
 5  .  .  .  . 2012    .    .    .    . 1990
 7  9  .  .  . 2013 2006    .    .    . 1984
 4  2  5  2  . 2020 2015 2013 2012    . 1995
10 11 12  4  . 2017 2014 2013 2011    . 1984
 .  .  .  .  .    .    .    .    .    . 1997
 5  .  .  .  . 2016    .    .    .    . 1995
 .  .  .  .  .    .    .    .    .    . 1983
 .  .  .  .  .    .    .    .    .    . 1997
 6  .  .  .  . 2018    .    .    .    . 1989
 6 12  .  .  . 2019 2014    .    .    . 1995
11  .  .  .  . 2018    .    .    .    . 1994
 9  .  .  .  . 2021    .    .    .    . 1994
 8  6  .  .  . 2015 2003    .    .    . 1989
 4  6  3  4  1 2022 2018 2016 2014 2013 1988
 .  .  .  .  .    .    .    .    .    . 2007
12  6  8  .  . 2012 2009 2004    .    . 1984
 2  3  .  .  . 2020 2013    .    .    . 1992
 6  .  .  .  . 2003    .    .    .    . 1984
 1  .  .  .  . 2019    .    .    .    . 1995
 3  3  1  .  . 2014 2014 2007    .    . 1985
 7  5  9  6  . 2021 2017 2015 2006    . 1988
 9 12  6  .  . 2011 2008 2004    .    . 1986
 1  7  .  .  . 2022 2016    .    .    . 1995
 4  9  .  .  . 2018 2015    .    .    . 1986
 4  9  7  .  . 2017 2012 2008    .    . 1986
12  4  4  .  . 2012 2010 2010    .    . 1991
 .  .  .  .  .    .    .    .    .    . 2001
12  7  .  .  . 2006 1998    .    .    . 1975
 .  .  .  .  .    .    .    .    .    . 1998
 .  .  .  .  .    .    .    .    .    . 1994
 5  6  .  .  . 2008 2003    .    .    . 1972
12 11  .  .  . 2018 2013    .    .    . 1989
 .  .  .  .  .    .    .    .    .    . 2003
 .  .  .  .  .    .    .    .    .    . 1997
 7 12  7 10  . 2020 2008 2005 2003    . 1986
 4 10  .  .  . 2020 2017    .    .    . 1997
 1  3  2 11  7 2017 2011 2009 2006 2003 1976
 7  4  .  .  . 2013 2012    .    .    . 1984
 5  5  .  .  . 2021 2020    .    .    . 1996
 1  4  4  9  . 2022 2017 2015 2012    . 1992
 6  .  .  .  . 2021    .    .    .    . 1998
10 10  7  .  . 2015 2010 2006    .    . 1984
 .  .  .  .  .    .    .    .    .    . 2006
 7  7  4  .  . 2020 2020 2017    .    . 1994
 5  .  .  .  . 2018    .    .    .    . 1997
10  4  .  .  . 2021 2018    .    .    . 1990
 4  .  .  .  . 2016    .    .    .    . 1992
 5  3  .  .  . 2017 2013    .    .    . 1993
 5  5  9  1  . 2019 2018 2014 2012    . 1989
 5  .  .  .  . 2021    .    .    .    . 1998
 6 12  .  .  . 2012 1997    .    .    . 1978
 2  .  .  .  . 2022    .    .    .    . 1998
 9  8  .  .  . 2012 2006    .    .    . 1982
 8  8  .  .  . 2016 2012    .    .    . 1994
 3  3  3  .  . 2019 2014 2009    .    . 1990
 1  .  .  .  . 2019    .    .    .    . 1996
 6  4  3  9  . 2016 2014 2012 2009    . 1992
 5  8  5  .  . 2021 2019 2013    .    . 1992
 4  .  .  .  . 2017    .    .    .    . 1994
 8  5 11 11  5 2008 2005 2003 1995 1994 1980
10  2  3  4  8 2011 2007 2005 2003 2000 1979
 5  .  .  .  . 2022    .    .    .    . 2000
 .  .  .  .  .    .    .    .    .    . 2005
 5  4  .  .  . 2018 2016    .    .    . 1995
 2  8  8  6  . 2020 2008 1998 1996    . 1975
 .  .  .  .  .    .    .    .    .    . 1996
 2  .  .  .  . 2021    .    .    .    . 1995
12  1  6  .  . 2019 2018 2016    .    . 1996
 1  2 11  .  . 2013 2012 2000    .    . 1981
 .  .  .  .  .    .    .    .    .    . 2000
10 12 11  8  . 2017 2000 1996 1992    . 1974
12 12 12  .  . 2018 2015 2015    .    . 1993
12 10 10  6  . 2018 2017 2011 2008    . 1990
 9 12  8  .  . 2020 2016 2014    .    . 1996
 6  6  7  .  . 2013 2010 2006    .    . 1988
10  6  6  9 12 2019 2013 2011 2009 2007 1987
11  .  .  .  . 2002    .    .    .    . 1981
11  .  .  .  . 2014    .    .    .    . 1997
 .  .  .  .  .    .    .    .    .    . 1996
10  .  .  .  . 2020    .    .    .    . 1992
 3  8  .  .  . 2015 2005    .    .    . 1989
end
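One possible way to build the monthly counts and the Jan/July axis labels is sketched below. It assumes the wide layout of the excerpt above (b1_* month, b2_* year, v010) and is meant for the full file, since the 100-row excerpt appears to contain no births to mothers aged 15-19 in 2017-2022. The names id, birth_order, birth_ym and teen_births are hypothetical, age at birth is approximated from calendar years only, and the dashed line at March 2020 is an assumed marker for the onset of Covid.

```stata
* Sketch only: assumes wide variables b1_* (month of birth), b2_* (year of birth), v010
gen long id = _n
reshape long b1_ b2_, i(id) j(birth_order) string   // suffixes "01", "02", ... kept as strings
drop if missing(b1_, b2_)

gen birth_ym = ym(b2_, b1_)                         // Stata monthly date
format birth_ym %tm

* Approximate age at birth from years only; keep teenage births, Jan 2017 - Jul 2022
keep if inrange(b2_ - v010, 15, 19) & inrange(birth_ym, tm(2017m1), tm(2022m7))

gen byte teen_births = 1
collapse (sum) teen_births, by(birth_ym)            // months with no births will be absent

* Label every 6th month starting Jan 2017 (Jan and Jul); all months stay as connected dots
twoway connected teen_births birth_ym, ///
    xlabel(`=tm(2017m1)'(6)`=tm(2022m7)', format(%tmMon_CCYY) angle(45)) ///
    xline(`=tm(2020m3)', lp(dash)) xtitle("") ytitle("Teenage births")
```

If exact ages are needed, a date-based age at birth would be more accurate than the year difference used here.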


Melyn Oluoch

Join Date: Aug 2023

Posts: 19
#15

02 Apr 2024, 15:56

Andrew Musau thanks for the twoway scatter graph detail. It works now! Yes, I had already run the collapse command, though, just before this graph.