Creating compositional variables and sample weights

Ola Aboukhsaiwan

Join Date: Mar 2021
Posts: 15

20 Mar 2021, 05:59

Hello everyone,

I have a quick question about creating compositional variables, such as the share of teenage mothers in the population. Given that I am working with repeated cross sections, am I doing the right thing (see the code below) by counting the number of mothers and teenage mothers by state and year? And given that I have sample weights, where and how could I use them to get population figures from these sample counts?

Many thanks for your help in advance!

Best regards
Ola

Code:

//Generate variable for total number of mothers
gen mother=1 if sex==2 & nchild>0
replace mother=0 if (mother>= .)
label var mother "Mother"
egen totalmothers=total(mother), by(statefip year)
label var totalmothers "Total mothers"

// Generate variable for number of teenage mothers (contemporaneous)
gen teenagemother=1 if nchild>0 & age>13 & age<20
replace teenagemother = 0 if (teenagemother >= .)
label var teenagemother "Teenage mother"

// Generate share of new teenage mothers
sort statefip year
egen totalteenagemothers=total(teenagemother), by(statefip year)
label var totalteenagemothers "Total teenage mothers"

// Estimate share of teenage mothers (per 100 mothers)
gen shareofteenagemothers=(totalteenagemothers/totalmothers)*100
label var shareofteenagemothers "Share of teenage mothers"

Nick Cox

Join Date: Mar 2014
Posts: 35708

20 Mar 2021, 09:06

Looks pretty good to me, although missing values in nchild would bite you (Stata treats numeric missing as larger than any number, so nchild>0 is true when nchild is missing).

Code:

//Generate variable for total number of mothers
gen mother = sex==2 & nchild>0 if !missing(sex, nchild)
label var mother "Mother"

egen totalmothers=total(mother), by(statefip year)
label var totalmothers "Total mothers"

// Generate variable for number of teenage mothers (contemporaneous)
gen teenagemother= nchild>0 & inrange(age, 14, 19) if !missing(nchild, age)
label var teenagemother "Teenage mother"

// Generate share of new teenage mothers
egen totalteenagemothers=total(teenagemother), by(statefip year)
label var totalteenagemothers "Total teenage mothers"

// Estimate share of teenage mothers (per 100 mothers)
gen shareofteenagemothers=(totalteenagemothers/totalmothers)*100
label var shareofteenagemothers "Share of teenage mothers"

On how indicators can be produced in one line, see https://www.stata.com/support/faqs/d...rue-and-false/ and (if you have access) https://www.stata-journal.com/articl...article=dm0099

However, do you need to specify sex == 2 for the teenage mothers? You do if nchild is defined for any teenage men.
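
Both points can be checked directly on the data. The lines below are only a minimal sketch, assuming the variable names used in this thread (sex, nchild, age) and that sex==2 codes women, as in the original code:

Code:

// Numeric missing (.) compares as larger than any number, so this displays 1
display (. > 0)

// How many observations have a missing number of children?
count if missing(nchild)

// How many teenagers who are not coded sex==2 are recorded with children?
count if sex != 2 & nchild > 0 & inrange(age, 14, 19) & !missing(sex, nchild)

If the last count is not zero, the sex==2 condition is needed in the teenage mother indicator as well.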

Last edited by Nick Cox; 20 Mar 2021, 09:17.

Ola Aboukhsaiwan

Join Date: Mar 2021
Posts: 15

21 Mar 2021, 15:09

Thanks a lot Nick!

One follow-up: would the code below be a correct way to use the sample weights (asecwt) available in the data? I have basically scaled both the numerator and the denominator of my fraction by the sample weights. Or is there a cleaner way to go about this?

Once again, thanks in advance!

Code:

// Generate variable for total number of mothers
gen mother = sex==2 & nchild>0 if !missing(sex, nchild)
label var mother "Mother"

egen totalmothers=total(mother), by(statefip year)
label var totalmothers "Total mothers"

gen totalmotherspop=totalmothers*asecwt 
label var totalmotherspop "Total mothers in the population"

// Generate variable for number of teenage mothers (contemporaneous)
gen teenagemother = sex==2 & nchild>0 & inrange(age, 14, 19) if !missing(sex, nchild, age)
label var teenagemother "Teenage mother"

// Generate share of new teenage mothers
egen totalteenagemothers=total(teenagemother), by(statefip year)
label var totalteenagemothers "Total teenage mothers"

gen totalteenagemotherspop=totalteenagemothers*asecwt 
label var totalteenagemotherspop "Total teenage mothers in the population"

// Estimate share of teenage mothers (per 100 mothers)
gen shareofteenagemothers=(totalteenagemotherspop/totalmotherspop)*100
label var shareofteenagemothers "Share of teenage mothers"

Nick Cox

Join Date: Mar 2014

Posts: 35708
#4

21 Mar 2021, 15:34

I doubt it. For example, the point of

Code:

egen totalmothers=total(mother), by(statefip year)
egen totalteenagemothers=total(teenagemother), by(statefip year)

is to produce a constant for each statefip and year. Multiplying by weights afterwards undoes that. I think you need to explain more about the weights, including what kind of weights they are; they may need to be applied earlier.
Ola Aboukhsaiwan

Join Date: Mar 2021

Posts: 15
#5

22 Mar 2021, 06:40

The weights I have are individual level weights. The weights are based on the inverse probability of selection into the sample and adjustments for the following factors: failure to obtain an interview; sampling within large sample units; the known distribution of the entire population according to age, sex, and race; over-sampling Hispanic persons; to give husbands and wives the same weight; and an additional step to provide consistency with labor force estimates from the basic survey. More information on this can be found here: https://cps.ipums.org/cps-action/var...iption_section

So for each individual record in the data, I assume that the value of 'asecwt' is the number of people in the population that this individual represents. My hope in using these weights is to extrapolate from the sample to the population, particularly for the counts of mothers, teenage mothers, etc. that I am computing.

Could you kindly recommend how I would go about applying them earlier? Apologies, I am new to this topic!

Thanks in advance for your help.
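
One possible way to apply the weights earlier is sketched below. It is not a definitive answer. It assumes that asecwt can be read as the number of people each record represents, as described above, and that mother and teenagemother have already been created as 0/1 indicators as in the earlier posts. The names wmother and wteenagemother are made up for this illustration. The idea is that each person contributes asecwt, rather than 1, to the totals, so the weights enter before summing within state and year.

Code:

// Weighted contributions: each person counts as asecwt people rather than 1
gen double wmother = mother * asecwt
gen double wteenagemother = teenagemother * asecwt

// Weighted totals, constant within each state and year
egen double totalmotherspop = total(wmother), by(statefip year)
label var totalmotherspop "Total mothers in the population"

egen double totalteenagemotherspop = total(wteenagemother), by(statefip year)
label var totalteenagemotherspop "Total teenage mothers in the population"

// Weighted share of teenage mothers (per 100 mothers)
gen double shareofteenagemothers = 100 * totalteenagemotherspop / totalmotherspop
label var shareofteenagemothers "Share of teenage mothers"

This yields point estimates only. Standard errors that respect the survey design are a separate question, which this sketch does not address.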