  • Creating compositional variables and sample weights

    Hello everyone,

    I have a quick question about creating compositional variables, such as the share of teenage mothers in the population. I am working with repeated cross sections of data. Am I doing the right thing (see the code below) by counting the number of mothers and teenage mothers by state and year? And given that I have sample weights, where and how could I use them to get population figures from these sample counts?

    Many thanks for your help in advance!

    Best regards
    Ola


    Code:
    //Generate variable for total number of mothers
    gen mother=1 if sex==2 & nchild>0
    replace mother=0 if (mother>= .)
    label var mother "Mother"
    egen totalmothers=total(mother), by(statefip year)
    label var totalmothers "Total mothers"
    
    // Generate variable for number of teenage mothers (contemporaneous)
    gen teenagemother=1 if nchild>0 & age>13 & age<20
    replace teenagemother = 0 if (teenagemother >= .)
    label var teenagemother "Teenage mother"
    
    // Generate share of new teenage mothers
    sort statefip year
    egen totalteenagemothers=total(teenagemother), by(statefip year)
    label var totalteenagemothers "Total teenage mothers"
    
    // Estimate prevalence of teenage mothers (per 100 population)
    gen shareofteenagemothers=(totalteenagemothers/totalmothers)*100
    label var shareofteenagemothers "Share of teenage mothers"



  • #2
    Looks pretty good to me, although missing values in nchild would bite you: Stata treats numeric missing as larger than any number, so nchild>0 is true when nchild is missing.

    Code:
    //Generate variable for total number of mothers
    gen mother = sex==2 & nchild>0 if !missing(sex, nchild)
    label var mother "Mother"
    
    egen totalmothers=total(mother), by(statefip year)
    label var totalmothers "Total mothers"
    
    // Generate variable for number of teenage mothers (contemporaneous)
    gen teenagemother= nchild>0 & inrange(age, 14, 19) if !missing(nchild, age)
    label var teenagemother "Teenage mother"
    
    // Generate share of new teenage mothers
    egen totalteenagemothers=total(teenagemother), by(statefip year)
    label var totalteenagemothers "Total teenage mothers"
    
    // Estimate prevalence of teenage mothers (per 100 population)
    gen shareofteenagemothers=(totalteenagemothers/totalmothers)*100
    label var shareofteenagemothers "Share of teenage mothers"

    On how indicators can be produced in one line, see https://www.stata.com/support/faqs/d...rue-and-false/ and (if you have access) https://www.stata-journal.com/articl...article=dm0099

    However, do you need to specify sex == 2 for the teenage mothers? You do if nchild is defined for any teenage men.
    Last edited by Nick Cox; 20 Mar 2021, 09:17.



    • #3
      Thanks a lot Nick!

      One follow-up: would the code below be a correct way to use the sample weights (asecwt) available in the data? I have scaled both the numerator and the denominator of my fraction by the sample weights. Or is there a cleaner way to go about this?

      Once again, thanks in advance!

      Code:
      // Generate variable for total number of mothers
      gen mother = sex==2 & nchild>0 if !missing(sex, nchild)
      label var mother "Mother"
      
      egen totalmothers=total(mother), by(statefip year)
      label var totalmothers "Total mothers"
      
      gen totalmotherspop=totalmothers*asecwt 
      label var totalmotherspop "Total mothers in the population"
      
      // Generate variable for number of teenage mothers (contemporaneous)
      gen teenagemother= sex ==2 & nchild>0 & inrange(age, 14, 19) if !missing(nchild, age)
      label var teenagemother "Teenage mother"
      
      // Generate share of new teenage mothers
      egen totalteenagemothers=total(teenagemother), by(statefip year)
      label var totalteenagemothers "Total teenage mothers"
      
      gen totalteenagemotherspop=totalteenagemothers*asecwt 
      label var totalteenagemotherspop "Total teenage mothers in the population"
      
      // Estimate prevalence of teenage mothers (per 100 population)
      gen shareofteenagemothers=(totalteenagemotherspop/totalmotherspop)*100
      label var shareofteenagemothers "Share of teenage mothers"



      • #4
        I doubt it. For example, the point of

        Code:
         
         egen totalmothers=total(mother), by(statefip year)
         egen totalteenagemothers=total(teenagemother), by(statefip year)
        is to produce a constant for each statefip and year. Multiplying by weights afterwards undoes that. I think you need to explain more about the weights, including what kind of weights they are; they may need to be applied earlier.
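
        A minimal sketch of what applying the weights earlier could look like, assuming (and this is only an assumption at this point) that asecwt is a person-level weight giving the number of people each respondent represents, and that mother and teenagemother are the 0/1 indicators created in #3. The variable names wmother, wteenagemother, wtotalmothers, wtotalteenagemothers and wshare are made up for illustration.

        Code:
        // Weight each person's indicator first, then total within state and year
        gen double wmother = mother * asecwt
        gen double wteenagemother = teenagemother * asecwt
        egen double wtotalmothers = total(wmother), by(statefip year)
        egen double wtotalteenagemothers = total(wteenagemother), by(statefip year)

        // Weighted share of teenage mothers among mothers (per 100 mothers)
        gen double wshare = 100 * wtotalteenagemothers / wtotalmothers

        Done this way, each weighted total is still constant within statefip and year, just as the unweighted totals are.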



        • #5
          The weights I have are individual level weights. The weights are based on the inverse probability of selection into the sample and adjustments for the following factors: failure to obtain an interview; sampling within large sample units; the known distribution of the entire population according to age, sex, and race; over-sampling Hispanic persons; to give husbands and wives the same weight; and an additional step to provide consistency with labor force estimates from the basic survey. More information on this can be found here: https://cps.ipums.org/cps-action/var...iption_section

          So for each individual entry in the data, I assume that the value of 'asecwt' is the number of people in the population that this individual represents. My hope in using these weights is to extrapolate from the sample to the population, particularly for the numbers of mothers, teenage mothers, and so on that I am counting.

          Could you kindly recommend how I would go about applying them earlier? Apologies, I am new to this topic!

          Thanks in advance for your help.

