  • Standardizing variables with weighted data

    I am attempting to calculate several standardized variables in Stata using data from the Education Longitudinal Study (ELS:2002). ELS has a complex survey design including weights since certain sub-populations were oversampled. In the ELS codebook, the US National Center for Education Statistics provides the SAS code they used to calculate socioeconomic status (variable F1SES2), which I am using as an example of standardizing a variable using this data. I would like to create several other standardized variables in Stata, but the 'egen std()' command does not support weights for either the calculation of the mean or the variance used in the standardization process.

    Here is the type of SAS command I am having trouble replicating in Stata (copied from a portion of the NCES ELS codebook entry explaining variable F1SES2):

    /* SAS procedure PROC STANDARD is used for creating standardized
    'Z-scores'. It reads in edc_M, edc_F, occ_M, occ_F, BYINCOME, or
    Items and creates said scores. Values for each variable will have a
    mean of zero and standard deviation of 1. Calculations are weighted
    using F1QWT. */
    Proc Standard data=F1SES2 out=F1SES2OUT M=0 STD=1 VARDEF=weight;
    Var edc_M edc_F occ_M occ_F Cdataincome Items;
    Weight F1QWT;
    run;

    Specifically, there does not appear to be a Stata equivalent to either "VARDEF=weight" (which sets the divisor used in calculating the variance) or "Weight F1QWT". Is anyone aware of a relevant technique or a user-written Stata command I've missed that would help calculate standardized variables using weighted data? Thanks for any suggestions about how to approach this.
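    For reference, here is how I read the SAS documentation for a WEIGHT statement combined with VARDEF=WEIGHT (a sketch of my reading, not taken from the ELS codebook): the mean is the weighted mean, and the variance divides the weighted sum of squared deviations by the sum of the weights rather than by n-1 as under the SAS default VARDEF=DF. With w_i the F1QWT weight and x_i the value of one input variable for case i:

```latex
\bar{x}_w = \frac{\sum_i w_i x_i}{\sum_i w_i}, \qquad
s_w^2 = \frac{\sum_i w_i\,(x_i - \bar{x}_w)^2}{\sum_i w_i}, \qquad
z_i = \frac{x_i - \bar{x}_w}{s_w}
```

    By construction the z_i then have weighted mean 0 and weighted SD 1 when the SD is computed with the same divisor.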


    Catherine Manly

    Doctoral student, higher education
    Education Policy, Research and Administration
    University of Massachusetts, Amherst

  • #2
    Hi Catherine,
    Regarding your problem, I can suggest a couple of solutions.
    The long but hands-on solution:
    foreach i of varlist x1 x2 x3 x4 {
        quietly summarize `i' [aweight=wgt]
        * standardize with the weighted mean and the weighted SD from summarize
        generate double std_`i' = (`i' - r(mean))/r(sd)
    }
    The short solution: check out the command -center-. I don't remember whether it is an official Stata command or a user-written one, but it does allow you to include weights in the standardization.
    Hope this helps
    Fernando



    • #3
      Catherine,

      I was going to suggest something similar to what Fernando suggested, but based on the SAS documentation at http://support.sas.com/documentation...a000146749.htm you may need something a little more complicated because I'm not sure if Stata's normal weighting procedure weights the variances the same way. The way I read it, this is what you would have to do:

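      [CODE]
      tempname sumwt wtmean variance
      foreach i of varlist edc_M edc_F occ_M occ_F Cdataincome Items {
          * weighted mean and sum of weights (F1QWT used as an analytic weight)
          quietly summarize `i' [aweight=F1QWT]
          scalar `sumwt' = r(sum_w)
          scalar `wtmean' = r(mean)
          * weighted corrected sum of squares: sum of F1QWT*(x - weighted mean)^2
          tempvar dev2
          generate double `dev2' = F1QWT*(`i' - `wtmean')^2 if !missing(`i', F1QWT)
          quietly summarize `dev2', meanonly
          * VARDEF=WEIGHT: divide by the sum of the weights rather than N-1
          scalar `variance' = r(sum)/`sumwt'
          generate double std_`i' = (`i' - `wtmean')/sqrt(`variance')
          drop `dev2'
      }
      [/CODE]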
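      A possible shortcut, assuming I am reading the Methods and Formulas section of [R] summarize correctly: with aweights, summarize rescales the weights to sum to the number of observations N and then divides by N-1, so r(Var) should differ from the VARDEF=WEIGHT variance only by a constant factor. With w_i the F1QWT weight, \bar{x}_w the weighted mean, and N the number of observations summarize used:

```latex
\texttt{r(Var)} = \frac{N}{N-1}\cdot\frac{\sum_i w_i\,(x_i-\bar{x}_w)^2}{\sum_i w_i}
\quad\Longrightarrow\quad
\frac{\sum_i w_i\,(x_i-\bar{x}_w)^2}{\sum_i w_i} = \frac{N-1}{N}\,\texttt{r(Var)}
```

      If that holds, r(Var)*(r(N)-1)/r(N) after summarize with [aweight=F1QWT] should reproduce the variance computed in the loop, which makes a quick consistency check. It would also mean that Fernando's r(sd) version gives SDs larger than the SAS ones by a factor of sqrt(N/(N-1)), which is small when N is large.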

      The only way to be certain that this is correct, of course, is to run it in SAS as well and compare the answers. But if you could do that, you probably would have done it in SAS to begin with...

      Regards,
      Joe



      • #4
        Thank you Fernando and Joe. Your advice is greatly appreciated. Fernando, I found the -center- command on SSC, although it's not clear to me from the help file that I'll be able to use it to weight the variance calculation too. However, I think I can use Joe's coding approach to check that.

        Joe, I also figured the calculation of the variance would be a bit complicated based on the SAS documentation, and your coding advice here is very helpful. I follow what you did and it appears to duplicate what the SAS documentation indicates. You are correct that I don't have SAS, but I might be able to find a place on campus to run it just long enough to confirm that this coding approach does the right thing before using it to calculate a bunch of other variables. Many thanks!!



        • #5
          Some old threads on this:

          http://www.stata.com/statalist/archi.../msg01019.html

          http://www.stata.com/statalist/archi.../msg00332.html

          http://www.stata.com/statalist/archi.../msg00334.html


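          Whenever you try a do-it-yourself standardization, remember that the sample used may vary from one analysis to the next (for example, because of missing data), so a mean and SD computed on one sample will not necessarily standardize the variable within another.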
          -------------------------------------------
          Richard Williams, Notre Dame Dept of Sociology
          Stata Version: 17.0 MP (2 processor)

          WWW: https://www3.nd.edu/~rwilliam



          • #6
            Hello,
            I have a similar problem matching results in Stata with those from SAS.
            The code provided by Joe above standardizes so that mean = 0 and SD = 1.
            But in my situation I need to standardize (using weighted data) so that mean = 100 and SD = 15.
            I've tried a couple of approaches, but can't get the results to match.
            Any suggestions?
            Regards,

            Andrew



            • #7
              Hi all,
              I've solved my problem, using the example above but with mean = 100 and SD = 15:

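              [CODE]
              scalar define mean_std = 100
              scalar define sd_std = 15
              tempname sumwt wtmean variance
              foreach i of varlist edc_M edc_F occ_M occ_F Cdataincome Items {
                  quietly summarize `i' [aweight=F1QWT]
                  scalar `sumwt' = r(sum_w)
                  scalar `wtmean' = r(mean)
                  tempvar dev2
                  generate double `dev2' = F1QWT*(`i' - `wtmean')^2 if !missing(`i', F1QWT)
                  quietly summarize `dev2', meanonly
                  scalar `variance' = r(sum)/`sumwt'
                  * rescale the weighted z-score to the target mean and SD
                  generate double std_`i' = sd_std*(`i' - `wtmean')/sqrt(`variance') + mean_std
                  drop `dev2'
              }
              [/CODE]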
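              The rescaling in the loop above is a linear transformation of the weighted z-score, so the result has exactly the target weighted mean and SD. Writing μ = 100 (mean_std) and σ = 15 (sd_std), with \bar{x}_w the weighted mean and s_w the VARDEF=WEIGHT standard deviation:

```latex
y_i = \mu + \sigma\,\frac{x_i - \bar{x}_w}{s_w}
\quad\Longrightarrow\quad
\frac{\sum_i w_i y_i}{\sum_i w_i} = \mu,
\qquad
\frac{\sum_i w_i\,(y_i - \mu)^2}{\sum_i w_i} = \sigma^2
```

              The weighted mean comes out as μ because the weighted deviations from \bar{x}_w sum to zero, and the variance is σ² because each squared deviation is multiplied by σ²/s_w². In PROC STANDARD terms this should correspond to M=100 STD=15 instead of M=0 STD=1.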

              Regards,

              Andrew

