Pooling data from multiple waves of weighted cross-sectional surveys

Pete Ware

Join Date: Feb 2015

Posts: 26
#1

10 Feb 2015, 10:40

Hello all,

I am working with data from the U.S. Bureau of Justice Statistics (specifically, the School Crime Supplement to the National Crime Victimization Survey). This is a cross-sectional, nationally representative survey that is administered every other year. BJS provides weights and strata information to svyset the data for analysis (these are pweights in Stata-speak). I would like to pool these cross sections (adding a time variable) to understand trends in certain reported school crimes as they are associated with student characteristics. My question is about the weights. Specifically, once I combine the data, do I need to adjust the provided sampling weights to get valid estimates across time, and if so, how? Assume, for simplicity, that I am regressing reported bullying victimization on gender and year:

Bullying = alpha + beta1(gender) + beta2(year) + epsilon

Each sample member in the pooled dataset comes from one of the constituent years and has his or her own associated pweight. Do I keep the pweights the same in the pooled dataset or must they be adjusted somehow before I svyset the pooled dataset?

Thanks in advance,

Pete

Last edited by Pete Ware; 10 Feb 2015, 11:04.
Steve Samuels

Join Date: Mar 2014

Posts: 1786
#2

20 Feb 2015, 18:31

Welcome to Statalist, Pete!

You can use the weights "as-is" for each data collection year. I do have an observation about the svyset statement. In a single year, the stratum variable for the SCS is V2117, labeled "PSEUDOSTRATUM CODE", and the PSU is V2118, a half-sample "SECUCODE" (for "standard error computation code", I think). Although the codebook (e.g. page 7 of the 2007 codebook, http://nces.ed.gov/pubs2004/2004307.pdf) says that there are three possible weights, I'll assume the person weight, VS137.

Then for a single data collection year, the svyset statement would be:

Code:

/* SINGLE YEAR SETUP */
svyset V2118 [pw = VS137], strata(V2117)

Now, if a new sample of PSUs had been selected every year, then one would use for the strata() option a "super-stratum":

Code:

/* MULTI-YEAR SETUP: INDEPENDENT SAMPLING IN YEARS */
egen superstrat = group(year V2117)
svyset V2118 [pw = VS137], strata(superstrat)

But in the NCVS (and SCS), the same PSUs are used in multiple years. (http://www.bjs.gov/content/pub/pdf/ncvstd13.pdf) Moreover, each household is in the survey for three years, so that it is possible for the same student to be interviewed in two different data collection years. Therefore observations in different years are not independently selected and the super-stratum approach is incorrect. Assuming that the PSU code is maintained across years, you should use the "single year setup" and year should be an analysis variable only.
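A minimal sketch of that pooled analysis, assuming the yearly files have already been appended into one dataset. Toy data stand in for the real SCS files so the block runs on its own; the variables year, female and bullied are hypothetical names, and only V2117 (stratum), V2118 (PSU) and VS137 (person weight) come from the description above. The linear model follows the equation in the original question; for a binary outcome, svy: logit would be a natural alternative.

```stata
* Sketch only: toy data stand in for two appended SCS files.
* In practice you would do something like:
*     use scs_year1, clear
*     append using scs_year2        (hypothetical file names)
clear
set seed 12345
set obs 400
generate year    = cond(_n <= 200, 2005, 2007)      // data collection year (hypothetical)
generate V2117   = mod(_n - 1, 10) + 1              // pseudostratum: 10 strata
generate V2118   = mod(floor((_n - 1)/10), 2) + 1   // PSU: 2 per stratum
generate VS137   = 500 + 1000*runiform()            // person weight, used as-is
generate byte female  = runiform() < 0.5            // hypothetical covariate
generate byte bullied = runiform() < 0.25           // hypothetical outcome

* Stratum and PSU are declared without reference to year (no super-stratum)
svyset V2118 [pw = VS137], strata(V2117)

* Year enters only as an analysis variable
svy: regress bullied i.female i.year
```

With this design statement, observations from the same PSU in different years fall in the same cluster, so the standard errors allow for dependence across years.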

Asides:
I hate upper-case variable names: having to use the shift key slows down analysis. So, every time I encounter a study like NCVS or SCS, one of my first steps is to install renvars (findit renvars) and run:

Code:

renvars *, lower

I've seen colleagues who use names like V2118 unknowingly analyze the wrong variables. I strongly suggest first that you use rename to give your variables descriptive (lowercase!) names.
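A short sketch of both steps, assuming Stata 12 or later, where the built-in rename accepts the lower option and group renames (so renvars is not strictly needed). The three toy variables stand in for the SCS file, and the descriptive names are hypothetical choices, not names from the codebook.

```stata
* Sketch only: three toy variables stand in for the SCS file.
clear
set obs 5
generate V2117 = _n
generate V2118 = 1
generate VS137 = 100

* Lowercase every variable name (built-in rename, Stata 12+)
rename *, lower

* Give the design variables descriptive names (hypothetical names)
rename (v2117 v2118 vs137) (pseudostratum secucode personwt)

svyset secucode [pw = personwt], strata(pseudostratum)
```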

Last edited by Steve Samuels; 20 Feb 2015, 18:34.

Steve Samuels
Statistical Consulting

Stata 14.2
Pete Ware

Join Date: Feb 2015

Posts: 26
#3

24 Mar 2015, 21:14

Sorry for the late reply - but thank you very much, Steven!
Oliver Elorreaga

Join Date: Apr 2015

Posts: 15
#4

11 May 2015, 19:46

Hi, I just want to say that these three messages are the best summary of the discussions we had on this topic in the old Stata forum. I'll add the links to complement the thread.

- http://www.stata.com/statalist/archi.../msg00521.html
- http://www.stata.com/statalist/archi.../msg00050.html

I also have a new question, which arises in the following case:

1. Suppose we have 8 years of the same survey and, naturally, the sample changed each year.
2. Each year, the full sample contains a panel component, so the same "strata" appear in 20% of the sample. In addition, there is the usual attrition.

What is the best way to handle this case? I think it would be the "super-stratum" approach, right?

Best regards,
Oliver