Simulating NHIS (National Health Interview Survey) Injury Data?

Tin-chi Lin



01 Jun 2016, 14:20

Dear Stata Forum Users,

My coworkers and I recently proposed a way to estimate the variance of injury rates when different national complex surveys supply the numerator (injuries) and the denominator (exposure hours). An (epidemiological) rate is defined as the number of events divided by the exposure hours for a population over a period of time.
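
One way to write this rate down, assuming each survey contributes a weighted total (the symbols below are illustrative notation only, not taken from either survey's documentation):

```latex
\hat{R} = \frac{\hat{N}}{\hat{H}}, \qquad
\hat{N} = \sum_{i \in s_1} w_i \, y_i, \qquad
\hat{H} = \sum_{j \in s_2} v_j \, h_j
```

Here $s_1$ is the numerator survey's sample with weights $w_i$ and injury counts $y_i$, and $s_2$ is the denominator survey's sample with weights $v_j$ and exposure hours $h_j$. The two samples are drawn independently under different designs.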

Now we’d like to go one step further and examine how the variance estimator performs under various conditions, e.g. different distributional assumptions for the injury outcome or different sample designs for the numerator or denominator. To this end, we’d like to mimic the designs of the surveys and generate samples for the numerator and denominator, respectively. Our main question is: how do we simulate the numerator and denominator data under a complex survey design?

For example, the survey we used for the numerator is the 2010 National Health Interview Survey (NHIS). The NHIS is a multistage cluster survey with stratification and oversampling. The public-use version of the data contains few variables describing the sample design: only a PSU variable and a stratum variable intended for a single-stage approximation. The public-use data also assume that these observations were drawn with replacement. Thus, based on information provided by the survey administrator, the 2010 NHIS sample can be approximated by a stratified selection of 600 primary sampling units (PSUs) from 300 strata, for a sample size of 89,976.

Suppose the number of injuries follows a Poisson distribution, because injuries are rare events. To simulate the injury data for the numerator, do we

1. Create 89,976 randomly generated Poisson observations within 600 clusters and 300 strata, or

2. Generate population data first, say 300 million randomly generated Poisson observations (300 million is roughly the U.S. population in 2010), stratify the population into 300 segments, draw 600 PSUs from the 300 strata, and draw about 89,976/600 ≈ 150 observations in each PSU? That gives us about 90,000 observations from 600 PSUs in 300 strata.
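
To make the two options concrete, here is a minimal Python sketch of both, using only the standard library. Everything in it is an assumption made for illustration: the function names, the injury rate `lam=0.03`, the scaled-down population (a few PSUs per stratum rather than 300 million people), the gamma-distributed PSU-level rate used to induce clustering, and the equal-probability selection at both stages. Oversampling and survey weights are not modeled.

```python
import math
import random


def poisson(rng, lam):
    """Draw one Poisson(lam) variate with Knuth's method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def simulate_direct(rng, n_strata=300, psus_per_stratum=2, n_per_psu=150, lam=0.03):
    """Option 1: generate the sample directly inside a strata-by-PSU grid.

    Every observation is i.i.d. Poisson(lam), so the stratum and PSU labels
    carry no real clustering; they are only identifiers.
    """
    rows = []
    for s in range(n_strata):
        for c in range(psus_per_stratum):
            for _ in range(n_per_psu):
                rows.append((s, c, poisson(rng, lam)))
    return rows


def simulate_two_stage(rng, n_strata=300, psus_in_pop=10, psu_size=500,
                       psus_per_stratum=2, n_per_psu=150, lam=0.03):
    """Option 2 (scaled down): build a finite population, then sample from it."""
    rows = []
    for s in range(n_strata):
        # Finite population for this stratum. Each PSU gets its own rate
        # (assumed gamma multiplier with mean 1) so that persons within a
        # PSU are correlated.
        population = []
        for _ in range(psus_in_pop):
            psu_lam = lam * rng.gammavariate(4.0, 0.25)
            population.append([poisson(rng, psu_lam) for _ in range(psu_size)])
        # Stage 1: draw PSUs with replacement, as the public-use design assumes.
        drawn = [rng.randrange(psus_in_pop) for _ in range(psus_per_stratum)]
        for c, psu_id in enumerate(drawn):
            # Stage 2: simple random sample of persons without replacement.
            for y in rng.sample(population[psu_id], n_per_psu):
                rows.append((s, c, y))
    return rows


rng = random.Random(2010)
direct = simulate_direct(rng)
two_stage = simulate_two_stage(rng, psus_in_pop=4, psu_size=200)
print(len(direct), len(two_stage))
```

Each returned row is a `(stratum, psu, injuries)` tuple. With the defaults, both functions return 300 × 2 × 150 = 90,000 rows, close to the 89,976 in the real sample. The practical difference is that the first version has no design effect by construction, whereas the second lets the clustering and stratification actually influence the variance.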

Any help would be much appreciated. Thanks.

Tin-chi