How to do a t-test using survey data

Maisha Huq

Join Date: Jun 2025

Posts: 15
#1

How to do a t-test using survey data

08 Sep 2018, 19:11

Hi there,

I am working with survey data and the svy commands, and wondering: what is the most accurate way to conduct a test of means in my scenario?

In my dataset:
The records come from five different survey rounds; the variable source indicates which round each record is from.

I would like to compare the following two means: (i) the mean age across rounds 1-4, with (ii) the mean age in round 5; the variable is called "FQ_age"

The sampling weight is round-specific and is stored in the variable "FQweight".

I did the following (see pic directly below)...
Generated a variable, source_age to group all records in rounds 1-4 under source_age=0 and all round 5 records as source_age=1

Then ran svy: mean with the over(source_age) option

Then, used the test command

However, is this correct? I'm not sure why the standard errors differ between (i) running svy on only the subset of rounds 1-4 records (see directly below) and (ii) running svy as shown above. I appreciate your time and help. Thank you!

Last edited by Maisha Huq; 08 Sep 2018, 19:15.
Clyde Schechter

Join Date: Apr 2014

Posts: 30066
#2

08 Sep 2018, 19:24

Assuming that you have correctly -svyset- your data, your first method is the correct one. The standard errors are different because when you apply an -if- condition to any Stata command, the command is executed as if the only data were those satisfying the -if- condition. But in survey data, the correct calculation of design effects requires the presence of all data in the entire survey sample, even if not all of those records directly participate in a particular calculation. In fact, the general rule is: don't use -if- conditions with svy:. If you need to get correct results from survey data that are conditioned on something, use the -subpop()- option of -svy:- itself.
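
As a minimal sketch of the -subpop()- approach: the indicator name rounds_1_4 is hypothetical, source and FQ_age are the variable names given in #1, source is assumed to be numeric and coded 1 through 5, and the data are assumed to be -svyset- already.

Code:

* flag records from rounds 1-4 (hypothetical indicator; assumes source is coded 1-5)
gen byte rounds_1_4 = inrange(source, 1, 4)
* estimate the mean for that subpopulation while keeping the full design in memory
svy, subpop(rounds_1_4): mean FQ_age

Unlike an -if- condition, this keeps every record available for the variance calculation and restricts only the estimate itself to the flagged records.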

In the future, please do not post screenshots to show output. They are frequently unreadable on some setups. The way to assure that what you post will be readable to everybody is to copy/paste directly from the Stata Results window or your log file and paste it into the Forum editor, surrounded by code delimiters. If you are not familiar with code delimiters, please read Forum FAQ #12 for instructions.
Maisha Huq

Join Date: Jun 2025

Posts: 15
#3

08 Sep 2018, 19:37

Hi Clyde,
Thanks so much for taking the time to respond and refer me to FAQ12. I'll post code next time instead!
Maisha Huq

Join Date: Jun 2025

Posts: 15
#4

09 Sep 2018, 12:59

Hi Clyde,

I had a follow-up question about the analytical approach above given my research goal/question. Specifically, my samples are two-stage cluster designs where rounds 1-4 are sampled from a different area than Round 5; thus, the research question is: how well does R1-R4 data predict R5 data? Given the research question, shouldn't I be comparing the means for R1-R4 and R5 by subsetting to only R1-R4 and R5, respectively, when calculating the means?

Please let me know if I can be clearer... thank you!
Clyde Schechter

Join Date: Apr 2014

Posts: 30066
#5

09 Sep 2018, 13:04

Well, if I understand correctly, all you need to do is create a new dichotomous variable that distinguishes rounds 1 through 4 from round 5, and then use -mean- with -over()- that variable. So

Code:

gen from_round_5 = 5.round
svy: mean FQ_age, over(from_round_5)
test [FQ_age]0 = [FQ_age]1
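
An equivalent comparison, offered here as an alternative sketch rather than as the method given above, is a survey-weighted regression of FQ_age on the round-5 indicator. The coefficient on the indicator is the difference between the two means, and its t statistic tests that difference, which should agree with the -test- result. It also avoids referring to the coefficient names produced by -mean, over()-, which are labeled differently in Stata 16 and later. The indicator name is_round_5 is hypothetical, and the sketch assumes the round variable is the one called source in #1, coded 1 through 5.

Code:

* hypothetical indicator: 1 for round 5, 0 for rounds 1-4, missing if source is missing
gen byte is_round_5 = (source == 5) if !missing(source)
* the coefficient on 1.is_round_5 is the round-5 mean minus the rounds 1-4 mean
svy: regress FQ_age i.is_round_5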