  • svy: difference of proportions test vs. Excel

    Greetings,

    I'm using Stata to conduct a difference of proportions test on survey data with a complex sample design. For this kind of task, some of my colleagues prefer to use their preferred software package (Stata, SAS, R, etc.) to calculate group estimates and SEs, but then conduct the statistical test in Excel by plugging the group estimates and SEs into an Excel formula that uses the typical t-test equation. Wanting to conform to their process, I tried it out but noticed that Stata and Excel produce different pooled standard errors and therefore different test statistics. I'm wondering if someone can explain the reason for the difference: what is Stata doing to calculate the test statistic, and how would one adjust the t-test equation to replicate Stata's result "by hand" (i.e., using an Excel formula)?

    Code:
    /* example dataset (borrowed from another example on Statalist: https://www.statalist.org/forums/forum/general-stata-discussion/general/432450-survey-adjustment-for-t-tests-and-difference-in-proportions) */
    sysuse auto, clear
    gen mkr = substr(make,1,2) /*artificial PSU */
    svyset mkr [pw = turn]
    gen hiprice = price>8000  /* outcome */
    
    /* test difference of proportions low-price domestic vs. high-price domestic */
    svy: tab hiprice foreign, row se
    matrix list e(b)
    lincom _b[p11]-_b[p21]
    
    /* t-test formula to mimic Excel style formula, using estimates and SEs from 
    above tabulation */
    di (.7467 - .6855)/sqrt( (.1007)^2 + (.154)^2 )
    
    /* pooled standard error using above equation */
    di sqrt( (.1007)^2 + (.154)^2 )

  • #2
    The formula that Excel is using is appropriate only for independent estimates. But if you run -matrix list e(V)-, you will see that the two estimates are in fact correlated with each other. So the correct way to calculate the pooled standard error of the difference would be

    Code:
    matrix V = e(V)
    display sqrt(V["p11", "p11"] + V["p21", "p21"] - 2*V["p11", "p21"])
    which does, indeed, reproduce the standard error that Stata used in -lincom-.
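
    For anyone who wants to carry this back into the Excel workflow, here is a minimal sketch (assuming the -svy: tab- estimation results above are still in memory) of the same adjustment folded into the hand-calculated t statistic; the covariance term is the extra piece that would need to be copied into the spreadsheet alongside the two SEs.

    Code:
    /* sketch: rebuild the difference, its adjusted SE, and the t statistic by hand */
    matrix V = e(V)
    scalar sediff = sqrt(V["p11","p11"] + V["p21","p21"] - 2*V["p11","p21"])
    display "difference = " _b[p11] - _b[p21]
    display "t = " (_b[p11] - _b[p21]) / sediff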

    • #3
      Thank you very much; this is very helpful.

      • #4
        Hello, I have a follow-up question: can you point me to an online resource that discusses how to calculate the pooled standard error the way you've done here? I was hoping to read more about it, but a search using keywords like "correlated" or "dependent" mostly turns up results for paired t-tests, which seems like something different from what you're doing here and doesn't give specific guidance on subtracting the covariance of correlated estimates.

        • #5
          Try https://stats.libretexts.org/Bookshe...ated_Variables

          It is not exactly on point, as it deals with the pooled standard deviation of a pair of correlated variables, but the principle is the same. In your case you are dealing with standard errors, which are just the estimated standard deviations of the sampling distributions of the two coefficients, and their variances and covariance are given in e(V).
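
          Written with the correlation instead of the covariance, the same identity is SE(A - B) = sqrt(SE_A^2 + SE_B^2 - 2*r*SE_A*SE_B). A minimal sketch using the estimates from #1 (assuming they are still the active estimation results):

          Code:
          /* sketch: the same adjustment expressed via the correlation of the two estimates */
          matrix V = e(V)
          scalar seA = sqrt(V["p11","p11"])
          scalar seB = sqrt(V["p21","p21"])
          scalar r   = V["p11","p21"] / (seA*seB)
          display sqrt(seA^2 + seB^2 - 2*r*seA*seB)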

          • #6
            Once again, very helpful. Thanks for your response.
