Test of equal proportions

Jeremy Firestone

Join Date: Feb 2019

Posts: 4
#1

01 Feb 2019, 14:12

I am using weighted data and generating proportions of one variable over another. For example, after sorting on VAR2, I run the command prop VAR1 [pweight=weight], over(VAR2), which generates the proportions. I know I can use lincom or test to test for equality, but the former uses a t test and the latter an F test (which generates the same p-value as the t test). I understand that when comparing proportions a chi square test should be used, but this option is not available in Stata. While I can find chi square calculators on the web, they seem to be designed for comparing proportions in two samples rather than for testing whether, e.g., in the output below, _subpop_1 for _prop_1 = _subpop_1 for _prop_3. Any advice would be appreciated, including any known chi square Stata ado file. Much thanks

--------------------------------------------------------------
             |                                    Logit
        Over | Proportion  Std. Err.      [95% Conf. Interval]
-------------+------------------------------------------------
_prop_1      |
   _subpop_1 |   .1689706   .0432346      .1000339    .2711031
   _subpop_2 |   .1427851   .0647379      .0557368    .3197477
   _subpop_3 |   .2166859   .1076205      .0738687    .4896406
      cncrnd |   .1258908   .0444116      .0612615    .2411847
   _subpop_5 |   .1481949   .0432678      .0815762    .2541618
-------------+------------------------------------------------
_prop_2      |
   _subpop_1 |   .3288163   .0586031      .2254176    .4519691
   _subpop_2 |     .36834   .1557943      .1355549    .6843902
   _subpop_3 |   .4000855   .0977277      .2308118    .5971286
      cncrnd |   .3133561   .0618868      .2060951    .4451405
   _subpop_5 |   .4361276   .1044895      .2515661    .6402596
-------------+------------------------------------------------
_prop_3      |
   _subpop_1 |   .5022131   .0687388      .3704106    .6337087
   _subpop_2 |   .4888749    .134394      .2498462    .7331018
   _subpop_3 |   .3832286   .0870704      .2317607    .5613548
      cncrnd |   .5607531   .0687825      .4247001     .688249
   _subpop_5 |   .4156775   .0877475      .2593923    .5909859
--------------------------------------------------------------
Clyde Schechter

Join Date: Apr 2014

Posts: 30172
#2

01 Feb 2019, 15:13

It is not a coincidence that the t-test generates the same p-value as the F-test. A t-statistic is just the square root of an F-statistic with 1 numerator degree of freedom. The t(df) and F(1, df) distributions are basically the same thing, because F = t² when the numerator degrees of freedom is 1.
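
As a quick numerical check of that identity, here is a minimal sketch using Stata's built-in distribution functions. The values of tstat and dfree are made up for illustration and are not taken from the output in #1.

```stata
* Illustrative values only (assumptions, not from the thread)
scalar tstat = 2.1
scalar dfree = 40

* Two-sided p-value from the t distribution with dfree degrees of freedom
scalar p_t = 2*ttail(dfree, abs(tstat))

* Upper-tail p-value from F(1, dfree), evaluated at the squared t statistic
scalar p_F = Ftail(1, dfree, tstat^2)

display "t-test p = " p_t "   F-test p = " p_F
```

The two displayed p-values should agree up to rounding, whatever values are chosen for tstat and dfree.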

The F statistic is, in turn, related to the chi square statistic. The chi square statistic, divided by its degrees of freedom, can be thought of as an F statistic with infinite denominator degrees of freedom. (With 1 numerator degree of freedom, as here, no division is needed.) That is, the F-statistic provides for finite-sample inference, whereas the chi square statistic is asymptotic.
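
The convergence can be seen numerically. The sketch below evaluates the upper-tail probability of one illustrative statistic (3.84, chosen arbitrarily) under F(1, d) for increasing denominator degrees of freedom d, and then under chi square with 1 degree of freedom.

```stata
* Tail probability of the same statistic under F(1, d) as d grows
foreach d of numlist 10 30 100 1000 100000 {
    display "F(1, `d'): p = " %6.4f Ftail(1, `d', 3.84)
}

* Asymptotic counterpart: chi square with 1 degree of freedom
display "chi2(1): p = " %6.4f chi2tail(1, 3.84)
```

The F-based p-values should shrink toward the chi square p-value as d increases, which is the sense in which the chi square test is the large-sample limit of the F test.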

Also relevant here is the fact that the usual chi square statistic used to compare proportions is, in fact, only an approximation, and one that fails spectacularly in very small samples, but works serviceably well in moderately large samples, and is excellent in large samples. However, it is entirely inapplicable to data with pweights. So when you say "I understand that when comparing proportions a chi square test should be used," this actually does not apply in your situation.

So, to summarize, the t-test and F-test are completely equivalent in all situations, and the F-test is equivalent to a chi square test if the sample is reasonably large and the data are unweighted. With pweighted data, the usual chi square test is not valid anyway.
Jeremy Firestone

Join Date: Feb 2019

Posts: 4
#3

02 Feb 2019, 06:37

Clyde, Thank you for taking the time to respond to my query. Very much appreciated.

What N would you characterize as "reasonably" large?

Not surprisingly from above, the same issue arises when I use the svy command, which is also dependent on pweight.

But, if I were to run the same proportions unweighted, should the chi square test be employed? Stata only seems to provide the option for t tests and p-tests. I am still under the impression from what I have read that chi square and proportions is used when you are comparing proportions (a) generated in two separate samples or (b) within the same sample but at two different times.

In contrast, my situation is more akin to asking "do you favor the Democratic candidate, the independent candidate, or the Republican candidate?" and then determining whether there is a statistically significant difference between the proportion supporting the Democratic candidate and the proportion supporting the Republican candidate. If the data were unweighted in such a circumstance, would the chi square test be appropriate or would you still use a t or F test?
Clyde Schechter

Join Date: Apr 2014

Posts: 30172
#4

02 Feb 2019, 11:05

What N would you characterize as "reasonably" large?

It's difficult to give a hard and fast answer to that as it depends somewhat on the data themselves. If the proportions you are comparing are in the middle of the zero-one range, then an N of 30 or 50 would be adequate to make these tests equivalent. On the other hand if either proportion were close to zero or one, a much larger N would be needed.

But, if I were to run the same proportions unweighted, should the chi square test be employed?

Wrong question. Using the sampling weights is not optional when the data come from a non-simple random sample. Any analysis that ignores the weights is just wrong.

I am still under the impression from what I have read that chi square and proportions is used when you are comparing proportions (a) generated in two separate samples or (b) within the same sample but at two different times.

Change that impression. It is true with simple random samples (unweighted data), and provided the n's in the cells of the cross tabulation are sufficiently large (most people say > 5) -- and in real life this is the most common case. But otherwise it is not.

If the data were unweighted in such a circumstance, would the chi square test be appropriate or would you still use a t or F test?

A chi square test will work here, and you could get one as in this example using auto.dta:

Code:

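sysuse auto, clear
tab rep78 foreign
* keep three outcome levels (3, 4, 5)
keep if rep78 >= 3
* intercept-only multinomial logit
mlogit rep78
* test that levels 4 and 5 are equally likely
test _b[4:_cons] = _b[5:_cons]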

In this example, the 3 retained levels of rep78 correspond to Independent, Republican and Democrat in your setting.
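
For the pweighted case discussed earlier in the thread, one possible counterpart is sketched below. It is an illustration under stated assumptions, not something run on the original poster's data: auto.dta has no sampling weights, so the variable wt is a hypothetical weight invented only so the example runs, and baseoutcome(3) is specified explicitly so that the equation names 4 and 5 are guaranteed to exist.

```stata
sysuse auto, clear
keep if rep78 >= 3

* HYPOTHETICAL sampling weight, made up for illustration only
generate double wt = 1 + mod(_n, 3)

* Intercept-only multinomial logit with pweights; Stata reports robust
* standard errors when pweights are specified
mlogit rep78 [pweight = wt], baseoutcome(3)

* Wald test that levels 4 and 5 are equally likely
test _b[4:_cons] = _b[5:_cons]
```

With real survey data, the actual weight variable would replace wt, and any stratification or clustering in the design would also need to be accounted for (for example through svyset and the svy prefix).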
Jeremy Firestone

Join Date: Feb 2019

Posts: 4
#5

04 Feb 2019, 06:12

Much thanks. BTW, my statement about running the data unweighted was not intended to mean that I would. I agree completely that running unweighted descriptive statistics is completely wrong; I just inartfully inquired about how to run an analysis on data collected by simple random sample. Best