Interpreting the result of proportion t-test

Yue YY

Join Date: May 2018

Posts: 41
#1

30 Oct 2018, 11:51

Dear Statalist,
I have a question about the -prtest- command. My data contain two variables, sex and hpv, as in the contingency table below:

HPV            Female   Male
Negative (%)   p1       p2
Positive (%)   p3       p4

I would like to test whether the percentage of HPV (either negative or positive) differs by sex (i.e., is HPV infection different by sex?). Since the sample sizes differ by sex, I ran a test of proportions with -prtest-. The one thing I am unsure of when interpreting the result is where the difference lies: does it mean that the proportion of negatives differs significantly between the sexes, or the proportion of positives? Thank you!

Last edited by Yue YY; 30 Oct 2018, 11:56.
Yue YY

Join Date: May 2018

Posts: 41
#2

30 Oct 2018, 11:54

This is the result of the prtest.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#3

30 Oct 2018, 12:27

Yue:
I would say the proportion of the event coded as 1.
As an aside, why not switch to -logit- or -logistic-?

Code:

logistic HPV_infection i.sex

Kind regards,
Carlo
(Stata 19.0)
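
To make the -logistic- suggestion concrete, here is a minimal, self-contained sketch. The variable names (sex, hpv, n) and the counts are hypothetical and only illustrate the syntax; they are not the original poster's data.

```stata
* Hypothetical aggregated data: 100 women (20 HPV-positive), 100 men (30 HPV-positive)
* sex: 0 = female, 1 = male; hpv: 0 = negative, 1 = positive; n = cell count
clear
input sex hpv n
0 0 80
0 1 20
1 0 70
1 1 30
end

* Frequency weights expand each row to n observations
logistic hpv i.sex [fweight = n]
```

With these made-up counts, the odds ratio reported for 1.sex should be (30/70)/(20/80), about 1.71: the odds of being HPV-positive for men relative to women. Further covariates can be added after i.sex when adjustment is needed.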
Matt Warkentin

Join Date: May 2016

Posts: 104
#4

30 Oct 2018, 12:28

Hi Yue,

The proportion negative and the proportion positive are complements of one another (i.e., they sum to 1 within each group). Thus, if the proportion positive differs by group, that is the same as saying the proportion negative differs by group. Whether the -prtest- command shows the proportion positive or the proportion negative depends on how you have coded the variable. For example, if the variable HPV_status is coded as positive=1 and negative=0, then when you run:

Code:

prtest HPV_status, by(sex)

You will get the proportion of those positive for HPV. You could recode the variable and try it again:

Code:

* hpv2 flips the coding: 1 = negative, 0 = positive
gen hpv2 = 1 if HPV_status==0
replace hpv2 = 0 if HPV_status==1
prtest hpv2, by(sex)

You will now get the proportion negative, and you will see that the absolute value of the z-statistic and the two-sided p-value are identical for both -prtest- runs. It probably makes more sense to look at the proportion positive on subject-matter grounds, but you will get the same answer either way. The test statistic is determined only by the difference between the proportions and its standard error, and its absolute value is invariant to whether you look at the negative or the positive category.
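
The invariance can also be checked without any data in memory by using the immediate form, -prtesti-. The counts below are hypothetical and chosen only for illustration.

```stata
* Hypothetical counts: 100 women (20 HPV-positive), 100 men (30 HPV-positive)

* Test on the proportion positive: 20/100 vs 30/100
prtesti 100 20 100 30, count

* Test on the proportion negative: 80/100 vs 70/100
prtesti 100 80 100 70, count
```

With these numbers the difference is -0.10 in the first run and +0.10 in the second, so the z-statistic should be about -1.63 and +1.63 respectively, with the same two-sided p-value in both runs.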

Last edited by Matt Warkentin; 30 Oct 2018, 12:34.
Yue YY

Join Date: May 2018

Posts: 41
#5

30 Oct 2018, 13:56

Thank you Carlo!

Logistic regression (LR) is definitely better, but I haven't mastered it yet. Still, I will run an LR later, since I need to take some other variables into account.

If you don't mind, I have one more question on this topic. If the variable is not binary (e.g., "low", "medium", "high"), is LR the only way to test the difference?

Thank you for the advice again! Very helpful!

Best
Yue

Last edited by Yue YY; 30 Oct 2018, 14:01.
Yue YY

Join Date: May 2018

Posts: 41
#6

30 Oct 2018, 13:58

Thank you very much Matt! It's really clear and easy to understand!

Best
Yue
Matt Warkentin

Join Date: May 2016

Posts: 104
#7

30 Oct 2018, 15:06

Hi Yue,

If you are referring to the predictor (independent) variable having several categories (as opposed to the outcome having several categories), then logistic regression is NOT your only option, but it is a good one. It allows simultaneous adjustment for other potentially important covariates in the model. It also naturally provides tests of significance (p-values) as well as effect sizes with a natural interpretation (odds ratios).

You could test for an association between two categorical variables with any number of categories using tests such as Pearson's chi-squared test, Fisher's exact test, or the Cochran-Armitage test (for ordinal variables), and there are probably several more that could be used. It depends on what information you are looking for, as well as whether you need simultaneous adjustment, in which case regression is a good option.
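
As a rough illustration of the unadjusted tests mentioned above, the immediate command -tabi- runs Pearson's chi-squared and Fisher's exact test on a table typed in directly. The 3 x 2 counts below are hypothetical, and the variable names in the last line (exposure, hpv) are placeholders rather than names from the original data.

```stata
* Hypothetical 3 x 2 table: rows = low / medium / high, columns = HPV negative / positive
tabi 40 10 \ 35 15 \ 25 25, chi2 exact

* With variables in memory, the equivalent call would look like this
* (uncomment once variables named exposure and hpv exist):
* tabulate exposure hpv, chi2 exact
```

Neither test uses the ordering of low, medium, and high; a trend test or a regression model would be needed to exploit it.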
Yue YY

Join Date: May 2018

Posts: 41
#8

31 Oct 2018, 10:25

I got it. Thank you so much Matt!

Best

Yue
