
  • Interpreting the result of proportion t-test

    Dear Statalist,
    I have a question about the -prtest- command. There are two variables in my data, sex and hpv, as in the contingency table below:
    HPV          Female   Male
    Negative %   p1       p2
    Positive %   p3       p4
    I would like to test whether the proportion of HPV results (negative or positive) differs by sex, i.e. is HPV infection different by sex? Since the sample sizes differ by sex, I ran a test of proportions with -prtest-. What I am unsure about when interpreting the result is where the difference lies: does the result mean that the proportion of negatives differs significantly between the sexes, or the proportion of positives? Thank you!
    Last edited by Yue YY; 30 Oct 2018, 11:56.

  • #2
    [Attached image: 1.jpg]

    This is the result of the prtest.



    • #3
      Yue:
      I would say it is the proportion of the event coded as 1.
      As an aside, why not switch to -logit- or -logistic-?
      Code:
      logistic HPV_infection i.sex
      Kind regards,
      Carlo
      (Stata 18.0 SE)
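
To make the comparison between -prtest- and -logistic- concrete, here is a minimal sketch on made-up data. The counts (40 of 200 females and 25 of 250 males HPV-positive) and the coding sex 0 = female, 1 = male are assumptions for illustration only; they are not from the poster's dataset.

```stata
* Hypothetical counts: 40 of 200 females and 25 of 250 males are HPV-positive
clear
input byte sex byte HPV_infection int n
0 1 40
0 0 160
1 1 25
1 0 225
end
expand n          // one observation per person
drop n
label define sexlbl 0 "female" 1 "male"
label values sex sexlbl

* Two-sample test of proportions: reports the proportion coded 1 in each group
prtest HPV_infection, by(sex)
scalar z_prtest = r(z)

* Logistic regression: the odds ratio for 1.sex compares males with females
logistic HPV_infection i.sex
scalar or_male = exp(_b[1.sex])
display "odds ratio (male vs female) = " or_male
```

With a single binary predictor both commands address the same question (does HPV infection differ by sex?); the regression additionally leaves room for further covariates.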



      • #4
        Hi Yue,

        The proportion negative and the proportion positive are complements of one another (i.e. they sum to 1 within each group). Thus, saying that the proportion positive differs by group is the same as saying that the proportion negative differs by group. Whether the -prtest- command shows the proportion positive or the proportion negative depends on how you've coded the variable. For example, if the variable HPV_status is coded as positive=1 and negative=0, then when you run:

        Code:
        prtest HPV_status, by(sex)
        You will get the proportion of those positive for HPV. You could recode the variable and try it again:

        Code:
        * hpv2 is the reverse coding: 1 = negative, 0 = positive (missing stays missing)
        gen hpv2 = 1 if HPV_status==0
        replace hpv2 = 0 if HPV_status==1
        prtest hpv2, by(sex)
        You will now get the proportion negative, and you will see that the absolute magnitude of the z statistic and the two-sided p-value are identical for both prtests; only the sign of z flips. It probably makes more sense to look at the proportion positive based on subject-matter knowledge, but you will get the same answer either way. Only the difference between the proportions and its standard error determine the test statistic, and its absolute value is invariant to whether you look at the negative or the positive category.
        Last edited by Matt Warkentin; 30 Oct 2018, 12:34.
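
The invariance described in this post can be checked directly. The following is a minimal sketch on hypothetical counts (40 of 200 in group 0 and 25 of 250 in group 1 positive; the data and the 0/1 coding of sex are assumptions, not the poster's data). It uses `1 - HPV_status` as a compact equivalent of the gen/replace recode, and computes the two-sided p-value from the stored `r(z)`.

```stata
* Hypothetical data: HPV_status coded 1 = positive, 0 = negative
clear
input byte sex byte HPV_status int n
0 1 40
0 0 160
1 1 25
1 0 225
end
expand n          // one observation per person
drop n

* Test on the proportion positive
prtest HPV_status, by(sex)
scalar z_pos = r(z)
scalar p_pos = 2*normal(-abs(z_pos))

* Reverse the coding (1 = negative) and test again
generate byte hpv2 = 1 - HPV_status
prtest hpv2, by(sex)
scalar z_neg = r(z)
scalar p_neg = 2*normal(-abs(z_neg))

* Same magnitude, opposite sign; identical two-sided p-value
display "z (positive) = " z_pos "   z (negative) = " z_neg
display "p (positive) = " p_pos "   p (negative) = " p_neg
```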



        • #5
          In reply to Carlo Lazzaro:
          Thank you Carlo!

          LR (logistic regression) is definitely better, but I haven't mastered it very well yet. Still, I will run an LR later, since I need to take some other variables into account.

          If you don't mind, I have one more question about this topic. If the variable is not binary (e.g. "low", "medium", "high"), is LR the only way to test the difference?

          Thank you for the advice again! Very helpful!

          Best
          Yue
          Last edited by Yue YY; 30 Oct 2018, 14:01.



          • #6
            In reply to Matt Warkentin:
            Thank you very much Matt! It's really clear and easy to understand!

            Best
            Yue



            • #7
              In reply to Yue YY:
              Hi Yue,

              If you are referring to the predictor (independent) variable having several categories (as opposed to the outcome having several categories), then logistic regression is NOT your only option, but it is a good one. It allows for simultaneous adjustment for other potentially important covariates in the model. It also naturally provides tests of significance (p-values) as well as effect sizes that have good interpretations (odds ratios).

              You could test for associations between two categorical variables with any number of categories using tests like Pearson's chi-squared test, Fisher's exact test, or the Cochran-Armitage test for trend (for ordinal variables), and there are probably several more that could be used. It depends on what information you are looking for, as well as whether you require the ability to perform simultaneous adjustment, in which case regression is a good option.
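
As a rough illustration of two of these options for a three-level predictor, here is a minimal sketch on invented counts (10, 20 and 30 positives out of 100 in the "low", "medium" and "high" groups). The variable name `level`, its 1/2/3 coding and the data are all assumptions for the example. A trend test is left out of the sketch.

```stata
* Hypothetical counts for a three-level predictor (low / medium / high)
clear
input byte level byte HPV_status int n
1 1 10
1 0 90
2 1 20
2 0 80
3 1 30
3 0 70
end
expand n          // one observation per person
drop n
label define lvl 1 "low" 2 "medium" 3 "high"
label values level lvl

* Pearson's chi-squared and Fisher's exact test on the 3 x 2 table
tabulate level HPV_status, chi2 exact
scalar chi2_stat = r(chi2)
scalar chi2_p    = r(p)

* Logistic regression with the predictor as a factor variable,
* followed by a joint Wald test that all level coefficients are zero
logistic HPV_status i.level
testparm i.level
scalar wald_p = r(p)
```

The table-based tests answer only whether the two variables are associated; the regression version also gives odds ratios for "medium" and "high" relative to "low" and can be extended with further covariates.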



              • #8
                In reply to Matt Warkentin:
                I got it. Thank you so much Matt!

                Best

                Yue
