
  • How to use the Proportion test calculator for multivariate categorical variables

    Dear Stata users,

    I am new to statistics and Stata, and I have a question about proportion tests.

    I would like to study the relationship between 2 categorical variables, sex and cure, from the dataset "cure2" (webuse cure2).

    In this dataset I have the variables sex [male, female] and cure [1, 0].

    When I run "tab sex cure, column row" I obtain the 2x2 contingency table showing the shares of males and females who did and did not get the cure.

    Now, if I want to test whether the share of females who used the cure differs significantly from the share of females who did not, I can run the following command:

    "prtesti 26 .4063 38 .5938" (where 26 and 38 are the numbers of females who did not and did use the cure, respectively, and .4063 and .5938 are the corresponding shares). This command provides the statistics to reject or not reject the null hypothesis (this video explains the approach: https://www.youtube.com/watch?v=Fptz16CmmkM).

    My question is: suppose that cure had three categories (0, 1, 2: 0 = no cure, 1 = cure 1, 2 = cure 2). In that case the contingency table would be 2x3, with two rows (male, female) and three columns (0, 1, 2).

    If I want to repeat the same significance test, prtesti does not let me do so.

    For instance, after randomly recoding cure to the values 0, 1, and 2, suppose I run the same test on the following values:

    "prtesti 20 .3125 6 .0938 38 .5938" (where 20, 6, and 38 are the numbers of females who used no cure, cure 1, and cure 2, respectively, with their corresponding shares).

    However, Stata does not accept this, since prtesti compares at most two proportions.

    Which test could I run in Stata to detect a relationship between categorical variables with more than two categories?

    Many thanks.

    Jack

  • #2
    Welcome to the Stata Forum/ Statalist.

    A chi-square test would be a nice option.
    Best regards,

    Marcos



    • #3
      Hi Jack,

      The prtesti command you are running tests whether the relative frequencies (%) of the groups "cured" and "not cured" are equal among females; it does not take the frequencies for males into account at all. This is not wrong, but it answers a different question from the relationship between two categorical variables, which could be stated as: does the frequency of the outcome (cured / not cured) differ according to the exposure (male / female)? It helps to write down your hypotheses:

      H0: the frequency of the outcome (cure) does not vary according to the exposure (sex)
      H1: the frequency of the outcome (cure) does vary according to the exposure (sex)
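      Written out as formulas (the notation below is introduced for illustration and is not from the original post), the null hypothesis for this 2x2 case and the Pearson statistic that tab ..., chi2 reports are as follows, where O_ij and E_ij are the observed and expected counts in row i and column j, R_i and C_j are the row and column totals, and N is the total number of observations:

```latex
H_0:\ \Pr(\text{cure}=1 \mid \text{female}) = \Pr(\text{cure}=1 \mid \text{male})
\qquad
X^2 = \sum_{i=1}^{r}\sum_{j=1}^{c}\frac{(O_{ij}-E_{ij})^2}{E_{ij}},
\quad
E_{ij} = \frac{R_i\,C_j}{N},
\quad
\text{df} = (r-1)(c-1)
```

      Under H0, X^2 approximately follows a chi-square distribution with df degrees of freedom: a 2x2 table has df = 1 and a 2x3 table has df = 2.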

      An easier way to test this than with prtesti (which requires typing the numbers by hand) is to let Stata compute it from the data:

      Code:
      tab sex cure, chi
      Personally, when I look at a contingency table like the one you presented, what I check is whether the frequency of the outcome differs according to the exposure. With the command written as tab exposure outcome, this is easiest to see with the row option, which gives the distribution of the outcome within each exposure group:

      Code:
      tab sex cure, chi row
      Among females, 59.38% were cured; among males, 66.67% were. (The col option instead shows how the cured are split by sex: 56% female and 44% male.) Are the cure frequencies different, given the sample size? The chi-square test answers that (chi-square p-value = 0.439): at a 5% significance level, there is not enough evidence to reject the null hypothesis that the frequency of the outcome is the same in both exposure groups.
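      As a cross-check, here is a minimal Python sketch (standard library only) that reproduces the row percentages and the Pearson chi-square by hand. The female counts (26 not cured, 38 cured) come from this thread; the male counts (15 not cured, 30 cured) are an assumption, not shown in the thread, chosen because they reproduce the 56%/44% split among the cured and the p-value of 0.439 reported above. It sketches the arithmetic only; it is not Stata's own code.

```python
import math

# Observed counts (not cured, cured) by sex for webuse cure2.
# Female counts come from the thread; the male counts are an assumption.
observed = {"female": (26, 38), "male": (15, 30)}

rows = list(observed.values())
row_totals = [sum(r) for r in rows]
col_totals = [sum(col) for col in zip(*rows)]
n = sum(row_totals)

# Row percentages: share cured within each sex (what the row option shows).
for sex, (not_cured, cured) in observed.items():
    print(f"{sex}: {100 * cured / (not_cured + cured):.2f}% cured")

# Pearson chi-square: sum of (observed - expected)^2 / expected,
# where expected = row total * column total / grand total.
chi2 = 0.0
for i, r in enumerate(rows):
    for j, obs in enumerate(r):
        expected = row_totals[i] * col_totals[j] / n
        chi2 += (obs - expected) ** 2 / expected

# With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))
print(f"Pearson chi2(1) = {chi2:.4f}, p = {p_value:.3f}")
```

      With these counts it should print 59.38% and 66.67% cured and a chi-square of roughly 0.599 with p of about 0.439.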

      The same reasoning applies to variables with more than two categories:

      Code:
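      clear
      webuse cure2
      set seed 12345                    // make the random recode reproducible
      gen random = runiform()
      clonevar cure2 = cure             // copy of cure with the same storage type and labels
      replace cure2 = 2 if random < .3  // move roughly 30% of observations to a third category
      tab sex cure2, chi row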
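      For tables larger than 2x2, the Pearson statistic is computed the same way, with (r - 1)(c - 1) degrees of freedom. The Python sketch below (standard library only; the function names are hypothetical) handles any r x c table of counts and uses the closed-form chi-square tail probability that exists for integer degrees of freedom. The 2x3 table above is random, so no counts for it are given here; the usage line runs the 2x2 cure2 table instead, where the male counts (15, 30) are an assumption rather than values shown in the thread.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """P(X > x) for a chi-square variable with integer df >= 1 (closed form)."""
    if x <= 0:
        return 1.0
    half = x / 2
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2) * sum_{i=1}^{(df-1)/2} x^(i-1) / (1*3*...*(2i-1))
    term, total = 1.0, 0.0
    for i in range(1, (df - 1) // 2 + 1):
        if i > 1:
            term *= x / (2 * i - 1)
        total += term
    return math.erfc(math.sqrt(half)) + math.sqrt(2 * x / math.pi) * math.exp(-half) * total


def pearson_chi2(table):
    """Pearson chi-square test of independence for an r x c table of counts.

    Returns (statistic, degrees of freedom, p-value)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, chi2_sf(stat, df)


# Usage on the 2x2 cure2 table; rows are female, male; columns are cure = 0, 1.
# The male row (15, 30) is an assumption, not a value shown in the thread.
stat, df, p = pearson_chi2([[26, 38], [15, 30]])
print(f"chi2({df}) = {stat:.4f}, p = {p:.3f}")
```

      For a 2x3 table, pass two rows of three counts each; the function then uses df = 2.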
      Last but not least, you could run a logistic regression (the first command below) if your outcome is binary, or a multinomial logistic regression (the second command below) for polytomous outcomes (more than 2 categories):

      Code:
      logit cure sex
      mlogit cure2 sex
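      With a binary outcome and a single binary regressor, the maximum-likelihood logit coefficient equals the sample log odds ratio of the 2x2 table, and its standard error equals the square root of the sum of the reciprocals of the four cell counts (Woolf's formula). The Python sketch below computes these by hand; the male counts (15 not cured, 30 cured) are an assumption about cure2. The sign of the coefficient Stata reports for sex depends on which sex has the higher code, which is not shown in the thread, so the sketch simply reports female versus male.

```python
import math

# 2x2 counts as (not cured, cured). Female counts come from the thread;
# the male counts are an assumption about cure2, not shown in the thread.
female = (26, 38)
male = (15, 30)

odds_female = female[1] / female[0]
odds_male = male[1] / male[0]
log_or = math.log(odds_female / odds_male)  # log odds ratio, female vs male

# Woolf standard error: square root of the sum of reciprocal cell counts.
se = math.sqrt(sum(1 / count for count in female + male))
z = log_or / se
p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value

print(f"log OR = {log_or:.4f} (OR = {math.exp(log_or):.3f}), "
      f"SE = {se:.4f}, z = {z:.2f}, p = {p_value:.3f}")
```

      The Wald p-value of about 0.44 is close to, but not identical with, the chi-square p-value of 0.439, as expected for two different large-sample tests of the same hypothesis.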
      Best;



      • #4
        Hi Igor,

        many thanks for your kind reply.

        So, if I understand correctly, the chi-square test always takes both sexes (male and female) into account when computing the statistic, and it is not possible to test the significance for a single sex (male or female) against the outcome (cure) while ignoring the other sex. That is what is done here: https://www.youtube.com/watch?v=Fptz16CmmkM at 8:56, so I assume it is not correct?

        Many thanks!

        J



        • #5
          Hi Jack,

          It is possible; this is what the person in the video is doing. Go to 9:00. The presenter tests whether the proportion of males who took algebra differs from the proportion of males who did not; in other words, whether the two proportions are equal. However, since the proportions are complementary (among males, the proportion who took algebra and the proportion who did not sum to 100%), they can only be equal if exactly 50% of the males took algebra, so the test amounts to asking whether that proportion differs from 50%. Because a sample proportion is estimated with imprecision, this is what the video tests at 9:00.
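          Because the two proportions are complementary, the hypothesis "p = 1 - p" is the same as "p = 0.5", which a one-sample z-test addresses directly. This is a swapped-in framing for illustration, not necessarily what the video computes. The Python sketch below applies it to the females in cure2 (38 of 64 cured, counts from this thread); in Stata, the one-sample immediate form, prtesti 64 .59375 .5, should test the same hypothesis.

```python
import math

# Females in cure2: 38 of 64 cured (counts from the thread).
cured, n = 38, 64
p_hat = cured / n

# "p = 1 - p" holds only when p = 0.5, so test H0: p = 0.5.
p0 = 0.5
se0 = math.sqrt(p0 * (1 - p0) / n)          # standard error under H0
z = (p_hat - p0) / se0
p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value

print(f"p_hat = {p_hat:.4f}, z = {z:.2f}, p = {p_value:.3f}")
```

          Here z = 1.5 and the two-sided p-value is about 0.134.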

          What he is doing is not wrong, but it does not compare whether the frequency of the outcome varies across exposure groups, since he looks at a single exposure group only. This is different from looking at the association between 2 binary variables in a 2x2 table, which was your request in the original post ("I would like to study the relationship between 2 categorical variables"). Note that the rationale above (and the one in the video at the specified mark) looks at males only. This is important. To make this clear, let's go back to your example and explicitly ask for a table of females only:

          Code:
          tab sex cure, row               // this is the command you used
          tab sex cure if sex == 1, row   // only females
          Note that in the second command I ask Stata to tabulate the frequency of cure by sex, but only for observations with sex == 1. Sex == 1 is female, so in this code I am explicitly not taking the males into account. The output of the command is:

          Code:
                     |         cure
                 sex |         0          1 |     Total
          -----------+----------------------+----------
              female |        26         38 |        64 
                     |     40.63      59.38 |    100.00 
          -----------+----------------------+----------
               Total |        26         38 |        64 
                     |     40.63      59.38 |    100.00
          Apart from (obviously) showing only females, it contains all the information you gathered by hand for the prtesti command you issued. Among females, 26 were not cured (40.63%) and 38 were cured (59.38%). Those frequencies are complementary (they sum to 100%) and do not take the distribution of cure among males into account. This is why your prtesti command does not compare the frequency of cure between the sexes. The easiest way I see to do so is to run a chi-square test, which can be done with:

          Code:
          tab sex cure, chi row
          The interpretation is in my previous post. If you want to calculate the chi-square manually, you can look here for a guide.
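          For a 2x2 table, the chi-square test is equivalent to the two-sample test of proportions that compares cure between females and males (in Stata, prtest cure, by(sex)): the square of its pooled z statistic equals the Pearson chi-square. A minimal Python sketch of that comparison, using the female counts from this thread and assumed male counts (45 males, 30 of them cured; these are not shown in the thread):

```python
import math

# Cured / total by sex. Female counts come from the thread;
# the male counts (30 of 45) are an assumption about cure2.
cured_f, n_f = 38, 64
cured_m, n_m = 30, 45

p_f = cured_f / n_f
p_m = cured_m / n_m
p_pool = (cured_f + cured_m) / (n_f + n_m)  # pooled proportion under H0: p_f = p_m

se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_f + 1 / n_m))
z = (p_f - p_m) / se
p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value

# For a 2x2 table, z squared equals the Pearson chi-square statistic.
print(f"female {p_f:.4f} vs male {p_m:.4f}: z = {z:.3f}, z^2 = {z ** 2:.4f}, p = {p_value:.3f}")
```

          With these counts z is about -0.77, z squared is about 0.599, and the two-sided p-value is about 0.439, the same as the chi-square p-value.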

          Best;



          • #6
            Many thanks!!

            J

