  • Calculating a p-value for two groups

    Hello,
    I have a small dataset with two groups, and I want to compare them and get a p-value.

    Group 1 is called Responder and includes IDs 3, 7, and 10, while the second group, called Non-responder, includes IDs 4, 6, 11, and 14.

    Here is the data.

    clear
    input byte(id sex age race) str6 height float(weight bmi)
    3 0 38 2 "4'11.5" 59.5 46.3
    4 1 68 1 "5'8" 345 52.5
    6 0 27 1 "5'6" 343.8 55.5
    7 0 41 1 "5'7" 254 39.8
    10 0 48 2 "5'5" 279.8 46.6
    11 1 47 1 "5'9" 289 42.7
    14 0 59 2 "5'3.5" 269.4 47
    end

  • #2
    It’s not clear what you want to test here, but you’ll need to create a variable for the responder group and then run a two-sample t-test -- although these tests won’t have much power with so few observations.

    Code:
    gen responder = (inlist(id, 3,7,10))
    ttest <somevar>, by(responder)


    • #3
      Aamir:
      elaborating a bit on Justine's helpful advice (and hoping that you do not really have 7 observations, otherwise any inferential procedure would make no sense at all), I would go -regress- instead of -ttest-: (in the example below I assume that your regressand is -bmi-):
      Code:
      clear
      input byte(id sex age race) str6 height float(weight bmi)
       3 0 38 2 "4'11.5" 59.5 46.3
       4 1 68 1 "5'8" 345 52.5
       6 0 27 1 "5'6" 343.8 55.5
       7 0 41 1 "5'7" 254 39.8
       10 0 48 2 "5'5" 279.8 46.6
       11 1 47 1 "5'9" 289 42.7
       14 0 59 2 "5'3.5" 269.4 47
       end
      gen responder = (inlist(id, 3,7,10))
      label define responder 0 "Non-responder" 1 "Responder"
      label val responder responder
      regress bmi i.sex c.age##c.age i.responder
      
      . regress bmi i.sex c.age##c.age i.responder
      
            Source |       SS           df       MS      Number of obs   =         7
      -------------+----------------------------------   F(4, 2)         =      2.12
             Model |  140.127928         4  35.0319819   Prob > F        =    0.3454
          Residual |  33.0720799         2  16.5360399   R-squared       =    0.8091
      -------------+----------------------------------   Adj R-squared   =    0.4272
             Total |  173.200008         6  28.8666679   Root MSE        =    4.0665
      
      ------------------------------------------------------------------------------
               bmi |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
             1.sex |   -1.90465   4.618166    -0.41   0.720    -21.77502    17.96572
               age |  -2.336188   1.059341    -2.21   0.158    -6.894163    2.221787
                   |
       c.age#c.age |   .0244982    .011142     2.20   0.159    -.0234421    .0724384
                   |
         responder |
        Responder  |  -1.338994   4.528012    -0.30   0.795    -20.82146    18.14347
             _cons |   100.1374   22.73151     4.41   0.048     2.331616    197.9432
      ------------------------------------------------------------------------------
      
      .
      *Standard errors are not clustered on -responder- because the number of clusters (2) is too low*
      Kind regards,
      Carlo
      (Stata 19.0)


      • #4
        Thank you all.

        But I want to compare age, sex, race, and BMI between responders and non-responders and get a p-value for each.


        • #5
          Aamir:
          here you are (please see example on -age-):
          Code:
          . ttest age, unequal by(responder)
          
          Two-sample t test with unequal variances
          ------------------------------------------------------------------------------
             Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
          ---------+--------------------------------------------------------------------
          Non-resp |       4       50.25    8.863549     17.7271    22.04223    78.45777
          Responde |       3    42.33333    2.962731    5.131601    29.58573    55.08094
          ---------+--------------------------------------------------------------------
          combined |       7    46.85714    5.124305    13.55764    34.31842    59.39587
          ---------+--------------------------------------------------------------------
              diff |            7.916667    9.345602               -19.07605    34.90938
          ------------------------------------------------------------------------------
              diff = mean(Non-resp) - mean(Responde)                        t =   0.8471
          Ho: diff = 0                     Satterthwaite's degrees of freedom =  3.63968
          
              Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
           Pr(T < t) = 0.7755         Pr(|T| > |t|) = 0.4490          Pr(T > t) = 0.2245
          
          .
          You can tweak the code posted above according to the variable you're interested in -ttest-ing.
          Obviously, with such a scant sample size, the lack of statistical significance is expected.
          Kind regards,
          Carlo
          (Stata 19.0)
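
          To repeat the test for each continuous variable, a loop is one option. This is a minimal sketch, assuming the data from #1 and the -responder- indicator from #2. The choice of -age-, -weight-, and -bmi- is an assumption, and -height- is left out because it is stored as a string:

          ```stata
          clear
          input byte(id sex age race) str6 height float(weight bmi)
          3 0 38 2 "4'11.5" 59.5 46.3
          4 1 68 1 "5'8" 345 52.5
          6 0 27 1 "5'6" 343.8 55.5
          7 0 41 1 "5'7" 254 39.8
          10 0 48 2 "5'5" 279.8 46.6
          11 1 47 1 "5'9" 289 42.7
          14 0 59 2 "5'3.5" 269.4 47
          end

          * 1 = Responder (IDs 3, 7, 10), 0 = Non-responder
          generate responder = inlist(id, 3, 7, 10)

          * Welch (unequal-variance) t-test for each continuous variable in turn
          foreach v of varlist age weight bmi {
              ttest `v', by(responder) unequal
          }
          ```

          With 3 and 4 observations per group, each of these tests has very little power, so the p-values are best read as descriptive.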


          • #6
            Thank you Carlo for your help. It works for me.


            • #7
              Bear in mind that sex and race are categorical variables, and that in the small dataset you shared, at least, both of them are dichotomous. In other words, I doubt you want to use t-tests on those variables.
              --
              Bruce Weaver
              Version: Stata/MP 18.5 (Windows)


              • #8
                Thank you. We had a small sample size for a prospective trial, and the editorial office wanted a p-value to show that the two groups do not differ significantly, so I needed to compare them. Thank you for your input; it was very helpful.


                • #9
                  Aamir:
                  Bruce is correct (and I forgot to mention it in my previous reply).
                  Provided that you do not want to go -regress-, as far as the categorical variables are concerned, you may want to consider:
                  Code:
                  . tabulate sex responder, chi2
                  
                             |       responder
                         sex | Non-respo  Responder |     Total
                  -----------+----------------------+----------
                           0 |         2          3 |         5
                           1 |         2          0 |         2
                  -----------+----------------------+----------
                       Total |         4          3 |         7
                  
                            Pearson chi2(1) =   2.1000   Pr = 0.147
                  
                  . tabulate race responder, chi2
                  
                             |       responder
                        race | Non-respo  Responder |     Total
                  -----------+----------------------+----------
                           1 |         3          1 |         4
                           2 |         1          2 |         3
                  -----------+----------------------+----------
                       Total |         4          3 |         7
                  
                            Pearson chi2(1) =   1.2153   Pr = 0.270
                  
                  .
                  That said, with such a small sample size, inferential statistics make no sense.
                  If you really have 7 obs., I'm under the impression that the reviewer is barking up the wrong tree.
                  Last edited by Carlo Lazzaro; 15 Jul 2021, 23:52.
                  Kind regards,
                  Carlo
                  (Stata 19.0)
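
                  In both tables above, every expected cell count is below 5, so the chi-squared approximation behind the Pearson test is shaky; Fisher's exact test is the usual alternative for 2x2 tables this sparse. This is a minimal sketch using the immediate command -tabi- with the cell counts copied from the two tables above, so it runs without the dataset in memory:

                  ```stata
                  * Fisher's exact test from the reported cell counts
                  * rows = sex (0, 1); columns = Non-responder, Responder
                  tabi 2 3 \ 2 0, exact

                  * rows = race (1, 2); columns = Non-responder, Responder
                  tabi 3 1 \ 1 2, exact
                  ```

                  With the data in memory, -tabulate sex responder, exact- requests the same test directly.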


                  • #10
                    Note that there is another route: use -permute- with the "enumerate" option. With a total N of 7 and group sizes of 4 and 3, there are 35 possible assignments of observations to groups. This option evaluates all 35, and your "p-value" is essentially the rank of the observed result among the results from all 35 possible assignments. See:
                    Code:
                    help permute
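
                    A call along these lines might look as follows. This is a sketch only, assuming a Stata version whose -permute- supports the "enumerate" option mentioned above; the choice of -age- and of the t statistic stored by -ttest- in r(t) is illustrative:

                    ```stata
                    clear
                    input byte(id sex age race) str6 height float(weight bmi)
                    3 0 38 2 "4'11.5" 59.5 46.3
                    4 1 68 1 "5'8" 345 52.5
                    6 0 27 1 "5'6" 343.8 55.5
                    7 0 41 1 "5'7" 254 39.8
                    10 0 48 2 "5'5" 279.8 46.6
                    11 1 47 1 "5'9" 289 42.7
                    14 0 59 2 "5'3.5" 269.4 47
                    end

                    * 1 = Responder (IDs 3, 7, 10), 0 = Non-responder
                    generate responder = inlist(id, 3, 7, 10)

                    * Permute the group labels and recompute the t statistic each time;
                    * enumerate asks for all distinct reassignments instead of random draws
                    permute responder t=r(t), enumerate: ttest age, by(responder)
                    ```

                    Because only 35 assignments exist, the smallest two-sided p-value such a test can return is limited by that count, which is worth keeping in mind when reporting it.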
