Calculating p value for two groups

Aamir Malik

Join Date: Jan 2020

Posts: 48
#1


15 Jul 2021, 08:08

Hello,
I have a small dataset with two groups, and I want to compare them and get a p-value.

Group 1 is called Responder and includes IDs 3, 7, and 10, while the second group, called Non-responder, includes IDs 4, 6, 11, and 14.

Here is the data.

clear
input byte(id sex age race) str6 height float(weight bmi)
3 0 38 2 "4'11.5" 59.5 46.3
4 1 68 1 "5'8" 345 52.5
6 0 27 1 "5'6" 343.8 55.5
7 0 41 1 "5'7" 254 39.8
10 0 48 2 "5'5" 279.8 46.6
11 1 47 1 "5'9" 289 42.7
14 0 59 2 "5'3.5" 269.4 47
end
Justin Niakamal

Join Date: Aug 2017

Posts: 760
#2

15 Jul 2021, 08:24

It’s not clear what you want to test here, but you’ll need to create a variable for the responder group and then run a two-sample t-test -- although these tests won’t have much power with so few observations.

Code:

gen responder = inlist(id, 3, 7, 10)
ttest <somevar>, by(responder)

Carlo Lazzaro

Join Date: Apr 2014

Posts: 17711
#3

15 Jul 2021, 08:48

Aamir:
elaborating a bit on Justin's helpful advice (and hoping that you do not really have only 7 observations, otherwise any inferential procedure would make no sense at all), I would go with -regress- instead of -ttest- (in the example below I assume that your regressand is -bmi-):

Code:

clear
input byte(id sex age race) str6 height float(weight bmi)
3 0 38 2 "4'11.5" 59.5 46.3
4 1 68 1 "5'8" 345 52.5
6 0 27 1 "5'6" 343.8 55.5
7 0 41 1 "5'7" 254 39.8
10 0 48 2 "5'5" 279.8 46.6
11 1 47 1 "5'9" 289 42.7
14 0 59 2 "5'3.5" 269.4 47
end
gen responder = (inlist(id, 3,7,10))
label define responder 0 "Non-responder" 1 "Responder"
label val responder responder
regress bmi i.sex c.age##c.age i.responder

. regress bmi i.sex c.age##c.age i.responder

      Source |       SS           df       MS      Number of obs   =         7
-------------+----------------------------------   F(4, 2)         =      2.12
       Model |  140.127928         4  35.0319819   Prob > F        =    0.3454
    Residual |  33.0720799         2  16.5360399   R-squared       =    0.8091
-------------+----------------------------------   Adj R-squared   =    0.4272
       Total |  173.200008         6  28.8666679   Root MSE        =    4.0665

------------------------------------------------------------------------------
         bmi |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       1.sex |   -1.90465   4.618166    -0.41   0.720    -21.77502    17.96572
         age |  -2.336188   1.059341    -2.21   0.158    -6.894163    2.221787
             |
 c.age#c.age |   .0244982    .011142     2.20   0.159    -.0234421    .0724384
             |
   responder |
  Responder  |  -1.338994   4.528012    -0.30   0.795    -20.82146    18.14347
       _cons |   100.1374   22.73151     4.41   0.048     2.331616    197.9432
------------------------------------------------------------------------------

.
*Standard errors are not clustered on -responder- because the number of clusters (2) is too low*

Kind regards,
Carlo
(Stata 19.0)


Aamir Malik

Join Date: Jan 2020

Posts: 48
#4

15 Jul 2021, 10:06

Thank you all.

But I want to compare age, sex, race, and BMI between responders and non-responders to get a p-value for each.

Carlo Lazzaro

Join Date: Apr 2014

Posts: 17711
#5

15 Jul 2021, 10:12

Aamir:
here you are (please see example on -age-):

Code:

. ttest age, unequal by(responder)

Two-sample t test with unequal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
Non-resp |       4       50.25    8.863549     17.7271    22.04223    78.45777
Responde |       3    42.33333    2.962731    5.131601    29.58573    55.08094
---------+--------------------------------------------------------------------
combined |       7    46.85714    5.124305    13.55764    34.31842    59.39587
---------+--------------------------------------------------------------------
    diff |            7.916667    9.345602               -19.07605    34.90938
------------------------------------------------------------------------------
    diff = mean(Non-resp) - mean(Responde)                        t =   0.8471
Ho: diff = 0                     Satterthwaite's degrees of freedom =  3.63968

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.7755         Pr(|T| > |t|) = 0.4490          Pr(T > t) = 0.2245

.

You can tweak the code posted above according to the variable you're interested in -ttest-ing.
Obviously, with such a scant sample size, the lack of statistical significance is expected.
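
As a minimal sketch of that tweak, the same test can be looped over the continuous variables. This assumes the data from #1 are in memory and that -responder- has been generated as shown earlier; the choice of -age- and -bmi- follows the variables Aamir listed and is otherwise arbitrary.

```stata
* Assumes the dataset from #1 is in memory and -responder- already exists.
* Run Welch's (unequal-variance) t test on each continuous variable in turn.
foreach v of varlist age bmi {
    display as text _newline "Welch t test for `v'"
    ttest `v', by(responder) unequal
    * r(p) is the two-sided p-value left behind by -ttest-
    display as result "two-sided p = " %6.4f r(p)
}
```

The categorical variables (sex, race) are deliberately left out of the loop, since a t test is not the natural choice for them.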

Kind regards,
Carlo
(Stata 19.0)


Aamir Malik

Join Date: Jan 2020

Posts: 48
#6

15 Jul 2021, 10:29

Thank you Carlo for your help. It works for me.
Bruce Weaver

Join Date: May 2014

Posts: 1133
#7

15 Jul 2021, 11:48

Bear in mind that sex and race are categorical variables, and that in the small dataset you shared, at least, both of them are dichotomous. In other words, I doubt you want to use t-tests on those variables.

--
Bruce Weaver
Version: Stata/MP 19.5 (Windows)
Aamir Malik

Join Date: Jan 2020

Posts: 48
#8

15 Jul 2021, 18:32

Thank you. We had a small sample size for a prospective trial, and the editorial office wanted us to show with a p-value that the two groups do not differ significantly, so I needed to compare them. Thank you for your input. Very helpful.

Carlo Lazzaro

Join Date: Apr 2014

Posts: 17711
#9

15 Jul 2021, 23:50

Aamir:
Bruce is correct (and I forgot to mention it in my previous reply).
Provided that you do not want to go -regress-, as far as the categorical variables are concerned, you may want to consider:

Code:

. tabulate sex responder, chi2

           |       responder
       sex | Non-respo  Responder |     Total
-----------+----------------------+----------
         0 |         2          3 |         5
         1 |         2          0 |         2
-----------+----------------------+----------
     Total |         4          3 |         7

          Pearson chi2(1) =   2.1000   Pr = 0.147

. tabulate race responder, chi2

           |       responder
      race | Non-respo  Responder |     Total
-----------+----------------------+----------
         1 |         3          1 |         4
         2 |         1          2 |         3
-----------+----------------------+----------
     Total |         4          3 |         7

          Pearson chi2(1) =   1.2153   Pr = 0.270

.

That said, with such a small sample size, inferential statistics make no sense.
If you really have only 7 obs., I'm under the impression that the reviewer is barking up the wrong tree.
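
With cell counts this small, the chi-squared approximation behind the Pearson test is unreliable, and Fisher's exact test is the usual alternative for 2x2 tables. A minimal sketch, assuming the data from #1 are in memory and -responder- exists as generated earlier in the thread:

```stata
* Assumes the dataset from #1 is in memory and -responder- already exists.
* The -exact- option of -tabulate- requests Fisher's exact test
* instead of relying on the large-sample Pearson chi2 approximation.
tabulate sex responder, exact
tabulate race responder, exact
```

This does not rescue the analysis from having only 7 observations; it only avoids leaning on a large-sample approximation.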

Last edited by Carlo Lazzaro; 15 Jul 2021, 23:52.

Kind regards,
Carlo
(Stata 19.0)


Rich Goldstein

Join Date: Mar 2014

Posts: 4464
#10

16 Jul 2021, 05:29

Note that there is another route: use -permute- with the "enumerate" option. With a total N of 7 and group sizes of 4 and 3, there are 35 possible ways to allocate the observations to the two groups. -permute- will evaluate all 35, and your "p-value" is essentially the rank of the observed statistic among the statistics from all 35 possible allocations. See:

Code:

help permute
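
A rough sketch of what such a call might look like, using -age- as the outcome. It assumes the data from #1 are in memory, that -responder- exists as generated earlier, and that your Stata release supports the -enumerate- option of -permute-; the statistic name t is arbitrary.

```stata
* Assumes the dataset from #1 is in memory and -responder- already exists.
* Number of distinct ways to pick 3 responders out of 7 subjects: 35
display comb(7, 3)

* Reshuffle the group labels over every possible allocation and
* compare the observed t statistic (r(t) from -ttest-) with all of them.
permute responder t=r(t), enumerate: ttest age, by(responder)
```

With only 35 allocations, the smallest attainable p-value is coarse, so the same caution about sample size applies here too.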
