Analysis between three groups

Gabriel Reis Ferreira

Join Date: Oct 2015

Posts: 29
#1

15 Aug 2018, 13:55

gender age hiv_aids MM
0 0 0 10.8
1 0 0 127.3867
1 0 0 28.97162
0 0 0 21.2
1 1 0 52.04571
1 1 0 37.44932
1 1 0 34.1
1 1 0 28.01327
1 0 0 19.6
1 1 0 97.23553
0 0 0 29.85625
1 0 0 49.24438
1 0 0 11.5739
1 0 0 25.6
1 1 0 24.54847
1 0 0 26.3
1 0 0 14.2
0 0 0 28.2
0 0 0 20.7
0 1 0 22.11574
0 0 0 9.1
1 1 0 73.35053
1 0 0 74.53004
0 1 0 29.6
1 0 0 67.08441
1 1 0 14.59639
1 1 1 41.4
1 1 1 31.6
1 1 1 96.27718
1 1 1 39

Age: 0 = child, 1 = adult. Gender: 0 = female, 1 = male. MM: molecular marker.

I am trying to see whether age, gender and HIV infection status have some effect on this molecular marker. I can do it with two variables:

bysort age: ranksum MM, by(gender)

But I want to test, for example, the values of MM by gender and age among those who are HIV negative. Among those who are HIV negative, is there a difference between female children and male children?

I could not attach the .dta file, sorry.

Thank you.
Phil Bromiley

Join Date: Apr 2014

Posts: 4348
#2

16 Aug 2018, 12:09

You'll increase your chances of a helpful answer by following the FAQ on asking questions - provide Stata code in code delimiters, readable Stata output, and sample data using dataex.

Without more info, I can't be sure, but this looks like a standard 2 sample t test using groups - look at documentation for ttest. There are also regression approaches that should give the same results. You mention ranksum - are you looking for non-parametric estimators? If so, you need to say so.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17084
#3

16 Aug 2018, 15:13

Gabriel:
it seems a job for -regress-.
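
One way to read this suggestion, as a sketch only and not necessarily the model intended here: regress the marker on all three indicators at once, using the variable names from #1. The robust standard errors and the interaction term are assumptions added for illustration.

Code:

* additive linear model; i. marks the 0/1 variables as categorical
regress MM i.gender i.age i.hiv_aids, vce(robust)

* allow the gender difference to vary by age group
regress MM i.gender##i.age i.hiv_aids, vce(robust)

With about 30 observations and a skewed outcome, the linear model's assumptions deserve a check before the coefficients are interpreted.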

Kind regards,
Carlo
(Stata 18.0 SE)
Gabriel Reis Ferreira

Join Date: Oct 2015

Posts: 29
#4

17 Aug 2018, 08:38

Phil Bromiley: Yes, this variable is not normally distributed, so I am looking for a non-parametric test. ranksum works well, as expected. However, I want to test a specific group.

Carlo Lazzaro: I believe that regress will not be helpful, for the same reason I have mentioned (in any case, I tried regress and logistic and neither worked out for gender).

My goal is to test whether, among those who are HIV negative and within the children group, there is a difference by gender. The same for adults. All HIV-positive persons are adults, so it is not necessary to test within that group. (Please find the data attached in .dta format.)
Attached Files

age gender hiv mm.dta (1.4 KB, 1 view)
Andrew Musau

Join Date: Oct 2014

Posts: 9139
#5

18 Aug 2018, 01:14

Among those who are HIV negative, is there a difference between female children and male children?

Unless I am missing something, you just need to proceed in the usual Stata way of selecting groups, i.e., using the if qualifier

Code:

bys age: ranksum MM if hiv==0, by(gender)

More generally, all that ranksum does is to test whether independent samples are from populations with the same distribution (some view it as a test of medians). Therefore, with your data in #1, I can test whether the distributions differ between females aged 0 and hiv negative and males aged 1 and hiv positive (the syntax is flexible enough to allow me to do this)

Code:

ranksum MM if (gender==0 & age==0 & hiv==0) | (gender==1 & age==1 & hiv==1), by(gender)

Just define the groups and then modify the syntax.
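
As a sketch of what defining the groups can look like (assuming gender is coded only 0/1 with no missing values, and using the variable names from the commands above): when both groups share the same age and HIV status, the compound condition reduces to the shared part, and a mixed comparison can be given its own indicator. The name grp is hypothetical.

Code:

* within-stratum comparison: HIV-negative children, by gender
ranksum MM if age==0 & hiv==0, by(gender)

* mixed comparison: build an explicit two-level indicator (grp is a hypothetical name)
generate byte grp = .
replace grp = 0 if gender==0 & age==0 & hiv==0
replace grp = 1 if gender==1 & age==1 & hiv==1
ranksum MM if !missing(grp), by(grp)

An explicit indicator keeps the group definition in one place, which makes it easier to reuse in graphs and tables.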
Gabriel Reis Ferreira

Join Date: Oct 2015

Posts: 29
#6

18 Aug 2018, 10:27

Dear Andrew Musau, this is exactly what I was trying to test. Thank you very much! But I still have a problem understanding these data. It does not make sense to me (biologically) to test the difference between HIV-negative female children and HIV-positive adult males. I need to test male versus female children (HIV negative), male versus female adults (HIV negative), and HIV-negative versus HIV-positive adult males. I have played around with your syntax and only had a problem testing HIV-positive adult males against HIV-negative adult males.

Another question: when I cut the variable mm into two groups (based on a median cutoff), I found a difference between HIV-negative adult males and females.

(bys hiv: tab mmc gender, ex)

What is the best way to interpret these data: transforming the variable into two groups or analyzing it directly with ranksum?
Attached Files

age gender hiv mm.dta (1.4 KB, 1 view)

Last edited by Gabriel Reis Ferreira; 18 Aug 2018, 11:07.

Andrew Musau

Join Date: Oct 2014
Posts: 9139

18 Aug 2018, 11:46

In future please use dataex to present data examples.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte(age gender hiv) float(mm mmc)
1 1 0 14.59639  0
0 1 0     25.6  0
0 1 0 28.97162  0
1 1 0 37.44932 30
1 1 0 73.35053 30
1 1 0     34.1 30
1 1 0 24.54847  0
0 1 0     14.2  0
0 0 0     20.7  0
1 1 0 52.04571 30
0 1 0 67.08441 30
1 0 0 22.11574  0
1 1 0 97.23553 30
0 1 0     26.3  0
0 1 0 74.53004 30
1 0 0     10.8  0
1 0 0     29.6  0
0 1 0 127.3867 30
0 0 0     28.2  0
0 0 0      9.1  0
0 1 0  11.5739  0
0 0 0 29.85625  0
1 1 0 28.01327  0
0 0 0 24.62219  0
0 1 0     19.6  0
0 1 0 49.24438 30
0 0 0     21.2  0
1 1 1       39 30
1 1 1 96.27718 30
1 1 1     31.6 30
1 1 1     41.4 30
end

It does not make sense to me (biologically) to test the difference between HIV-negative female children and HIV-positive adult males.

My example was just an illustration. It is however sensible to use theory to decide what tests make sense and what don't.

I need to test male versus female children (HIV negative), male versus female adults (HIV negative), and HIV-negative versus HIV-positive adult males. I have played around with your syntax and only had a problem testing HIV-positive adult males against HIV-negative adult males.

I would first graph the data before doing the tests. In this way, I know what to expect. The syntax changes little

Code:

set scheme s1color
* children, hiv negative, by gender
graph box mm if (gender==0 & age==0 & hiv==0) | (gender==1 & age==0 & hiv==0), by(gender)
gr save g1

* adults, hiv negative, by gender
graph box mm if (gender==0 & age==1 & hiv==0) | (gender==1 & age==1 & hiv==0), by(gender)
gr save g2

* adult males, by hiv status
graph box mm if (gender==1 & age==1 & hiv==0) | (gender==1 & age==1 & hiv==1), by(hiv)
gr save g3

gr combine g1.gph g2.gph g3.gph

[Figure: gcombine.png (combined box plots g1, g2 and g3)]

Tests

Code:

* children, hiv negative, by gender
ranksum mm if (gender==0 & age==0 & hiv==0) | (gender==1 & age==0 & hiv==0), by(gender)

* adults, hiv negative, by gender
ranksum mm if (gender==0 & age==1 & hiv==0) | (gender==1 & age==1 & hiv==0), by(gender)

* adult males, by hiv status
ranksum mm if (gender==1 & age==1 & hiv==0) | (gender==1 & age==1 & hiv==1), by(hiv)

Code:

. ranksum mm if gender==0 & age==0 & hiv==0| gender==1 & age==0 &hiv==0, by(gender)

Two-sample Wilcoxon rank-sum (Mann-Whitney) test

      gender |      obs    rank sum    expected
-------------+---------------------------------
           0 |        6          41          51
           1 |       10          95          85
-------------+---------------------------------
    combined |       16         136         136

unadjusted variance       85.00
adjustment for ties        0.00
                     ----------
adjusted variance         85.00

Ho: mm(gender==0) = mm(gender==1)
             z =  -1.085
    Prob > |z| =   0.2781

. 
. 
. 
. *adults, hiv negative, by gender

. 
. ranksum mm if gender==0 & age==1 & hiv==0| gender==1 & age==1 &hiv==0, by(gender)

Two-sample Wilcoxon rank-sum (Mann-Whitney) test

      gender |      obs    rank sum    expected
-------------+---------------------------------
           0 |        3          10          18
           1 |        8          56          48
-------------+---------------------------------
    combined |       11          66          66

unadjusted variance       24.00
adjustment for ties        0.00
                     ----------
adjusted variance         24.00

Ho: mm(gender==0) = mm(gender==1)
             z =  -1.633
    Prob > |z| =   0.1025


. 
. *adults male, by hiv status 

. 
. ranksum mm if gender==1 & age==1 & hiv==0| gender==1 & age==1 &hiv==1, by(hiv)

Two-sample Wilcoxon rank-sum (Mann-Whitney) test

         hiv |      obs    rank sum    expected
-------------+---------------------------------
           0 |        8          48          52
           1 |        4          30          26
-------------+---------------------------------
    combined |       12          78          78

unadjusted variance       34.67
adjustment for ties        0.00
                     ----------
adjusted variance         34.67

Ho: mm(hiv==0) = mm(hiv==1)
             z =  -0.679
    Prob > |z| =   0.4969

So we cannot reject the null hypothesis of no difference in the distributions in any of the three comparisons.
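
As a rough check on the first table (a sketch using the normal approximation with no ties, with the numbers copied from the output above), the z statistic and p-value can be recomputed by hand from the rank sum for gender==0:

Code:

* n1 = 6 (gender==0), n2 = 10 (gender==1), N = 16, observed rank sum = 41
* expected rank sum n1*(N+1)/2, which is 51
display 6*(16+1)/2
* unadjusted variance n1*n2*(N+1)/12, which is 85
display 6*10*(16+1)/12
* z statistic, about -1.085
display (41 - 51)/sqrt(85)
* two-sided p-value, about 0.278
display 2*normal(-abs((41 - 51)/sqrt(85)))

The same arithmetic applies to the other two tables after substituting their group sizes and rank sums.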

Another question: when I cut the variable mm into two groups (based on a median cutoff), I found a difference between HIV-negative adult males and females.

(bys hiv: tab mmc gender, ex)

What is the best way to interpret these data: transforming the variable into two groups or analyzing it directly with ranksum?

It appears that you are dichotomizing the "mm" variable here. By doing so, you are throwing away valuable information and there is some literature arguing that you should not do this. However, I have no clue what a molecular marker is and this is not my field of work, so consult your colleagues (or other forum members) to see if dichotomization makes sense here. There may be good reasons for it.
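
For illustration only, a minimal sketch of such a split, assuming a cutoff at the sample median (which may not be the cutoff behind mmc) and a hypothetical indicator name mm_high:

Code:

* the sample median of mm is left in r(p50)
summarize mm, detail
generate byte mm_high = mm > r(p50) if !missing(mm)

* cross-tabulate the indicator against gender within each hiv stratum, with Fisher's exact test
bysort hiv: tabulate mm_high gender, exact

After the split, every value above the cutoff is treated alike, so 31.6 and 127.3867 become indistinguishable; that is the information being discarded. In a stratum that contains only one gender, no test can be computed.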


Gabriel Reis Ferreira

Join Date: Oct 2015
Posts: 29

18 Aug 2018, 18:33

Thank you Andrew Musau

I did the graphs first; that is why I chose these groups to compare.

Code:

graph box mm, over(age) over(gender) over(hiv)

[Figure: gender age hiv mm.png (box plot of mm over age, gender and hiv)]

If I try without dividing into groups, we do find a difference.

Code:

ranksum mm, by (gender)

Two-sample Wilcoxon rank-sum (Mann-Whitney) test

      gender |      obs    rank sum    expected
-------------+---------------------------------
           0 |        9          88         144
           1 |       22         408         352
-------------+---------------------------------
    combined |       31         496         496

unadjusted variance      528.00
adjustment for ties        0.00
                     ----------
adjusted variance        528.00

Ho: mm(gender==0) = mm(gender==1)
             z =  -2.437
    Prob > |z| =   0.0148

Code:

. ranksum mm if hiv==0, by (gender)

Two-sample Wilcoxon rank-sum (Mann-Whitney) test

      gender |      obs    rank sum    expected
-------------+---------------------------------
           0 |        9          88         126
           1 |       18         290         252
-------------+---------------------------------
    combined |       27         378         378

unadjusted variance      378.00
adjustment for ties        0.00
                     ----------
adjusted variance        378.00

Ho: mm(gender==0) = mm(gender==1)
             z =  -1.955
    Prob > |z| =   0.050

Thank you again for your help, I will keep chasing it.
