Subsample and main sample comparison

Carla RAS

Join Date: Jan 2020

Posts: 10
#1

Subsample and main sample comparison

23 Jun 2020, 11:26

Hi.
I don't know whether this is possible. I have seen other similar posts, but I haven't found the answer.

I have a main sample of 2000 subjects, of whom only 500 were selected for some laboratory tests. However, 800 of the 2000 subjects were excluded because of missing data in some outcome variables, leaving 1200 participants.

Main sample: 2000
"Sample 1": 1200
"Subsample": 500 (included within 1200 participants)

Therefore, I will analyze some variables with 1200 participants (sample 1) and others with 500 participants (subsample). However, I would like to compare the characteristics of the participants in sample 1 (1200) with those of the subsample (500), although the two groups would not be independent and would differ in size. Or should I compare the main sample (2000) with the subsample (500)?

If there is a way to do this in Stata, could you help me? Any advice would be greatly appreciated.

CR

Last edited by Carla RAS; 23 Jun 2020, 11:27. Reason: sample, sampling, STATA 15, subsample
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#2

23 Jun 2020, 17:59

You may create a binary variable, say selected, which would have 500 observations in the ‘yes’ category, and compare characteristics according to this variable. The ‘no’ category will depend on what you wish to compare: the 1500 observations excluded for missing values or lacking blood samples; or the 700 observations in sample 1 with no blood samples. Depending on the study design, the real sample would be the 500 observations, the rest being part of a flow chart describing how the sample was obtained.
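The two choices of ‘no’ category can be sketched as follows. This is only an illustration on simulated placeholder data: the variable names id, insample1 and selected are assumptions, and the assignment of subjects to groups by id is arbitrary.

```stata
* Simulated placeholder data: 2000 subjects, 1200 in sample 1, 500 with lab tests
clear
set obs 2000
generate long id = _n
* Hypothetical flag for sample 1 (complete outcome data)
generate byte insample1 = (id <= 1200)
* Hypothetical flag for the subsample with laboratory tests
generate byte selected = (id <= 500)

* Option A: 500 selected vs. the other 1500 of the main sample
tabulate selected

* Option B: 500 selected vs. the other 700 within sample 1
tabulate selected if insample1 == 1
```

With real data, the two flags would be built from the actual exclusion and laboratory variables rather than from id.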

Best regards,

Marcos
Carla RAS

Join Date: Jan 2020

Posts: 10
#3

24 Jun 2020, 03:45

Hi. Thank you so much.

I did something like this:

Code:

gen comp = .
replace comp = 1 if hba1 !=. & il6 !=. & fib !=. & fr !=.
replace comp = 0 if hba1 ==.
replace comp = 0 if il6 ==.
replace comp = 0 if fib ==.
replace comp = 0 if fr ==.

1 for observations (500)
0 for missing (1200)

But how should I proceed to compare characteristics such as gender, age, or physical activity in METs between the groups (chi-squared with tab, ttest, ranksum, kwallis)? I'd like to get a p-value to determine whether participants in the subsample had similar characteristics to those of the main sample (2000) or sample 1 (1200). Note that the 500 with blood tests are included in the sample of 1200.

Thank you in advance.

Last edited by Carla RAS; 24 Jun 2020, 04:01.
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#4

24 Jun 2020, 15:04

You can type something like:

Code:

ttest varname, by(comp)
ranksum varname, by(comp)
tabulate comp catvar, chi2

Logically, if the missing-group variable is binary, the Kruskal-Wallis test is not needed: with only two groups it reduces to the rank-sum test.
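If the comparison should be limited to sample 1 (the 500 against the other 700), the same commands accept an if qualifier. A rough sketch, assuming the data are in memory and that age, mets, sex and insample1 exist; these names are hypothetical, with insample1 equal to 1 for the 1200 participants with complete outcome data:

```stata
* Hypothetical variable names: age, mets, sex, insample1
* comp is the 0/1 indicator created in post #3
ttest age if insample1 == 1, by(comp)
ranksum mets if insample1 == 1, by(comp)
tabulate comp sex if insample1 == 1, chi2
```

Dropping the if qualifier compares the 500 with everyone else in the dataset instead.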

Best regards,

Marcos
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#5

24 Jun 2020, 15:08

I almost forgot: the egen command has good machinery for generating variables that flag missing values, the rowmiss() function being one example.
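For instance, the indicator from post #3 could be built along these lines. This is a minimal sketch, assuming the dataset with the four laboratory variables is in memory; nmiss is a new, illustrative variable name.

```stata
* Count how many of the four lab values are missing for each subject
egen byte nmiss = rowmiss(hba1 il6 fib fr)

* comp = 1 if all four lab values are present, 0 if any is missing
generate byte comp = (nmiss == 0)

* Check the group sizes
tabulate comp
```

If comp already exists from the earlier code, drop it first or use a different name.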

Best regards,

Marcos
Carla RAS

Join Date: Jan 2020

Posts: 10
#6

24 Jun 2020, 16:20

Thank you very much, I really appreciate it.