Dear Stata Masters,
I have the following dataset
1 row = 1 survey with 26 questions. One regnum can have multiple survey.
24 questions are categorized into 4 domains:
Domain 1: f3 + f4 + f10 + f15 + f16 + f17 + f18
Domain 2: f5 + f6 + f7 + f11 + f19 + f26
Domain 3: f20 + f21 + f22
Domain 4: f8 + f9 + f12 + f13 + f14 + f23 + f24 + f25
I need to calculate the average of each domain with a condition that only 1 question could be empty. If more than 1 question is empty, then the domain must not be calculated.
Then, one survey needs to be excluded if >=20% of questions are empty (>=21 questions answered)
Does anyone could help me with these two questions?
Thank you in advance.
Vini
I have the following dataset
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input str3 regnum int date str1 regimen byte(qolno f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 f11 f12 f13 f14 f15 f16 f17 f18 f19 f20 f21 f22 f23 f24 f25 f26) "1" 21641 "1" 1 4 2 5 5 3 3 3 2 2 3 5 2 3 4 4 5 4 4 2 4 4 4 5 4 4 3 "1" 21768 "1" 2 4 2 4 4 3 3 3 3 3 3 5 3 3 2 3 4 4 4 3 4 4 3 4 4 4 4 "1" 21859 "1" 3 4 2 4 4 3 3 3 2 3 3 3 1 2 2 3 3 3 3 2 4 4 2 3 3 3 2 "1" 21917 "1" 4 4 3 3 3 3 3 3 3 4 4 3 4 4 3 4 4 4 3 3 4 4 3 3 4 4 3 "1" 22014 "1" 5 3 3 4 5 4 3 3 3 3 3 5 3 2 4 1 1 4 3 4 4 4 1 3 1 1 2 "1" 22154 "1" 9 4 4 5 5 4 4 4 4 4 4 4 3 5 4 4 4 3 4 5 4 5 3 4 4 4 4 "1" 22299 "1" 9 4 4 4 5 3 4 3 3 4 3 4 3 3 3 4 3 4 4 4 4 4 3 3 3 4 4 "10" 21655 "2" 1 4 1 5 1 5 5 4 3 1 5 3 2 5 5 3 3 5 3 4 5 3 3 5 3 4 2 "10" 21755 "2" 2 5 3 3 5 5 5 4 5 4 5 2 1 5 5 2 3 4 2 5 2 3 5 5 1 3 5 "10" 21858 "2" 3 2 4 3 4 5 5 5 5 5 5 1 1 5 5 3 1 4 5 4 4 5 5 5 5 5 3 "10" 21928 "2" 4 4 4 3 3 3 5 3 2 5 4 1 1 5 3 4 1 2 3 3 5 5 5 5 3 5 3 "10" 22159 "2" 9 5 4 5 5 4 4 5 3 5 4 4 1 5 5 3 1 4 3 3 1 3 5 5 4 4 4 "100" 21794 "1" 1 3 3 2 5 1 3 4 4 1 4 4 1 2 2 5 3 1 1 3 3 1 1 1 1 4 3 "100" 21938 "1" 3 3 3 3 2 2 4 3 4 2 4 4 1 2 1 4 4 2 1 3 4 1 1 2 2 4 3 "101" 21796 "1" 1 5 3 5 1 5 5 5 5 3 5 5 2 4 1 5 1 5 5 5 5 1 5 3 4 5 4 "101" 21885 "1" 2 4 4 4 3 4 5 4 3 2 5 5 3 4 5 3 3 5 5 5 5 1 5 4 4 3 4 "101" 22041 "1" 3 5 5 3 4 4 5 4 5 4 5 4 4 5 4 3 3 4 5 5 5 1 5 4 4 4 4 "101" 22076 "1" 4 3 4 5 5 5 5 5 1 1 4 5 5 5 2 1 4 4 2 5 5 5 5 5 1 1 5 "102" 21796 "1" 1 3 3 3 1 4 4 5 5 4 5 5 1 3 3 1 4 4 3 3 5 1 3 5 5 4 5 "102" 21900 "1" 2 4 4 5 3 4 4 5 5 4 4 4 1 1 1 1 4 4 3 4 1 1 4 5 5 1 4 "102" 22027 "1" 3 3 3 2 3 2 3 5 3 3 3 4 1 3 1 2 4 2 1 4 4 3 4 2 5 3 4 "102" 22069 "1" 4 4 4 3 5 3 4 5 3 5 3 3 3 3 3 3 4 2 2 2 4 3 4 4 5 4 4 "102" 22284 "1" 9 2 4 4 4 3 2 3 4 3 2 2 2 2 2 2 3 2 2 2 2 2 3 2 2 2 4 "103" 21802 "1" 1 4 4 5 4 4 5 5 5 5 3 3 3 4 3 5 5 4 4 4 5 5 5 5 5 5 4 "103" 21931 "1" 2 4 4 3 4 4 4 3 5 4 4 4 3 5 5 5 3 3 3 4 5 4 3 5 5 4 4 "103" 22000 "1" 3 4 4 5 4 4 4 4 5 4 4 3 3 5 3 5 5 4 4 4 5 5 1 5 4 3 3 "103" 22088 "1" 4 5 5 5 5 5 5 5 5 5 4 4 3 4 3 5 5 5 5 5 5 5 5 4 5 5 5 "104" 21803 "1" 1 1 1 1 3 4 5 1 5 4 1 1 3 1 1 1 3 1 1 3 4 1 1 5 4 1 3 "104" 21936 "1" 2 3 1 4 4 3 5 3 5 5 4 4 4 3 1 3 2 5 1 3 5 1 4 4 4 1 5 "104" 22097 "1" 4 2 4 3 5 2 3 3 3 3 4 2 3 3 3 2 4 3 3 3 4 3 3 4 4 3 3 "104" 22295 "1" 9 3 5 5 5 4 5 5 5 5 5 5 5 5 5 5 5 2 2 5 5 5 5 5 5 5 5 "106" 21804 "1" 1 3 1 4 1 5 3 1 2 4 3 5 1 1 1 1 1 5 1 3 5 1 1 3 5 1 3 "106" 21928 "1" 2 3 3 4 4 2 2 2 2 4 2 1 1 2 2 3 2 3 3 3 3 3 1 3 4 2 2 "106" 22022 "1" 3 2 2 2 1 3 4 5 2 3 1 4 1 2 1 1 1 1 1 4 1 4 2 1 3 1 2 "106" 22081 "1" 4 4 4 4 4 3 4 5 3 3 4 5 3 3 3 3 3 5 5 4 3 5 3 2 3 2 3 end format %td date
24 questions are categorized into 4 domains:
Domain 1: f3 + f4 + f10 + f15 + f16 + f17 + f18
Domain 2: f5 + f6 + f7 + f11 + f19 + f26
Domain 3: f20 + f21 + f22
Domain 4: f8 + f9 + f12 + f13 + f14 + f23 + f24 + f25
I need to calculate the average of each domain with a condition that only 1 question could be empty. If more than 1 question is empty, then the domain must not be calculated.
Then, one survey needs to be excluded if >=20% of questions are empty (>=21 questions answered)
Does anyone could help me with these two questions?
Thank you in advance.
Vini


Comment