Generate new variables from two or more variables

Akissi Amon

Join Date: Oct 2017

Posts: 42
#1

23 Jul 2018, 13:12

Dear STATALIST,
I am writing about what is probably a simple issue. I wish to generate third variables (d3 and bp3) from two 2-item measurement scales (Scale 1: variables d1 and d2; Scale 2: bp1 and bp2), and a fourth variable (i4) from a 3-item scale (i1, i2 and i3). The scales are 5-point Likert scales, so there are 5 response options. I would like the newly generated variables to give me the summary frequencies for each answer category. This would essentially give me a single summary frequency for each response category for each scale, as opposed to two or three summary frequencies per scale (one for each scale item). For instance, if for Scale 1 10% of respondents rated the first item (d1) with response category 1 ('very true') and another 10% chose the same answer for the second item (d2), I would like the new variable (d3) to be the summary frequency of the response category 'very true' for that scale (10%). Within each scale, the total number of respondents is the same for each item. I am not sure how to proceed or whether this is feasible. A sample of the dataset is below.
Thank you very much

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte(d1 d2 bp1 bp2 i1 i2 i3)
3 1 4 2 4 4 4
4 4 2 2 4 4 4
2 2 2 2 2 2 2
2 4 2 2 4 4 4
4 4 4 4 4 4 4
2 2 4 4 4 2 1
4 4 4 2 1 2 2
2 2 2 2 4 2 2
2 2 2 2 4 4 4
4 4 4 4 2 5 4
2 2 2 2 4 2 2
4 4 4 4 4 4 4
5 4 2 2 4 2 3
4 4 4 4 4 4 4
2 4 2 2 4 4 4
end
label values d1 d1
label values d2 d1
label values i1 d1
label values i2 d1
label values i3 d1
label values bp1 d1
label values bp2 d1
label def d1 2 "True", modify
label def d1 3 "Neither untrue nor true", modify
label def d1 4 "Untrue", modify
label def d1 5 "Very untrue", modify
label def d1 1 "Very true", modify

Akissi Amon

Join Date: Oct 2017
Posts: 42

23 Jul 2018, 13:13

I apologise, here is the dataset: it is the same -dataex- example shown in the Code block of #1.

Carole J. Wilson

Join Date: Jan 2015

Posts: 932
#3

23 Jul 2018, 13:32

I'm not sure that I understand what you want. For example, in observation 4, d1=2 and d2=4. Based on the summary stats in this small dataset, 46.67% answered 2 for d1, and 60% answered 4 for d2. What should d3 be for observation 4?
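
For reference, the two percentages quoted above can be checked with a minimal sketch like the one below. It re-enters only d1 and d2 from the -dataex- example in #1 so that it runs on its own; nothing else is assumed.

```stata
* Minimal sketch: re-enter d1 and d2 from the -dataex- example in #1
clear
input byte(d1 d2)
3 1
4 4
2 2
2 4
4 4
2 2
4 4
2 2
2 2
4 4
2 2
4 4
5 4
4 4
2 4
end

tabulate d1          // the row for 2 shows 7 of 15 respondents (46.67%)
tabulate d2          // the row for 4 shows 9 of 15 respondents (60.00%)

* The same two percentages computed directly
count if d1 == 2
display %5.2f 100 * r(N) / _N
count if d2 == 4
display %5.2f 100 * r(N) / _N
```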

Stata/MP 14.1 (64-bit x86-64)
Revision 19 May 2016
Win 8.1

Akissi Amon

Join Date: Oct 2017

Posts: 42
#4

23 Jul 2018, 14:53

Dear Carole Wilson,

Thank you very much for your response. Based on the above example, the newly generated d3 variable would be (47% + 60%)/2 ≈ 54%.

Is this clearer? I apologise for the confusion.

Kind regards

Carole J. Wilson

Join Date: Jan 2015
Posts: 932

23 Jul 2018, 15:53

In that case, the following will get you what you want, I think. Note that the following requires installation of the -egenmore- package available on SSC (ssc install egenmore).

Code:

egen pc_d1=density(d1), percent
egen pc_d2=density(d2), percent

egen pc_i1=density(i1), percent
egen pc_i2=density(i2), percent
egen pc_i3=density(i3), percent

egen pc_bp1=density(bp1), percent
egen pc_bp2=density(bp2), percent

egen d3=rowmean(pc_d1 pc_d2)
egen i4=rowmean(pc_i1 pc_i2 pc_i3)
egen bp3=rowmean(pc_bp1 pc_bp2)
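
As a rough illustration of what these commands produce for integer-coded items, the sketch below builds the same kind of per-observation percentages with built-in commands only. It is an assumed equivalent for this small example, not the density() function itself, and the names obs, n1 and n2 are introduced here purely for illustration. Each observation receives the percentage of respondents who gave the same answer on that item, and d3 is the row mean of those percentages, so d3 no longer holds the response categories 1 to 5.

```stata
* Minimal sketch: re-enter d1 and d2 from the -dataex- example in #1
clear
input byte(d1 d2)
3 1
4 4
2 2
2 4
4 4
2 2
4 4
2 2
2 2
4 4
2 2
4 4
5 4
4 4
2 4
end

gen long obs = _n                  // hypothetical helper: remembers the original row order

* Percentage of respondents sharing each observation's answer on d1
count if !missing(d1)
local n1 = r(N)
bysort d1: gen double pc_d1 = 100 * _N / `n1' if !missing(d1)

* Same for d2
count if !missing(d2)
local n2 = r(N)
bysort d2: gen double pc_d2 = 100 * _N / `n2' if !missing(d2)

sort obs                           // restore the original row order
egen double d3 = rowmean(pc_d1 pc_d2)

* Observation 4 has d1 = 2 (46.67%) and d2 = 4 (60%), so d3 is about 53.33
list obs d1 d2 pc_d1 pc_d2 d3 in 4
```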


Akissi Amon

Join Date: Oct 2017

Posts: 42
#6

24 Jul 2018, 08:26

Dear Carole Wilson,

Thank you so much for this.

I just attempted the commands and, I have to say, I am a little bit confused by the numbers that I am seeing. I have attached the results to this post.

Essentially, what I wished to accomplish by generating the new variables was to produce the type of results shown in the tabulation of d1 (the last output in the Word document, where I drew two circles). I wish to be able to say, e.g.: for scale D, the total percentage of respondents who answered 'very true' for items d1 and d2 combined was 30%; the total percentage of respondents who answered 'untrue' to items d1 and d2 combined was 60%; and so forth. These summary data for each response option for each scale are what I am trying to achieve with the newly generated variables. When I tabulated d3 from the commands provided, I could not find the answer options anymore. The numbers under the d3 column should go from 1 to 5, with each number representing one answer option (e.g. true, untrue, etc.). What I mean is that the response options are not to be added together. In the frequency column, the numbers should be the number of respondents who picked answer category 'true', answer category 'untrue' and so on (but for both d1 and d2 questions combined).

Is this explanation clearer? I am not sure whether this is feasible in Stata.

Thank you very much.
Attached Files

Statalist-genvar3.docx (40.7 KB)

Carole J. Wilson

Join Date: Jan 2015

Posts: 932
#7

24 Jul 2018, 10:18

So to repeat my question in #3:

I'm not sure that I understand what you want. For example, in observation 4, d1=2 and d2=4. Based on the summary stats in this small dataset, 46.67% answered 2 for d1, and 60% answered 4 for d2. What should d3 be for observation 4?

Akissi Amon

Join Date: Oct 2017

Posts: 42
#8

24 Jul 2018, 11:32

Dear Carole Wilson,

I apologise, I just realised that I had misread your statement. I need to mention again that the scales were 5-point Likert scales. This means that, for each question, it would be extremely rare for all respondents to pick the same response option. For question d1, for instance, some respondents would answer 'true', others 'untrue', many would answer e.g. 'very true', and so forth. Regarding the variable d3 (to be created), it would not make sense to summarise respondents who answered 2 for d1 and 4 for d2 (that is, to combine them), as 2 and 4 represent two different answer categories. I would only wish to create a single summary measure for respondents who answered 2 on both d1 and d2, 1 on both d1 and d2, 4 on both d1 and d2, and so on. This would mean that d3, i4 and bp3 would each have 5 summary measures, one for each of the 5 response options that could be used to rate questions d1 and d2, bp1 and bp2, and so on.

Is this clearer?

Thank you very much

Carole J. Wilson

Join Date: Jan 2015
Posts: 932

24 Jul 2018, 12:13

I'm still not certain, but perhaps this is what you are looking for:

Code:

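* Start with missing values; they are filled in only where the items agree
gen d3=.
gen bp3=.
gen i4=.
foreach x of numlist 1/5 {
    * Assign a response category only if every item in the scale has that value
    replace d3=`x' if d1==`x' & d2==`x'
    replace bp3=`x' if bp1==`x' & bp2==`x'
    replace i4=`x' if i1==`x' & i2==`x' & i3==`x'
}
* One-way tables of the new variables, including the missing category
tab1 d3 bp3 i4, mi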

This loop is equivalent to typing:

Code:

replace d3=1 if d1==1 & d2==1
replace d3=2 if d1==2 & d2==2
* ...and so on for 3, 4 and 5, and likewise for bp3 and i4
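
For a 2-item scale, the same result can probably be reached without a loop. The sketch below is a minimal alternative, assuming the items only take the values 1 to 5 (or system missing): d3 copies the shared answer when d1 and d2 agree and stays missing otherwise. It also attaches the existing value label d1 to d3, so the tabulation shows the answer options rather than bare numbers. Only d1 and d2 from the -dataex- example are re-entered so that the block runs on its own.

```stata
* Minimal sketch: re-enter d1 and d2 from the -dataex- example in #1
clear
input byte(d1 d2)
3 1
4 4
2 2
2 4
4 4
2 2
4 4
2 2
2 2
4 4
2 2
4 4
5 4
4 4
2 4
end
label define d1 1 "Very true" 2 "True" 3 "Neither untrue nor true" 4 "Untrue" 5 "Very untrue"
label values d1 d2 d1

* d3 takes the shared answer when both items agree, and is missing otherwise
gen byte d3 = d1 if d1 == d2
label values d3 d1                 // reuse the response labels for the new variable

* In this sample: 5 respondents with "True", 6 with "Untrue", 4 missing
tabulate d3, missing
```

For the 3-item scale the condition would need both comparisons, for example `if i1 == i2 & i2 == i3`.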


Akissi Amon

Join Date: Oct 2017

Posts: 42
#10

25 Jul 2018, 07:25

Dear Carole Wilson,

Thank you so much for this second command, and for your help. I just tried it and the data is now comprehensible to me. I cannot believe there was such a simple way of doing this (as per your option 2).

I apologise again for all the confusion with the explanation I had given.

Kind regards