Combining numerical continuous variables

Mark Hoffmann

Join Date: Sep 2017

Posts: 13
#1

21 Sep 2017, 16:49

Hi,
This is my first post, and I assume there is an easy solution, but being very new to Stata I'm struggling. I have 5 years of survey data in which the responses to 4 questions are numeric and continuous. I would like to combine the data from all four questions into one variable that includes only a set range of responses for each question. For example, variable 1 ranges from 0 to 90 and I want the survey responses from 0 to 4; variable 2 ranges from 0 to 90 and I want the data from 0 to 2; and so on. My aim is to determine the mean value for the composite group and also the percentage of respondents who answered within the set ranges described. Thanks in advance, Mark
Clyde Schechter

Join Date: Apr 2014

Posts: 30111
#2

21 Sep 2017, 16:59

I, for one, don't understand your explanation at all. I suggest you read the FAQ for good advice on how to post clear questions that are likely to draw a timely and helpful response. Specifically in this case, I think you would benefit from installing the -dataex- command, learning how to use it, and then posting an example of your Stata data. In addition to showing that, do some hand calculation of the results you want just for the small example data, and show those calculations and the desired end results. I think if you do that, you will likely get good advice.
Mark Hoffmann

Join Date: Sep 2017

Posts: 13
#3

21 Sep 2017, 17:23

OK. I'm examining 4 questions that the survey asks. Q1: how many drinks can a man drink daily without doing any harm? The respondents provide an answer between 0 and 99. Q2 asks the same but for a woman. Q3 asks the same question but concerning a single occasion for a man, and Q4 the same for a woman. There are 5 years of survey data that I am examining. I can determine mean values for the responses to each question for each year; however, I want to group the four questions together for each year and determine a mean value for the responses. I want to determine the mean values for responses that fall within a given range for each question. For example Q1 the range is between 0 and 4, Q2 is between 0 and 2 etc.
Mark Hoffmann

Join Date: Sep 2017

Posts: 13
#4

21 Sep 2017, 17:31

I attempted this command: gen OVERALL = (var1 = 0 & var1 < = 4) + (var2 > = 0 & var2 < = 2) + (var3 > = 0 & var3 < = 6) + (var4 > = 0 & var < = 4). However, I don't feel that the outcome was realistic: the mean value was lower than expected.
Clyde Schechter

Join Date: Apr 2014

Posts: 30111
#5

21 Sep 2017, 20:00

I'm afraid I still don't get what you're trying to do. Your explanations point me in several, rather different directions.

"I want to group the four questions together for each year and determine a mean value for the responses."

It isn't at all clear how to do this. You could do a simple mean: -egen mean_1_to_4 = rowmean(Q1 Q2 Q3 Q4)-. But the normatively correct answers to these four questions are different, and I don't see how an average of the answers would be useful, nor what it would mean.

"I want to determine the mean values for responses that fall within a given range for each question. For example Q1 the range is between 0 and 4, Q2 is between 0 and 2 etc."

So something like:

Code:

by year, sort: egen mean_q1_0_to_4 = mean(cond(inrange(Q1, 0, 4), Q1, .))

would handle the mean of those Q1 responses that fall between 0 and 4. The others would be handled analogously.
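
The analogous handling of the other three questions could be sketched as a loop. This is only an illustration: it assumes the variables are named Q1 through Q4 and year, and it takes the upper limits 4, 2, 6 and 4 from the ranges given in post #4. The generated variable names just follow the mean_q1_0_to_4 pattern.

```stata
* Sketch only: assumes Q1-Q4 and year exist; upper limits are taken from post #4
local uppers 4 2 6 4
forvalues i = 1/4 {
    * pick the i-th upper limit from the list
    local hi : word `i' of `uppers'
    * yearly mean of Q`i', using only answers between 0 and `hi'
    by year, sort: egen mean_q`i'_0_to_`hi' = mean(cond(inrange(Q`i', 0, `hi'), Q`i', .))
}
```

Each new variable holds the same value for every observation within a year, so a listing of one observation per year is enough to read off the results.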

"I attempted this command: gen OVERALL = (var1 = 0 & var1 < = 4) + (var2 > = 0 & var2 < = 2) + (var3 > = 0 & var3 < = 6) + (var4 > = 0 & var < = 4)."

Actually, you didn't attempt that, because it would give you a syntax error. The var1 = 0 would have to have been var1 == 0. That would basically give each observation a score equal to how many of the four questions were given answers within ranges 0 to 4, 0 to 2, 0 to 6, and 0 to 4. If those ranges are normatively correct response ranges, then this is in effect "how many were answered correctly." That's a perfectly reasonable thing to do. It isn't related to means, though. If the value was lower than expected, then either your expectations are too high, or perhaps there is a problem with the data. By the way, you can make this code easier to read and understand (and quicker to type) by changing it to:

Code:

gen OVERALL = inrange(var1, 0, 4) + inrange(var2, 0, 2) + inrange(var3, 0, 6) + inrange(var4, 0, 4)
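
If the percentage of respondents answering within each range is also of interest, something like the following might serve as a starting point. It is a sketch under assumptions: the variables are taken to be named var1 to var4 and year, and the indicator names in_range1 to in_range4 are made up for illustration. Note that inrange() returns 0 when its first argument is missing, so unanswered questions count as "not in range" unless they are excluded, as is done here.

```stata
* Sketch only: assumes var1-var4 and year exist; in_range1-in_range4 are hypothetical names
local uppers 4 2 6 4
forvalues i = 1/4 {
    local hi : word `i' of `uppers'
    * 1 if the answer lies between 0 and `hi', 0 if outside, missing if unanswered
    gen byte in_range`i' = inrange(var`i', 0, `hi') if !missing(var`i')
}
* the mean of a 0/1 indicator is the proportion in range (multiply by 100 for a percentage)
tabstat in_range1 in_range2 in_range3 in_range4, by(year) statistics(mean n)
```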

By the way is var1 the same thing as Q1, etc.?

I don't have the sense that any of this is what you're actually looking for, but I can't seem to discern what that might be. Maybe this will give you some food for thought, I hope.
Mark Hoffmann

Join Date: Sep 2017

Posts: 13
#6

21 Sep 2017, 22:11

Thanks Clyde, this has been very helpful - particularly this code: gen OVERALL = inrange(var1, 0, 4) + inrange(var2, 0, 2) + inrange(var3, 0, 6) + inrange(var4, 0, 4). Var1 is Q1. Apologies for the lack of clarity but I have it sorted now...