Hello
I'm working on a large survey dataset and I'm looking for a way to generate new variables from the information in other variables. This post continues from the following thread: https://www.statalist.org/forums/for...riable-problem , where I got great help with manipulating a time variable.
For example, I have a variable called "hour" that records the time of day at which a respondent finished the survey: 00:00-00:59 = 1, 01:00-01:59 = 2, ..., 23:00-23:59 = 24.
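If "hour" is derived from a Stata datetime, the 1-24 coding can be produced in one line. A minimal sketch, assuming a %tc variable with the hypothetical name finish_time (the real variable from the earlier thread may be named or stored differently):

```stata
clear
// Hypothetical example timestamps stored as strings
input str19 finish_str
"2020-01-01 00:15:00"
"2020-01-01 00:59:59"
"2020-01-01 01:00:00"
"2020-01-01 23:30:00"
end
// Convert to a Stata datetime (%tc); doubles are needed for millisecond values
generate double finish_time = clock(finish_str, "YMDhms")
format finish_time %tc
// hh() returns the hour 0-23 of a %tc value; add 1 so 00:00-00:59 = 1
generate hour = hh(finish_time) + 1
list
```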
The dataset also contains various categorical variables recording the respondents' answers to the survey questions.
For example, "How positive or negative are you in regards to what you have just seen?" is coded 1 = very negative, 2 = negative, 3 = neither positive nor negative, 4 = positive, 5 = very positive.
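The 1-5 coding can be attached as value labels, so output stays readable while the stored numbers remain usable for averaging. A sketch; the label name posneg is hypothetical:

```stata
clear
input q1_scores
3
4
5
1
end
// Hypothetical value label for the 1-5 scale described above
label define posneg 1 "very negative" 2 "negative" ///
    3 "neither positive nor negative" 4 "positive" 5 "very positive"
label values q1_scores posneg
// Labels only affect display; the stored values are still 1-5
tabulate q1_scores
```

Note that averaging these codes treats the scale as if the categories were equally spaced, which is a common but debatable assumption for Likert-type items.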
What I would like to do is compute the average score within each time interval and have that average appear in every observation belonging to that interval.
For example (q1_average_scores is the variable I want to create):
ID | q1_scores | hour | q1_average_scores |
1 | 3 | 1 | 3.5 |
2 | 4 | 1 | 3.5 |
3 | 5 | 2 | 3 |
4 | 1 | 2 | 3 |
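In Stata, a group mean like this is usually computed with egen and a by() option rather than row by row. A minimal sketch that rebuilds the four example rows from the table and computes the column:

```stata
clear
// The four example rows from the table
input ID q1_scores hour
1 3 1
2 4 1
3 5 2
4 1 2
end
// mean() averages q1_scores within each value of hour and
// writes that average into every observation of the group
egen q1_average_scores = mean(q1_scores), by(hour)
list, sepby(hour)
```

egen's mean() ignores missing values of q1_scores when computing the average but still fills the result into every observation of the hour group. Unlike ranges such as "in 1/2", it does not depend on the sort order of the data.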
As mentioned, the dataset is quite large and has tens of thousands more rows. Doing this manually for 24 time periods is possible (but not really practical). Based on the example above, I believe the code would be:
generate q1_average_scores = .
summarize q1_scores in 1/2   // mean of IDs 1 and 2 = 3.5
replace q1_average_scores = 3.5 in 1/2
summarize q1_scores in 3/4   // mean of IDs 3 and 4 = 3
replace q1_average_scores = 3 in 3/4
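The manual steps above can be automated by looping over the distinct values of hour and using "if hour == ..." instead of row ranges, so the result no longer depends on the order of the rows. A sketch on the example rows:

```stata
clear
input ID q1_scores hour
1 3 1
2 4 1
3 5 2
4 1 2
end
generate q1_average_scores = .
// Collect every distinct value of hour into a local macro
levelsof hour, local(hours)
foreach h of local hours {
    // meanonly computes r(mean) without printing a table
    summarize q1_scores if hour == `h', meanonly
    replace q1_average_scores = r(mean) if hour == `h'
}
list, sepby(hour)
```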
The problem, of course, is that with 15 more variables and 24 time intervals the job is tedious and very prone to human error, which I would like to avoid. My question therefore is: is there code that computes the average of q1_scores for each value of "hour" and stores it in q1_average_scores for every observation in that hour? I don't know if I'm explaining myself adequately, but hopefully it is understandable.
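To repeat this for the other question variables, a foreach loop can wrap egen. A minimal sketch, assuming the question variables follow the pattern q1_scores, q2_scores, ... (only two are shown; q2_scores and its values are hypothetical):

```stata
clear
input ID q1_scores q2_scores hour
1 3 2 1
2 4 5 1
3 5 1 2
4 1 4 2
end
// In the real data, list all the question variables here
foreach v of varlist q1_scores q2_scores {
    // Strip "_scores" to build the new name, e.g. q1 -> q1_average_scores
    local stem = subinstr("`v'", "_scores", "", 1)
    egen `stem'_average_scores = mean(`v'), by(hour)
}
list, sepby(hour)
```

Listing the variables explicitly, rather than using a wildcard like *_scores, avoids picking up the new *_average_scores variables if the loop is run again.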
Thanks in advance for any help!
Tor