Hello,
This is my first post on Statalist after reading the forum for some time! My example dataset is in the code block further down.
This is an example from a much larger dataset, but essentially each observation represents the answer a participant gave in a questionnaire. ID identifies a unique individual, Section is the questionnaire section, and the variables R1-R4 are the answers participants gave to these questions, in order (1 means the first answer was chosen, 2 the second answer, etc.). The issue is that some participants have more than one answer recorded under a section, which is the result of an error in our experiment. I would like to replace any later answer with a missing value if an answer already exists for that question. The extra answers are not necessarily duplicates of the first one; the rule is to set any non-missing value to missing whenever a non-missing value already exists in an earlier "R*" variable.
For instance, under ID: 1 Section: S, I would like to replace the value of 4 under R4 with a missing value, since an answer of 1 was already given under R1. ID: 3 Section: E has a similar issue, except that the extra value is in R3 when there is already an answer recorded in R1. The "R*" variables represent the page order in which a question was answered: R1 is the first page, R2 is the second page, and so on. There are hundreds of these variables in the main dataset. There is only one question per page and one question per section, so for each section there should be only 1 value listed per participant.
I hope this makes sense; if not, I will of course monitor my post!
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input byte ID str1 Section byte(R1 R2 R3 R4 Count)
1 "E" . 1 . . 1
1 "C" . . 2 . 1
1 "S" 1 . . 4 2
2 "E" . . . 3 1
2 "C" 2 . . . 1
2 "S" . . 3 . 1
3 "E" 3 . 3 . 2
3 "C" . 3 . . 1
3 "S" . . 3 . 1
end
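The rule described in this post (keep the first non-missing answer in page order, blank any later one) could perhaps be expressed as in the minimal sketch below. It is not a definitive solution. It assumes one observation per ID and Section, as in the example, and that the "R*" variables are stored in the dataset in page order, so that the range R1-R4 scans them from the first page to the last. The helper variables seen and check are hypothetical names introduced only for illustration.

```stata
* Sketch under the assumptions stated above; run after the -dataex- example.
* seen flags rows where an earlier R* variable already holds an answer.
generate byte seen = 0

foreach v of varlist R1-R4 {
    * Blank this answer if an earlier page already has one in the same row
    replace `v' = . if seen & !missing(`v')
    * Remember that this row now has an answer
    replace seen = 1 if !missing(`v')
}
drop seen

* Optional check: no row should keep more than one non-missing answer
egen byte check = rownonmiss(R1-R4)
assert check <= 1
drop check
```

With the example data, this would set R4 to missing for ID 1 Section S and R3 to missing for ID 3 Section E, leaving the other rows unchanged. The existing Count variable is not updated by the sketch, so it would still show 2 for those two rows. For the full dataset, the range R1-R4 would need to be widened to cover all of the "R*" variables.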
