  • Replacing all but one value across variables as missing, not replacing the first non-missing value

    Hello,

    This is my first post on Statalist after reading the forum for some time! Here is my example dataset:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte ID str1 Section byte(R1 R2 R3 R4 Count)
    1 "E" . 1 . . 1
    1 "C" . . 2 . 1
    1 "S" 1 . . 4 2
    2 "E" . . . 3 1
    2 "C" 2 . . . 1
    2 "S" . . 3 . 1
    3 "E" 3 . 3 . 2
    3 "C" . 3 . . 1
    3 "S" . . 3 . 1
    end
    This is an example from a much larger dataset. Each observation holds the answer a participant gave in one section of a questionnaire. ID identifies a unique individual, Section is the questionnaire section, and the variables R1-R4 are the answers participants gave (1 means the first answer was chosen, 2 the second answer, etc.), in page order. The issue is that some participants have multiple answers recorded under a single section, which is the result of an error in our experiment. I would like to replace any later answer with a missing value if an answer already exists for that section. These later answers are not necessarily duplicates: I want to set to missing any non-missing value that follows a non-missing value in an earlier "R*" variable.

    For instance, under ID: 1 Section: S, I would like to replace the value of 4 under R4 with a missing value, since an answer of 1 was already given under R1. The "R*" variables represent the page order in which a question was answered. There are hundreds of these in the main dataset, but for each section there should be only one value listed per participant. ID: 3 Section: E has a similar issue: a value is in R3 when there is already an answer recorded in R1. R1 is the first page, R2 is the second page, and there is only one question per page and one question per section.
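
To make the problem concrete, the affected rows can be listed before anything is changed. This is only a diagnostic sketch: it assumes the -dataex- example above is already in memory, and n_answers_raw is a hypothetical helper name that is not part of the original data. For the full dataset the range R1-R4 would need to be widened to cover every page variable.

```stata
* Diagnostic sketch; assumes the -dataex- example above is in memory.
* n_answers_raw is a hypothetical helper name, not part of the original data.
egen byte n_answers_raw = rownonmiss(R1-R4)

* Rows with more than one answer are the ones that need cleaning.
list ID Section R1-R4 if n_answers_raw > 1, noobs sepby(ID)
```

On the example data this flags the two rows described above, ID 1 Section S and ID 3 Section E.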

    I hope this makes sense. If not, I will of course monitor my post!

  • #2
    Based on your description, it seems to me that a reshape to long followed by a collapse on the first non-missing value is needed here, but I don't immediately know if this is the solution.


    EDIT: Thank you for giving a dataex in your first post, and welcome to Statalist!
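
One way to read the reshape-and-keep-first idea is sketched below. It is an illustration under assumptions, not a tested solution for the full dataset: it assumes the -dataex- example is in memory, and "page" and "is_first" are hypothetical names chosen here. Instead of -collapse-, it keeps the earliest non-missing row explicitly, so the result does not depend on how rows happen to be ordered within a group.

```stata
* Sketch only; assumes the -dataex- example is in memory.
* "page" and "is_first" are hypothetical names chosen for this illustration.
preserve
reshape long R, i(ID Section) j(page)
* Flag the earliest page with a non-missing answer within each ID and Section.
bysort ID Section (page): generate byte is_first = !missing(R) & sum(!missing(R)) == 1
keep if is_first
drop is_first
* Confirm there is now at most one row per ID and Section.
isid ID Section
list ID Section page R, noobs sepby(ID)
restore
```

This leaves one row per ID and Section in long layout, with the page of the retained answer kept in "page". An ID and Section with no answers at all would drop out entirely, and -restore- brings the original wide data back afterwards.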


    • #3
      I think I understand what you want.
      Code:
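      * Go to long layout: one row per ID, Section, and page (_j).
      reshape long R, i(ID Section)
      * Running count of non-missing answers in page order.
      by ID Section (_j): gen response_count = sum(!missing(R))
      * Blank every answer after the first one.
      replace R = . if response_count > 1
      drop response_count
      * Return to the original wide layout.
      reshape wide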
      Thank you for using -dataex- on your very first post!

      Added: Crossed with #2
      Last edited by Clyde Schechter; 31 Aug 2022, 13:49.
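
With hundreds of page variables, the same rule could also be applied without reshaping. The following is a minimal sketch, not a replacement for the code above: it assumes the data are in the original wide layout, that the page variables sit in the dataset in page order, and that "seen" and "n_answers_clean" are free to use as hypothetical helper names.

```stata
* Sketch only; assumes the original wide layout with R1-R4 stored in page order.
* "seen" and "n_answers_clean" are hypothetical helper names.
generate byte seen = 0
foreach v of varlist R1-R4 {
    * Blank any answer that comes after an earlier non-missing answer.
    replace `v' = . if seen & !missing(`v')
    * Remember that this row now has its first answer.
    replace seen = 1 if !missing(`v')
}
drop seen

* Check: no row should now hold more than one answer.
egen byte n_answers_clean = rownonmiss(R1-R4)
assert n_answers_clean <= 1
```

The final -assert- works as a check after either approach. In the example, Count appears to hold the number of answers per row; if that is what it means, it will be out of date after cleaning and could be refreshed from n_answers_clean.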
