Replacing all but one value across variables as missing, not replacing the first non-missing value

Zach Marhanka

Join Date: Aug 2022

Posts: 1
#1


31 Aug 2022, 13:22

Hello,

This is my first post on Statalist after reading the forum for some time! Here is my example dataset:

Code:

```stata
* Example generated by -dataex-. For more info, type help dataex
clear
input byte ID str1 Section byte(R1 R2 R3 R4 Count)
1 "E" . 1 . . 1
1 "C" . . 2 . 1
1 "S" 1 . . 4 2
2 "E" . . . 3 1
2 "C" 2 . . . 1
2 "S" . . 3 . 1
3 "E" 3 . 3 . 2
3 "C" . 3 . . 1
3 "S" . . 3 . 1
end
```

This is an example from a much larger dataset, but essentially each observation represents the answers a participant gave in one section of a questionnaire. ID identifies a unique individual, Section is the questionnaire section, and the variables R1-R4 hold the answers participants gave, in page order (1 means the first answer option was chosen, 2 the second, etc.). The issue is that, because of an error in our experiment, some participants have more than one answer recorded within a section. I would like to keep the first non-missing answer in each row and replace every later answer with a missing value. These later answers are not necessarily duplicates of the first one; the rule is simply that a non-missing value should be set to missing if an earlier "R*" variable already holds an answer.

For instance, for ID 1, Section S, I would like to replace the value of 4 in R4 with a missing value, since an answer of 1 was already given in R1. ID 3, Section E has a similar issue: there is a value in R3 even though an answer is already recorded in R1. The "R*" variables represent the page order in which a question was answered: R1 is the first page, R2 is the second page, and so on. There are hundreds of these variables in the main dataset, but there is only one question per page and one question per section, so each section should have only one value listed per participant.
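For comparison, the rule described above can also be applied without reshaping, by walking across the R* variables in page order and blanking any answer that comes after the first non-missing one in the same row. This is a minimal sketch, not code from the thread; it assumes the R* variables are stored in page order in the dataset (R1, R2, ...) and that no other variable name starts with R. The flag variable `seen` is a hypothetical helper name.

```stata
* Example data from #1 (generated by -dataex-)
clear
input byte ID str1 Section byte(R1 R2 R3 R4 Count)
1 "E" . 1 . . 1
1 "C" . . 2 . 1
1 "S" 1 . . 4 2
2 "E" . . . 3 1
2 "C" 2 . . . 1
2 "S" . . 3 . 1
3 "E" 3 . 3 . 2
3 "C" . 3 . . 1
3 "S" . . 3 . 1
end

* Hypothetical flag: becomes 1 once the row has had a non-missing answer.
* Assumes R* expands in page order and matches only the answer variables.
gen byte seen = 0
foreach v of varlist R* {
    * blank this page's answer if an earlier page already had one
    replace `v' = . if seen
    * mark the row once it has a non-missing answer
    replace seen = 1 if !missing(`v')
}
drop seen
```

Because `varlist R*` expands in dataset order rather than numeric order, it is worth checking (for example with `describe R*`) that the pages are stored as R1, R2, R3, ... before relying on this loop.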

I hope this makes sense; if not, I will of course be monitoring my post!
Jared Greathouse

Join Date: Sep 2021

Posts: 2172
#2

31 Aug 2022, 13:44

Uhhh, based on your description, it seems to me that a reshape to long and a collapse by first non-missing value is necessary here... but I don't immediately know if this is the solution.
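The reshape-then-keep-first-non-missing idea above could look roughly like the sketch below. It is an illustration under assumptions, not code posted in the thread: the page index variable `page` is a hypothetical name, and instead of `collapse (firstnm)` it keeps the earliest non-missing row with `bysort`, so the page each answer came from is retained and the result does not depend on how observations are ordered within groups.

```stata
* Example data from #1 (generated by -dataex-)
clear
input byte ID str1 Section byte(R1 R2 R3 R4 Count)
1 "E" . 1 . . 1
1 "C" . . 2 . 1
1 "S" 1 . . 4 2
2 "E" . . . 3 1
2 "C" 2 . . . 1
2 "S" . . 3 . 1
3 "E" 3 . 3 . 2
3 "C" . 3 . . 1
3 "S" . . 3 . 1
end

* Long format: one row per ID x Section x page (page = the R suffix)
reshape long R, i(ID Section) j(page)

* Drop pages without an answer, then keep the earliest remaining page
drop if missing(R)
bysort ID Section (page): keep if _n == 1
```

Unlike the wide-format result the question asks for, this leaves one row per ID and Section in long format, and any ID-Section combination with no answer at all would disappear entirely.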

EDIT: Thank you for giving a dataex in your first post, and welcome to Statalist!
Clyde Schechter

Join Date: Apr 2014

Posts: 30355
#3

31 Aug 2022, 13:47

I think I understand what you want.

Code:

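```stata
* Go long: one observation per ID, Section, and page (_j holds the R suffix)
reshape long R, i(ID Section)
* Running count of non-missing answers, in page order
by ID Section (_j): gen response_count = sum(!missing(R))
* Blank every answer after the first non-missing one
replace R = . if response_count > 1
drop response_count
* Restore the original wide layout
reshape wide
```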

Thank you for using -dataex- on your very first post!

Added: Crossed with #2

Last edited by Clyde Schechter; 31 Aug 2022, 13:49.
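One way to check the result of the code in #3 is to count the non-missing R* values in each row after reshaping back to wide. The sketch below runs that code on the example data from #1 and then asserts at most one answer per row; the check variable `n_answers` is a hypothetical name. Note that the existing Count variable is not touched by the code in #3, so if it is meant to hold the number of answers (an assumption), it will still show the old value.

```stata
* Example data from #1 (generated by -dataex-)
clear
input byte ID str1 Section byte(R1 R2 R3 R4 Count)
1 "E" . 1 . . 1
1 "C" . . 2 . 1
1 "S" 1 . . 4 2
2 "E" . . . 3 1
2 "C" 2 . . . 1
2 "S" . . 3 . 1
3 "E" 3 . 3 . 2
3 "C" . 3 . . 1
3 "S" . . 3 . 1
end

* Code from #3
reshape long R, i(ID Section)
by ID Section (_j): gen response_count = sum(!missing(R))
replace R = . if response_count > 1
drop response_count
reshape wide

* Check: each ID x Section row now holds at most one answer
egen n_answers = rownonmiss(R*)
assert n_answers <= 1
```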
