Drop "duplicate" variables i.e. those who have the exact same values

Hugo Denis

Join Date: Apr 2022

Posts: 27

13 Jan 2023, 06:18

For reasons I can't really explain, I ended up with a dataset of one observation in which several variables are duplicates. I say "duplicate" because, although the variables have different names, they hold the same id value, which is repeated across columns. I would like code that keeps only one variable from each set of duplicates.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input str442 FW str487 FX str491 FY str442 FZ str486 GA str490 GB str441 GC
"K2" "K3" "K3" "K3" "K4" "K4" "K4"
end

Please do not pay attention to the variable names. As you might guess this dataset comes from an Excel file so as soon as I get rid of these duplicates, I will rename them the right way.

Thanks for the help,
Hugo

Nick Cox

Join Date: Mar 2014
Posts: 35667

13 Jan 2023, 06:52

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input str442 FW str487 FX str491 FY str442 FZ str486 GA str490 GB str441 GC
"K2" "K3" "K3" "K3" "K4" "K4" "K4"
end


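* give every variable a common prefix so that -reshape- can stack them
rename * data*
gen id = 1

* one row per original variable; its name goes into -which-
reshape long data, i(id) j(which) string

* within each distinct value, keep only the first variable name
bysort data (which) : keep if _n == 1

* back to one observation, then strip the prefix
reshape wide data, i(id) j(which) string

rename data* *

list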

     +-------------------+
     | id   FW   FX   GA |
     |-------------------|
  1. |  1   K2   K3   K4 |
     +-------------------+
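
An alternative that avoids reshaping is to loop over the variables and drop any variable whose values match those of a variable already kept. This is a minimal sketch rather than anything from the original posts: the local macro names kept and dup are my own, and it assumes that "duplicate" means equal in every observation, which with a single observation simply means equal in that observation.

```stata
* Sketch: drop each variable that repeats an earlier, already kept variable.
* The locals -kept- and -dup- are illustrative names, not from the thread.
local kept
foreach v of varlist _all {
    local dup 0
    foreach k of local kept {
        * -assert- fails unless the two variables agree in every observation;
        * -capture- also absorbs the type mismatch error for string vs numeric
        capture assert `v' == `k'
        if _rc == 0 {
            local dup 1
            continue, break
        }
    }
    if `dup' {
        drop `v'
    }
    else {
        local kept `kept' `v'
    }
}

list
```

On the example data this should leave FW, FX and GA, the same variables as the reshape approach, without creating an id variable. Because assert compares all observations, the same loop should also work when there is more than one observation.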
