  • Drop "duplicate" variables, i.e., those that have exactly the same values

    For reasons I can't really explain, I ended up with a dataset that has a single observation and several "duplicate" variables. I call them duplicates because, although their names differ, each one holds a single ID value, and the same value can appear in several columns. I would like code that keeps only one variable for each distinct value.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str442 FW str487 FX str491 FY str442 FZ str486 GA str490 GB str441 GC
    "K2" "K3" "K3" "K3" "K4" "K4" "K4"
    end
    Please ignore the variable names. As you might guess, this dataset comes from an Excel file, so once I get rid of these duplicates I will rename the variables properly.

    Thanks for the help,
    Hugo

  • #2
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str442 FW str487 FX str491 FY str442 FZ str486 GA str490 GB str441 GC
    "K2" "K3" "K3" "K3" "K4" "K4" "K4"
    end
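    
    * Give all variables a common stub so they can be reshaped
    rename * data* 
    gen id = 1 
    * One row per original variable: -which- holds the old name, -data- its value
    reshape long data, i(id) j(which) string 
    * Within each distinct value keep one row (the name that sorts first)
    bysort data (which) : keep if _n == 1 
    
    * Back to a single observation, then restore the original names
    reshape wide data, i(id) j(which) string 
    rename data* *
    
    list 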
    
         +-------------------+
         | id   FW   FX   GA |
         |-------------------|
      1. |  1   K2   K3   K4 |
         +-------------------+
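    An alternative that avoids the two reshapes, sketched under the assumption that the data have a single observation and that every variable is a string (as in the -dataex- example): loop over the variables from left to right and drop any whose value in observation 1 matches a variable already kept. Unlike the reshape approach, this keeps the leftmost variable of each group rather than the one whose name sorts first, and it creates no -id- variable. The local macro names `kept` and `isdup` are purely illustrative.

```stata
* Example data from the question
clear
input str442 FW str487 FX str491 FY str442 FZ str486 GA str490 GB str441 GC
"K2" "K3" "K3" "K3" "K4" "K4" "K4"
end

* Assumes one observation and all-string variables
local kept
foreach v of varlist _all {
    local isdup 0
    * compare this variable with every variable kept so far
    foreach k of local kept {
        if `v'[1] == `k'[1] local isdup 1
    }
    * drop it if its value was already seen, otherwise remember it
    if `isdup' drop `v'
    else local kept `kept' `v'
}
list
```

    With the example data this should leave FW, FX and GA. Comparing a string variable with a numeric one would raise a type-mismatch error, so mixed storage types would need extra handling.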
