Hello all,
I am trying to find the most efficient way to drop variables I do not need in an analysis. An example of the dataset is below, but the real dataset has 7596 of these seq.* variables.
I have a separate Excel file listing the 4984 seq.* variables I do want to include in the analysis. It looks roughly like this:
seq_name
seq.19448.104
seq.5730.60
seq.18340.2
Since this Excel file does not contain seq.7186.111, I would want to drop that variable.
Appreciate your help and happy to clarify anything.
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input int pt_id float(seq.19448.104 seq.5730.60 seq.18340.2 seq.7186.111)
 1    85  21.9 12.7   768
 2    11  37.4   16 116.3
 3  58.7 227.5 14.8  58.9
 4  98.7  43.3 37.1  87.6
 5 165.9  45.3 11.7 119.9
 6 121.2  56.6 63.5 105.9
 7 134.6  44.8 14.5 122.8
 8 505.7  28.3 14.2  88.1
 9 633.5  44.7 20.2 104.1
10  81.6  23.2 10.4  62.4
end
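For reference, here is a minimal sketch of one possible way to do this, not a tested solution. It makes several assumptions that are not in the post: the file names keep_list.xlsx and analysis.dta are hypothetical, the Excel sheet has seq_name as its header row, and the variables in the Stata dataset use underscores (seq_19448_104) rather than periods, since Stata variable names cannot contain periods. The idea is to read the list into a local macro, then drop every seq* variable that is not on it.

```stata
* Hypothetical file names; replace with the real paths.
import excel using "keep_list.xlsx", firstrow clear

* Assumption: names in the data use underscores instead of periods.
* strtoname() turns "seq.19448.104" into "seq_19448_104".
replace seq_name = strtoname(strtrim(seq_name))

* Collect the names to keep in a local macro.
levelsof seq_name, local(keepvars) clean

* Load the analysis data (this replaces the list in memory).
use "analysis.dta", clear

* Expand seq* to the full list of seq variables, then take the set difference.
unab allseq : seq*
local dropvars : list allseq - keepvars
local ndrop : word count `dropvars'

* Drop only if there is something to drop; pt_id is never touched.
if `ndrop' > 0 {
    drop `dropvars'
}
```

Because the names live in local macros, the lines need to be run together from a do-file rather than one at a time. Taking the set difference against seq* also means a name on the Excel list that is missing from the data does not cause an error, whereas a plain keep with the full list would stop on the first unknown variable.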