Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Identify newest version of a variable

    Hi Statalist,

    I have a dataset that contains a lot of the same variables in different versions due to various data preparation steps. The logic behind the renaming during the data preparation was:
    • if the variable was edited, it has the suffix _e
    • if the variable was edited more than one time, a number is added to _e (for example variable F06_02 is the original, F06_02_e was edited one time and F06_02_e2 is the final version of the variable).
    Some variables weren't edited at all (e.g. lfd), some variables were edited once (e.g.F06_01), some variables were edited more often (up to 4 times) (see the dataex-sample below).

    Is there an easy way to identify the newest version of a variable and only keep that one in the dataset that does not involve manually checking each variable for new versions as my dataset includes more than 1000 variables?
    Of course, I will keep the original dataset but I want to reduce the number of variables in the dataset I mainly work with to minimize space and also confusion about the variable versions.

    So far, I have not found a workable solution because I was running into all kinds of problems and now I am out of ideas. I tried creating and modifying varlists (one problem was that string variables were not accepted), looping lookfor/findname over global varlists that only contained the unedited variable versions etc.


    All the best,
    Judith


    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input double lfd long(F06_01 F06_01_e F06_02 F06_02_e F06_02_e2) str27 F24_A_1 int(F24_A_1_e F24_A_1_e2 F24_A_1_e3) str8 F28_A_1 int(F28_A_1_e F28_A_1_e2 F28_A_1_e3 F28_A_1_e4 F28_A_1_e4_ofilt)
    10000001  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000004  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000007  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000008  0  0  .  .  . "9999" . . . ""      .  .  .  .  .
    10000009  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000010  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000012  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000014  0  0  .  .  . ""     . . 0 "9999"  .  .  .  .  0
    10000015  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000018  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000027  1  1  0  0  0 ""     . . 0 ""      .  .  .  .  0
    10000028  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000040  1  1  0  0  0 ""     . . 0 ""      .  .  .  .  0
    10000043  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000044  1  1  1  1  1 ""     . . 0 ""      .  .  .  .  0
    10000047  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000048  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000051  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000053  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000062  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000074  1  1  1  1  1 ""     . . 0 ""      .  .  .  .  0
    10000076  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000078  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000080  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000081  1  1  1  1  1 ""     . . 0 ""      .  .  .  .  0
    10000082  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000083  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000091  1  1  0  0  0 ""     . . 0 ""      .  .  .  .  0
    10000093  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000094  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000095  1  1  1  1  1 ""     . . 0 ""      .  .  .  .  0
    10000097  0 .b  . .b .b ""     . . 0 ""      .  .  .  .  0
    10000098  1  1  1  1  1 ""     . . 0 ""      .  .  .  .  0
    10000100  0  0  .  .  . "9999" . . . "9999"  .  .  .  .  .
    10000102  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000103  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000105  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000109  1  1  1  1  1 ""     . . 0 ""      .  .  .  .  0
    10000112  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000113  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000115  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000117  0  0  .  .  . "9999" . . . ""      .  .  .  .  .
    10000118  0 .b  . .b .b ""     . . 0 ""      .  .  .  .  0
    10000127  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000130  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000133  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000136  1  1  1  1  1 ""     . . 0 "9999"  .  .  .  .  0
    10000137  1  1  1  1  1 ""     . . 0 ""      .  .  .  .  0
    10000138  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000139  1  1  0  0  0 ""     . . 0 "9999"  .  .  .  .  0
    10000140  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000141  1  1  1  1  1 ""     . . 0 ""      .  .  .  .  0
    10000143  1  1  1  1  1 ""     . . 0 ""      .  .  .  .  0
    10000144  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000145  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000146  1  1  1  1  1 ""     . . 0 ""      .  .  .  .  0
    10000149  1  1  1  1  1 ""     . . 0 "9999"  .  .  .  .  0
    10000155  0  0  .  .  . ""     . . 0 "9999"  .  .  .  .  0
    10000158 .e .e .e .e  . ""     . . 0 "9999"  .  .  .  .  0
    10000160  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000162  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000164  1  1  1  1  1 ""     . . 0 ""      .  .  .  .  0
    10000166  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000169  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000174  0  0  .  .  . ""     . . 0 "9999"  .  .  .  .  0
    10000177  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000182  1  1  1  1  1 ""     . . 0 ""      .  .  .  .  0
    10000185 .e .e .e .e  . ""     . . 0 ""     .b .b .b .b .b
    10000186  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000187  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000189  1  1  1  1  1 ""     . . 0 ""      .  .  .  .  0
    10000190  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000194  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000196  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000197  1  1  1  1  1 ""     . . 0 ""      .  .  .  .  0
    10000200  1  1  0  0  0 ""     . . 0 ""      .  .  .  .  0
    10000202  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000204  1  1  1  1  1 ""     . . 0 ""      .  .  .  .  0
    10000205  1  1  1  1  1 "9999" . . . "9999"  .  .  .  .  .
    10000213  1  1  1  1  1 ""     . . 0 ""      .  .  .  .  0
    10000215  1  1  1  1  1 ""     . . 0 ""      .  .  .  .  0
    10000217  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000222  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000224  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000225  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000227  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000228  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000229  1  1  1  1  1 ""     . . 0 ""      .  .  .  .  0
    10000233  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000235  1  1  1  1  1 ""     . . 0 ""      .  .  .  .  0
    10000238  1  1  0  0  0 ""     . . 0 ""      .  .  .  .  0
    10000239  1  1 .e .e .e ""     . . 0 ""      .  .  .  .  0
    10000241  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000242  1  1  1  1  1 ""     . . 0 ""      .  .  .  .  0
    10000245  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000246  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000248  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000249  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000258  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000261  1  1  1  1  1 ""     . . 0 ""      .  .  .  0  0
    end
    label values lfd misslab
    label values F06_01 F06_01
    label def F06_01 0 "0. Nein, kein Kontakt mit Polizei", modify
    label def F06_01 1 "1. Ja, Kontakt mit Polizei", modify
    label def F06_01 .e "keine Angabe", modify
    label values F06_01_e F06_01_e
    label def F06_01_e 0 "0. Nein, ich hatte in diesem Zeitraum keinen Kontakt mit der Polizei.", modify
    label def F06_01_e 1 "1. Ja, ich hatte in diesem Zeitraum Kontakt mit der Polizei.", modify
    label def F06_01_e .b "unklar", modify
    label def F06_01_e .e "keine Angabe", modify
    label values F06_02 F06_02
    label def F06_02 0 "0. Nein, anderes Bundesland", modify
    label def F06_02 1 "1. Ja, in meinem Bundesland", modify
    label def F06_02 .e "keine Angabe", modify
    label values F06_02_e F06_02_e
    label values F06_02_e2 F06_02_e
    label def F06_02_e 0 "0. Nein, in einem anderen Bundesland.", modify
    label def F06_02_e 1 "1. Ja, in meinem Bundesland.", modify
    label def F06_02_e .b "unklar", modify
    label def F06_02_e .e "keine Angabe", modify
    label values F24_A_1_e F24_A_1_e
    label values F24_A_1_e2 F24_A_1_e
    label values F24_A_1_e3 F24_A_1_e
    label values F28_A_1_e F28_A_1_e
    label values F28_A_1_e2 F28_A_1_e
    label values F28_A_1_e3 F28_A_1_e
    label values F28_A_1_e4 F28_A_1_e
    label values F28_A_1_e4_ofilt F28_A_1_e
    label def F28_A_1_e .b "unklar", modify

  • #2
    Working under the assumption that new variables have the suffix _e - _e4, you can modify something along the lines of the following. Your dataset has the variable "F28_A_1_e4_ofilt" which destroys this pattern. I drop the variable and rename the variables ending with _e such that they end with _e1.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input double lfd long(F06_01 F06_01_e F06_02 F06_02_e F06_02_e2) str27 F24_A_1 int(F24_A_1_e F24_A_1_e2 F24_A_1_e3) str8 F28_A_1 int(F28_A_1_e F28_A_1_e2 F28_A_1_e3 F28_A_1_e4 F28_A_1_e4_ofilt)
    10000001  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000004  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000007  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000008  0  0  .  .  . "9999" . . . ""      .  .  .  .  .
    10000009  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000010  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000012  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000014  0  0  .  .  . ""     . . 0 "9999"  .  .  .  .  0
    10000015  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000018  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000027  1  1  0  0  0 ""     . . 0 ""      .  .  .  .  0
    10000028  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000040  1  1  0  0  0 ""     . . 0 ""      .  .  .  .  0
    10000043  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000044  1  1  1  1  1 ""     . . 0 ""      .  .  .  .  0
    10000047  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000048  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000051  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000053  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000062  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000074  1  1  1  1  1 ""     . . 0 ""      .  .  .  .  0
    10000076  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000078  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000080  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000081  1  1  1  1  1 ""     . . 0 ""      .  .  .  .  0
    10000082  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000083  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000091  1  1  0  0  0 ""     . . 0 ""      .  .  .  .  0
    10000093  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000094  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000095  1  1  1  1  1 ""     . . 0 ""      .  .  .  .  0
    10000097  0 .b  . .b .b ""     . . 0 ""      .  .  .  .  0
    10000098  1  1  1  1  1 ""     . . 0 ""      .  .  .  .  0
    10000100  0  0  .  .  . "9999" . . . "9999"  .  .  .  .  .
    10000102  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000103  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000105  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000109  1  1  1  1  1 ""     . . 0 ""      .  .  .  .  0
    10000112  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000113  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000115  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000117  0  0  .  .  . "9999" . . . ""      .  .  .  .  .
    10000118  0 .b  . .b .b ""     . . 0 ""      .  .  .  .  0
    10000127  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000130  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000133  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000136  1  1  1  1  1 ""     . . 0 "9999"  .  .  .  .  0
    10000137  1  1  1  1  1 ""     . . 0 ""      .  .  .  .  0
    10000138  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000139  1  1  0  0  0 ""     . . 0 "9999"  .  .  .  .  0
    10000140  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000141  1  1  1  1  1 ""     . . 0 ""      .  .  .  .  0
    10000143  1  1  1  1  1 ""     . . 0 ""      .  .  .  .  0
    10000144  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000145  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000146  1  1  1  1  1 ""     . . 0 ""      .  .  .  .  0
    10000149  1  1  1  1  1 ""     . . 0 "9999"  .  .  .  .  0
    10000155  0  0  .  .  . ""     . . 0 "9999"  .  .  .  .  0
    10000158 .e .e .e .e  . ""     . . 0 "9999"  .  .  .  .  0
    10000160  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000162  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000164  1  1  1  1  1 ""     . . 0 ""      .  .  .  .  0
    10000166  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000169  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000174  0  0  .  .  . ""     . . 0 "9999"  .  .  .  .  0
    10000177  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000182  1  1  1  1  1 ""     . . 0 ""      .  .  .  .  0
    10000185 .e .e .e .e  . ""     . . 0 ""     .b .b .b .b .b
    10000186  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000187  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000189  1  1  1  1  1 ""     . . 0 ""      .  .  .  .  0
    10000190  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000194  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000196  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000197  1  1  1  1  1 ""     . . 0 ""      .  .  .  .  0
    10000200  1  1  0  0  0 ""     . . 0 ""      .  .  .  .  0
    10000202  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000204  1  1  1  1  1 ""     . . 0 ""      .  .  .  .  0
    10000205  1  1  1  1  1 "9999" . . . "9999"  .  .  .  .  .
    10000213  1  1  1  1  1 ""     . . 0 ""      .  .  .  .  0
    10000215  1  1  1  1  1 ""     . . 0 ""      .  .  .  .  0
    10000217  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000222  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000224  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000225  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000227  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000228  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000229  1  1  1  1  1 ""     . . 0 ""      .  .  .  .  0
    10000233  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000235  1  1  1  1  1 ""     . . 0 ""      .  .  .  .  0
    10000238  1  1  0  0  0 ""     . . 0 ""      .  .  .  .  0
    10000239  1  1 .e .e .e ""     . . 0 ""      .  .  .  .  0
    10000241  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000242  1  1  1  1  1 ""     . . 0 ""      .  .  .  .  0
    10000245  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000246  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000248  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000249  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000258  0  0  .  .  . ""     . . 0 ""      .  .  .  .  0
    10000261  1  1  1  1  1 ""     . . 0 ""      .  .  .  0  0
    end
    label values lfd misslab
    label values F06_01 F06_01
    label def F06_01 0 "0. Nein, kein Kontakt mit Polizei", modify
    label def F06_01 1 "1. Ja, Kontakt mit Polizei", modify
    label def F06_01 .e "keine Angabe", modify
    label values F06_01_e F06_01_e
    label def F06_01_e 0 "0. Nein, ich hatte in diesem Zeitraum keinen Kontakt mit der Polizei.", modify
    label def F06_01_e 1 "1. Ja, ich hatte in diesem Zeitraum Kontakt mit der Polizei.", modify
    label def F06_01_e .b "unklar", modify
    label def F06_01_e .e "keine Angabe", modify
    label values F06_02 F06_02
    label def F06_02 0 "0. Nein, anderes Bundesland", modify
    label def F06_02 1 "1. Ja, in meinem Bundesland", modify
    label def F06_02 .e "keine Angabe", modify
    label values F06_02_e F06_02_e
    label values F06_02_e2 F06_02_e
    label def F06_02_e 0 "0. Nein, in einem anderen Bundesland.", modify
    label def F06_02_e 1 "1. Ja, in meinem Bundesland.", modify
    label def F06_02_e .b "unklar", modify
    label def F06_02_e .e "keine Angabe", modify
    label values F24_A_1_e F24_A_1_e
    label values F24_A_1_e2 F24_A_1_e
    label values F24_A_1_e3 F24_A_1_e
    label values F28_A_1_e F28_A_1_e
    label values F28_A_1_e2 F28_A_1_e
    label values F28_A_1_e3 F28_A_1_e
    label values F28_A_1_e4 F28_A_1_e
    label values F28_A_1_e4_ofilt F28_A_1_e
    label def F28_A_1_e .b "unklar", modify
    
    drop F28_A_1_e4_ofilt
    rename *_e *_e1
    local vars
    foreach var of varlist *{
        local vars `vars'  `=ustrregexra("`var'", "(\w+\_\w+)(\_)(e$|e\d+$)", "$1")'
    }
    di "`vars'" 
    local vars: list uniq vars
    foreach var of local vars{
             forval i=1/4{
                 capture confirm variable `var'_e`i'
                 if !_rc {
                     cap drop `var'
                     cap drop `var'_e`=`i'-1'
                  }
              }
    }
    Res.:

    Code:
    . ds
    lfd         F06_01_e1   F06_02_e2   F24_A_1_e3  F28_A_1_e4
    
    .

    Comment


    • #3
      You are a magician, thanks for your fast reply
      In the case of the _ofilt-variable, I want to keep the e4-version as well as the e4_ofilt-version. But as the rename-command ignores this variable anyways, it is no problem with your code and in the end, I have exactly all the variables, I need.

      Comment

      Working...
      X