Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Identify variables by their order in the dataset

    Hi,

    I am working with a new feed of data every week (in real time). Therefore, as real time passes my time series dataset increase in size and variable names change due to new real dates added (currently AE, but next (real) week it will become AD etc...)... I would like to identify variables by their order in the dataset. I would to be able to rename by two different methods (large project, I need to do it several times and need the 2 methods)
    1) "the second variable in the dataset as currently ordered" and also
    2) "the variable next (on the right) of a particular variable", in the example below "DATATYPE"

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str8 DATATYPE double(AE AF AG AH AI AJ)
    "SP500_w" 6247.66 6375.54 6529.42 6389.67 6209.38 6075.76
    "VIX_w"     33.84   33.47   27.57   25.66   27.62   27.99
    end
    This is how I currently rename the variables. I would like to replace "AE" by "the variable next to DATATYPE" or "the second variable in the dataset"
    Code:
    * Rename variables as t1, t2, t3 etc.
    qui ds
        loc lastvar: word `c(k)' of `r(varlist)'
    local j 0
    foreach var of varlist AE-`lastvar' {
        local j `=`j'+1'
        rename `var' t`j'
    }
    Any idea how to solve this problem? Thanks.



  • #2
    Code:
    local j 1
    foreach var of varlist _all {
        if strlower("`var'") == "datatype" continue
        rename `var' t`j'
        local ++j
    }

    Comment


    • #3
      findname as updated in Stata Journal 20(2) (2020) includes a handle that may help. At the time of writing even Stata 16.1, updated, is not up to date to include this in the files searched by search.

      But you can go


      Code:
      net sj 20-2 dm0048_4
      to be able to see the files for download. Here is an example of
      findname in action.

      Code:
      . which findname
      <wherever it is on my system>\findname.ado
      *! 1.1.0 NJC 21 January 2020
      
      .. sysuse auto, clear
      (1978 Automobile Data)
      
      .
      . findname
      make          mpg           headroom      weight        turn          gear_ratio
      price         rep78         trunk         length        displacement  foreign
      
      . findname, columns(2)
      price
      
      . findname, columns(-1)
      foreign
      
      . findname, columns(5 7)
      headroom  weight





      For this to work, findname must be 1.1.0 (or higher).

      findname is, functionally, a superset of ds (an official command, but one I hacked at too). I changed the syntax a bit.
      Last edited by Nick Cox; 27 Jun 2020, 02:52.

      Comment


      • #4
        Thanks a lot to you Joseph and Nick!

        Nick, how do you integrate "findmane" in a piece of code? I have tried the following with no success:

        Code:
          
         * Rename variables as t1, t2, t3 etc. qui ds     loc firstvar findname, columns(4)     loc lastvar: word `c(k)' of `r(varlist)' local j 0 foreach var of varlist `firstvar'-`lastvar' {     local j `=`j'+1'     rename `var' t`j' }
        Code:
        variable findname not found
        Thank you in advance.

        Comment


        • #5
          #4 is mangled at the time of writing, but saved results are documented in the help, so that for example the single variable name identified by columns(4) (so long as there are 4 or more variables in the dataset) is the entire contents of r(varlist)

          Code:
          . sysuse auto, clear
          (1978 Automobile Data)
          
          . findname, columns(4)
          rep78
          
          . ret li
          
          macros:
                      r(varlist) : "rep78"


          You can also push the result into a local macro.
          Last edited by Nick Cox; 27 Jun 2020, 03:55.

          Comment


          • #6
            Joseph:


            Code:
            * Example generated by -dataex-. To install: ssc install dataex
            clear
            input str52 Name str12 Code str4 DATATYPE double(AE AF AG AH AI AJ)
            "AMAZON.COM - TOT RETURN IND"                  "891399(RI)"   "RI"   139628.9 134858.5 135188.9 126556.6 123083.7 127554.4
            "AMAZON.COM - TURNOVER BY VALUE"               "891399(VA)"   "VA"   54856420 56101860 51967390 36340370 40389570 49907070
            "AMAZON.COM - TURNOVER BY VOLUME"              "891399(VO)"   "VO"    20231.8  21664.5  20347.8  14861.4  16606.6  20568.3
            "AMAZON.COM - NUMBER OF SHARES"                "891399(NOSH)" "NOSH"   498776   498776   498776   498776   498776   498776
            "ABBOTT LABORATORIES - TOT RETURN IND"         "916328(RI)"   "RI"     104698 109015.4 110526.6 109615.1 108847.6   108044
            "ABBOTT LABORATORIES - TURNOVER BY VALUE"      "916328(VA)"   "VA"    2782464  2673960  3345406  3274088  2073081  4020672
            "ABBOTT LABORATORIES - TURNOVER BY VOLUME"     "916328(VO)"   "VO"    30846.2  29929.8  37101.9  35361.9  22997.6  44629.6
            "ABBOTT LABORATORIES - NUMBER OF SHARES"       "916328(NOSH)" "NOSH"  1768844  1768844  1768844  1768844  1768844  1768844
            "AES - TOT RETURN IND"                         "545101(RI)"   "RI"     569.98   552.27   542.38   561.33   525.91   534.15
            "AES - TURNOVER BY VALUE"                      "545101(VA)"   "VA"     561943 471604.3 523206.9 441729.2 339348.6 389947.6
            "AES - TURNOVER BY VOLUME"                     "545101(VO)"   "VO"    40897.8  36528.5  37241.3  34300.2  26966.6  31618.4
            "AES - NUMBER OF SHARES"                       "545101(NOSH)" "NOSH"   664908   664908   664908   664908   664908   664908
            "ABIOMED - TOT RETURN IND"                     "777953(RI)"   "RI"    4393.82  4631.45  4560.91     4328  3777.45  3570.18
            "ABIOMED - TURNOVER BY VALUE"                  "777953(VA)"   "VA"   606298.3 534542.1 667992.8 758397.6 455356.5 306943.1
            "ABIOMED - TURNOVER BY VOLUME"                 "777953(VO)"   "VO"     2401.5   2184.4   2703.4   3370.8   2217.8   1620.2
            "ABIOMED - NUMBER OF SHARES"                   "777953(NOSH)" "NOSH"    44957    44957    44957    44957    44957    44957
            "INTERNATIONAL BUS.MCHS. - TOT RETURN IND"     "906187(RI)"   "RI"    2096.87   2236.1  2339.13  2324.36  2261.14  2186.21
            "INTERNATIONAL BUS.MCHS. - TURNOVER BY VALUE"  "906187(VA)"   "VA"    3522348  3891923  3331583  2714737  2356168  2531540
            "INTERNATIONAL BUS.MCHS. - TURNOVER BY VOLUME" "906187(VO)"   "VO"    29114.8  31901.1  25256.7  21625.9  19378.4  21335.3
            "INTERNATIONAL BUS.MCHS. - NUMBER OF SHARES"   "906187(NOSH)" "NOSH"   887892   887892   887892   887892   887892   887892
            "ADVANCED MICRO DEVICES - TOT RETURN IND"      "936365(RI)"   "RI"   12171.36 12673.17 13344.59 12250.34 12252.67 13100.64
            "ADVANCED MICRO DEVICES - TURNOVER BY VALUE"   "936365(VA)"   "VA"   12960810 16447280 16312270 12974020 13117540 16416590
            "ADVANCED MICRO DEVICES - TURNOVER BY VOLUME"  "936365(VO)"   "VO"   239485.5 302741.4 297097.7 245272.9 246053.6 300720.7
            "ADVANCED MICRO DEVICES - NUMBER OF SHARES"    "936365(NOSH)" "NOSH"  1171190  1171190  1171190  1171190  1171190  1171190
            "ADOBE (NAS) - TOT RETURN IND"                 "749382(RI)"   "RI"   136128.4 130392.3   128289 122959.6 118308.3 120976.1
            "ADOBE (NAS) - TURNOVER BY VALUE"              "749382(VA)"   "VA"    6922195  8937880  5423548  6146828  4165504  4897871
            "ADOBE (NAS) - TURNOVER BY VOLUME"             "749382(VO)"   "VO"    16061.6  22119.2  13700.8  15923.8  11028.8  13309.8
            "ADOBE (NAS) - NUMBER OF SHARES"               "749382(NOSH)" "NOSH"   479665   481801   481801   481801   481801   481801
            "ALEXANDRIA RLST.EQTIES. - TOT RETURN IND"     "891631(RI)"   "RI"    1843.22  1927.21  1780.53  1744.45  1707.58  1648.61
            "ALEXANDRIA RLST.EQTIES. - TURNOVER BY VALUE"  "891631(VA)"   "VA"    1902550 987086.2 685670.7 835837.2 370096.6 772526.6
            "ALEXANDRIA RLST.EQTIES. - TURNOVER BY VOLUME" "891631(VO)"   "VO"    11350.4   6053.5   4352.4   5421.2   2488.7   5413.3
            "ALEXANDRIA RLST.EQTIES. - NUMBER OF SHARES"   "891631(NOSH)" "NOSH"   126195   126195   126195   126195   126195   126195
            "AIR PRDS.& CHEMS. - TOT RETURN IND"           "905271(RI)"   "RI"   34026.52 34655.56 36174.89 36262.46 34321.34 34998.54
            "AIR PRDS.& CHEMS. - TURNOVER BY VALUE"        "905271(VA)"   "VA"    1153356  1044249  1236181  1136230   846314  1175705
            "AIR PRDS.& CHEMS. - TURNOVER BY VOLUME"       "905271(VO)"   "VO"     4870.9   4412.3   4983.1     4668   3580.2   5061.5
            "AIR PRDS.& CHEMS. - NUMBER OF SHARES"         "905271(NOSH)" "NOSH"   220855   220855   220855   220855   220855   220855
            "ALASKA AIR GROUP - TOT RETURN IND"            "921794(RI)"   "RI"    3425.12  3725.91  4131.26  3914.14  3559.58  3013.79
            "ALASKA AIR GROUP - TURNOVER BY VALUE"         "921794(VA)"   "VA"   597704.7 979011.5  2039566 642940.1 551998.7 456727.3
            "ALASKA AIR GROUP - TURNOVER BY VOLUME"        "921794(VO)"   "VO"      16484  26027.8  45395.6  17720.3    16276  16494.9
            "ALASKA AIR GROUP - NUMBER OF SHARES"          "921794(NOSH)" "NOSH"   122585   122585   122585   122585   122585   122585
            "BOSTON PROPERTIES - TOT RETURN IND"           "894371(RI)"   "RI"     950.85  1032.38  1061.74  1012.87   941.52   856.92
            "BOSTON PROPERTIES - TURNOVER BY VALUE"        "894371(VA)"   "VA"    1030043   737735  1043032 851815.9 738068.8 993449.3
            "BOSTON PROPERTIES - TURNOVER BY VOLUME"       "894371(VO)"   "VO"    11140.6   7692.8  10158.1   9465.4     8909  12821.7
            "BOSTON PROPERTIES - NUMBER OF SHARES"         "894371(NOSH)" "NOSH"   155369   155369   155369   155369   155369   155369
            "ALLSTATE ORD SHS - TOT RETURN IND"            "322677(RI)"   "RI"    1175.96  1239.22   1290.2  1270.34  1229.68  1220.37
            "ALLSTATE ORD SHS - TURNOVER BY VALUE"         "322677(VA)"   "VA"    1052804 965266.3  1061730 879218.3 948323.2 896254.3
            "ALLSTATE ORD SHS - TURNOVER BY VOLUME"        "322677(VO)"   "VO"    10851.5   9960.4  10202.5   8905.6     9951   9309.1
            "ALLSTATE ORD SHS - NUMBER OF SHARES"          "322677(NOSH)" "NOSH"   314116   314116   314116   314116   314116   314116
            "HONEYWELL INTL. - TOT RETURN IND"             "906191(RI)"   "RI"    3762.66  4034.35  4191.55  4170.61  4041.15  3699.56
            "HONEYWELL INTL. - TURNOVER BY VALUE"          "906191(VA)"   "VA"    2326534  2701761  2991404  2418696  2007313  3087099
            "HONEYWELL INTL. - TURNOVER BY VOLUME"         "906191(VO)"   "VO"      16055  18423.7    18856  16307.5  13994.8  23670.5
            "HONEYWELL INTL. - NUMBER OF SHARES"           "906191(NOSH)" "NOSH"   701848   701848   701848   701848   701848   701848
            "AMGEN - TOT RETURN IND"                       "938692(RI)"   "RI"   95208.19 92530.19 93001.06 90495.06 92055.19 93308.19
            "AMGEN - TURNOVER BY VALUE"                    "938692(VA)"   "VA"    3345138  3144787  3061968  3483130  2250313  4583161
            "AMGEN - TURNOVER BY VOLUME"                   "938692(VO)"   "VO"    14271.1  14132.7  13633.2  15519.4  10041.9  19616.3
            "AMGEN - NUMBER OF SHARES"                     "938692(NOSH)" "NOSH"   588247   588247   588247   588247   588247   588247
            "HESS - TOT RETURN IND"                        "905802(RI)"   "RI"    1763.95  1816.19  1920.36  1868.02  1770.52  1697.03
            "HESS - TURNOVER BY VALUE"                     "905802(VA)"   "VA"   676036.9 652340.5 991937.1 631035.7 316123.5 552119.9
            "HESS - TURNOVER BY VOLUME"                    "905802(VO)"   "VO"    13104.7  13058.4  17461.6  12712.6   6590.6  12010.9
            "HESS - NUMBER OF SHARES"                      "905802(NOSH)" "NOSH"   307144   307144   307144   307144   307144   307144
            "AMERICAN EXPRESS - TOT RETURN IND"            "906156(RI)"   "RI"    6808.18  7412.85  7569.72  7515.53  7218.18  6469.48
            "AMERICAN EXPRESS - TURNOVER BY VALUE"         "906156(VA)"   "VA"    2627571  3419545  3733819  3372175  2610872  2992165
            "AMERICAN EXPRESS - TURNOVER BY VOLUME"        "906156(VO)"   "VO"    26358.1  33450.1  33999.3  33991.3  27543.4  35170.3
            "AMERICAN EXPRESS - NUMBER OF SHARES"          "906156(NOSH)" "NOSH"   804972   804972   804972   804972   804972   804972
            "AMER.ELEC.PWR. - TOT RETURN IND"              "905425(RI)"   "RI"    7386.85   7527.8  7813.35  7961.62  7453.66  7115.94
            "AMER.ELEC.PWR. - TURNOVER BY VALUE"           "905425(VA)"   "VA"    1380356  1212505 961994.5  1064726   857369  1579471
            "AMER.ELEC.PWR. - TURNOVER BY VOLUME"          "905425(VO)"   "VO"    17109.1  14809.8  11204.8  12449.4  10795.1  20196.8
            "AMER.ELEC.PWR. - NUMBER OF SHARES"            "905425(NOSH)" "NOSH"   495583   495583   495583   495583   495583   495583
            "AFLAC - TOT RETURN IND"                       "933185(RI)"   "RI"   185204.1   193735 205846.8 201212.8 195736.1 183255.8
            "AFLAC - TURNOVER BY VALUE"                    "933185(VA)"   "VA"   774257.8 691941.8 792452.7  1214250 560151.8 901689.4
            "AFLAC - TURNOVER BY VOLUME"                   "933185(VO)"   "VO"    21234.2  18800.7  19691.6  33003.6  15638.3  27018.2
            "AFLAC - NUMBER OF SHARES"                     "933185(NOSH)" "NOSH"   717508   717508   717508   717508   717508   717508
            "AMERICAN INTL.GP. - TOT RETURN IND"           "916305(RI)"   "RI"     207.06   220.59   237.97   225.04   213.26   189.78
            "AMERICAN INTL.GP. - TURNOVER BY VALUE"        "916305(VA)"   "VA"   796163.3  1211121  1657832 994367.1 789241.1 940727.5
            "AMERICAN INTL.GP. - TURNOVER BY VOLUME"       "916305(VO)"   "VO"    25080.8  37025.6  45105.3  31785.1  25950.8  34655.5
            "AMERICAN INTL.GP. - NUMBER OF SHARES"         "916305(NOSH)" "NOSH"   861291   861291   861291   861291   861291   861291
            "ANALOG DEVICES - TOT RETURN IND"              "905276(RI)"   "RI"   146690.9 150448.1 150877.9 149330.8 140027.1 139892.8
            "ANALOG DEVICES - TURNOVER BY VALUE"           "905276(VA)"   "VA"    1526467  1460516  1408691  1535515 980639.3  1846840
            "ANALOG DEVICES - TURNOVER BY VOLUME"          "905276(VO)"   "VO"    12575.8  12230.1  11377.1  13272.2   8724.5  17171.9
            "ANALOG DEVICES - NUMBER OF SHARES"            "905276(NOSH)" "NOSH"   368425   368425   368425   368425   368425   368425
            "ALEXION PHARMS. - TOT RETURN IND"             "873087(RI)"   "RI"    4907.39  5063.88  5099.39  4987.17  4652.71  4554.95
            "ALEXION PHARMS. - TURNOVER BY VALUE"          "873087(VA)"   "VA"    1198339  1194140  1089816  1797001 680184.6 887974.4
            "ALEXION PHARMS. - TURNOVER BY VOLUME"         "873087(VO)"   "VO"      10280  10706.5   9341.9  15454.2   6607.3   8687.7
            "ALEXION PHARMS. - NUMBER OF SHARES"           "873087(NOSH)" "NOSH"   220827   220827   220827   220827   220827   220827
            "VALERO ENERGY - TOT RETURN IND"               "982852(RI)"   "RI"   10025.25 10768.24 11931.12 12188.57 12126.37 11635.64
            "VALERO ENERGY - TURNOVER BY VALUE"            "982852(VA)"   "VA"    1551822  1487855  1667255  1575800  1065722  1470565
            "VALERO ENERGY - TURNOVER BY VOLUME"           "982852(VO)"   "VO"    24812.2  23460.6  22963.4  23199.9  15674.4  23339.3
            "VALERO ENERGY - NUMBER OF SHARES"             "982852(NOSH)" "NOSH"   407699   407699   407699   407699   407699   407699
            "APACHE - TOT RETURN IND"                      "921983(RI)"   "RI"    2206.29  2344.72  2503.39  2115.14  2123.58  2017.23
            "APACHE - TURNOVER BY VALUE"                   "921983(VA)"   "VA"   872918.6  1052121  1764325  1359383 748640.1  1026554
            "APACHE - TURNOVER BY VOLUME"                  "921983(VO)"   "VO"    62102.4  75774.3 112107.9 117755.8  60925.3    90989
            "APACHE - NUMBER OF SHARES"                    "921983(NOSH)" "NOSH"   377426   377426   377426   377426   377426   377426
            "COMCAST A - TOT RETURN IND"                   "981550(RI)"   "RI"   212688.2 216328.6 228132.4 227801.4 220244.8 216052.8
            "COMCAST A - TURNOVER BY VALUE"                "981550(VA)"   "VA"    4647787  3883378  3375666  3538267  2968304  4883485
            "COMCAST A - TURNOVER BY VOLUME"               "981550(VO)"   "VO"     119027  98050.3  80288.9  88109.9  75367.3 131622.8
            "COMCAST A - NUMBER OF SHARES"                 "981550(NOSH)" "NOSH"  4554711  4554711  4554711  4554711  4554711  4554711
            "ANSYS - TOT RETURN IND"                       "874830(RI)"   "RI"    9608.41  9621.55  9633.68  9663.99  9071.49  9169.17
            "ANSYS - TURNOVER BY VALUE"                    "874830(VA)"   "VA"   895457.9 698843.1 684167.5 789653.8 468863.2 582669.1
            "ANSYS - TURNOVER BY VOLUME"                   "874830(VO)"   "VO"     3119.2   2514.3   2381.7   2791.4   1747.7   2240.6
            "ANSYS - NUMBER OF SHARES"                     "874830(NOSH)" "NOSH"    85595    85595    85595    85595    85595    85595
            end

            How can I modify your code so that only the variables on the right of datatype get affected? With the following code, "Name" and "Code" are set as missing but should not. Any idea?
            Code:
            foreach var of varlist _all {
                if strlower("`var'") == "datatype" continue
                capture noisily replace `var'="" if `var'=="NA"
                capture noisily replace `var'="" if `var'==" "
                capture noisily destring `var', replace force
            }

            Comment


            • #7
              So, is there any way to identify variables as "the variables on the right of DATATYPE"? Thanks a lot in advance.

              Comment


              • #8
                Your example is highly volatile. For the data in #1, an answer is just

                Code:
                findname DATATYPE, not
                If you know for certain that as in #6 your variables start Name Code DATATYPE then by the same reasoning a solution is

                Code:
                findname Name Code DATATYPE, not
                If all you are telling us is that there are some variables, then DATATYPE, then some other variables, then

                Code:
                findname, col(1) 
                findname. `r(varlist)'-DATATYPE, not
                will get you there in two moves. I would welcome one-line solutions.

                Comment


                • #9
                  That is great, thank you very much, I actually need to use both solutions you suggested.

                  So these solutions give me a list of variables, which is what I wanted. Also, is there a way to select only the 1st variable after a list of variables (Name Code DATATYPE) or a single variable DATATYPE? Thanks, again.

                  Comment


                  • #10
                    Originally posted by Francois Durant View Post
                    How can I modify your code so that only the variables on the right of datatype get affected?
                    You'd set a flag to skip until you found DATATYPE, and then set the flag not to skip. Something akin to the following (untested).
                    Code:
                    local skip 1
                    foreach var of varlist _all {
                        if strlower("`var'") == "datatype" {
                            local skip 0
                            continue
                        }
                        else if `skip' continue
                        else {
                            local data_type : type `var'
                            if substr("`data_type'", 1, 1) == "s" {
                                replace `var' = "" if inlist(`var', " ", "NA")
                            }
                            else {
                                // Do whatever it is you want to do with numeric variables
                            }
                        }
                    }

                    Comment


                    • #11
                      Thanks a lot Joseph, this is helpful!

                      Is there a way to achieve the same outcome with "findname" by curiosity?

                      Comment


                      • #12
                        I don't know. I have not used findname.

                        Comment


                        • #13
                          First note please a typo in #8, a spurious stop or period removed below:

                          Code:
                           findname `r(varlist)'-DATATYPE, not
                          is there a way to select only the 1st variable after a list of variables (Name Code DATATYPE)
                          Code:
                          findname Name Code DATATYPE, not local(all)
                          local wanted : word 1 of `all'

                          or a single variable DATATYPE?
                          That's just a variation on the previous problem, is it not? Alternatively, if what you are after is the second variable, then

                          Code:
                          unab all : *
                          local wanted : word 2 of `all'


                          .

                          Comment


                          • #14
                            Thank you so much Nick, this is very helpful!

                            Comment

                            Working...
                            X