Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • loop for putting together multiple dta files with the same variable names

    Dear Stata community,

    I have multiple excel files with data on export. Due to your kind help I managed to save them in dta. They have wide format. I wanted to put them together. Here is a data example of the first file:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str4 period double(ExportsfromAustraliatoAustri C D) str17 E
    "1967" 1.68818291542382  .447193126169174   .254802440348146 "NA"               
    "1968" 1.37783875343941  .255840493746482  .0766597683580703 "NA"               
    "1969" 2.14021372461828  .328313422978159     .1290259660822 ".0011185009753137"
    "1970" 1.37254388532754  .192041412896288   .103648950063559 ".0014473268885849"
    "1971" 1.07272091518585  .376749961389412   .150304925135887 ".0043821532603898"
    "1972" 1.03049032659761  .170154750132609  .0467655443634626 ".0040929481760718"
    "1973" 1.10660073292803  .554202972554974   .042445121273776 ".0147793837375182"
    "1974" .619402162186426  .110295313069868 .00421017570778895 ".0033361329234095"
    "1975" .862150009789141  .457623355172952   .173436807927162 ".005716531769577" 
    "1976" 1.17399480567722   .18227499023391  .0899267099507063 ".0074502711364507"
    "1977"  1.0596683294864  .252186446517037  .0971412808940711 ".0175952383069712"
    "1978" 2.74407206961562  .326298100332131   .226576840094671 ".0082933268536468"
    "1979" 2.43123089502343  .356706683016324   .236905414894377 ".0301890470330195"
    "1980" 3.73198211321387  .446112989371622   .141884411694923 ".023741354288482" 
    "1981" 2.86560192580854   .11650479750537 .00505486422107316 ".0256636617808451"
    "1982" 2.46898937574441 .0681212756305846  .0409741067841663 ".0229692068236075"
    "1983" 1.85716024628491  .171121272324057   .130544004006761 ".0077711427433588"
    "1984" 1.89950961141022   .55082007656801   .395247194341352 "NA"               
    "1985" 6.74099973156501  4.51314312672418   4.32899960820237 ".0331840324351902"
    "1986"  2.6416032194799  .922339003082142   .765309045323289 ".0230991871640027"
    "1987" 3.50108724040966  1.00879296806758   .666709174063678 ".0338925172702406"
    "1988" 5.89184415767255  4.87417369147086   4.32603952670905 ".0309162835204018"
    "1989" 2.10293175939197  1.11200655142119   .945336338903254 ".0364085343299939"
    "1990" 3.09566901071224   .57895066182819   .294643535250431 ".0593818650269951"
    "1991"  4.5952602023237  2.25751780793534   2.11941353857843 ".0506122384234352"
    "1992"  5.2282254216153  3.62389907526694    3.2407176217882 ".156218000188867" 
    "1993" 3.60528197241182  2.27059248575925   2.15936564523635 ".0440518674912489"
    "1994" 4.20129878529848  2.61912210318222   2.48602899558369 ".0897388701919393"
    "1995" 1.66267308728776  .718284669540413   .601226761357698 ".0763861976604632"
    "1996" 1.67174006197263  .782547191550645   .713766766181845 ".0226147164761684"
    "1997" 1.46324874669685  1.14965136733421   1.02237855424437 ".0162570073507562"
    "1998" 1.49118616589432  1.33347393623139   1.17672043293244 ".126269767485836" 
    "1999" .797078970751392   .71046188236338   .620047433227721 ".0772284463703214"
    "2000" .685350935246251  .614281599995358   .548381828240676 ".0590893523680979"
    "2001" .790855012105365  .575420760492461   .517316716681913 ".0434616679209898"
    "2002" .678040814956461  .572556585585948   .481116637902978 ".0533440622037423"
    "2003" .962200827340452  .648574348691287   .387374677722107 ".174542912918566" 
    "2004" .539462037877982   .46406333019581   .289660600577625 ".0960438231260797"
    "2005" .829157935583996  .501237842646939  .0739293580848232 ".198353245480259" 
    "2006" .892993578681696  .297647331955208  .0716496026031399 ".154609313711333" 
    "2007"  1.1831879757206  .330713317689642  .0790955954086102 ".188937995410851" 
    "2008" .722686752070487  .254740682283988  .0355505518170268 ".0736625387465388"
    "2009" 1.25267905943902  .785066710532028  .0507452481777911 ".529209776576246" 
    "2010" 1.33814239274765  .687448088730707   .101268940320203 ".211859216032929" 
    "2011" 1.89234563458849  .981591215438803   .263935645697609 ".38498013940013"  
    "2012"  1.4661200700721  .764122338527107   .194755663981232 ".135434932164803" 
    "2013" 1.73105339685472  1.04842663639813   .330930068599616 ".0609117081286965"
    "2014" 1.25397735481203  .851379333036416   .147158715256268 ".108021829853592" 
    "2015" 1.75072124689188  1.39002801113872   .454976523638499 ".0571724831827264"
    "2016" 1.00536622930604  .584672332418586   .149590925095184 ".195573416796196" 
    end



    Here is my last file:


    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str4 period double(ExportsfromUnitedStatestoIt C D E)
    "1967" 174.492405064972 166.571163412104  163.87680753628 1.86818219345245
    "1968" 212.772665655041 205.116920835749 203.143651495025 1.41942115705834
    "1969" 187.473415535854   177.8468516799 175.519416323775 1.58020218845571
    "1970" 159.501689171693 152.728422520563 149.485662994443 2.22806510781669
    "1971" 219.316793866383 212.531283395739 208.069644029362 3.10099329098794
    "1972" 270.925197498744 261.184807999075 255.005775340155 3.19265563767313
    "1973" 571.429390196771 542.680836992341 537.037898868968 3.58743498164142
    "1974" 602.540700944342 576.921527085956 571.143142451317 3.67182723371965
    "1975" 739.625623576642 720.479855816135 712.848206521062 3.38096404656251
    "1976" 698.598333363425 664.698702023926 654.810462512371  4.3414671910904
    "1977"  626.38505077781  590.68517404259 578.508944325358 6.59111086498766
    "1978" 707.119356541337 671.021847169435 660.339622593115 6.39083818051895
    "1979" 760.917389007797 712.589239063797 692.958295555045 8.46779806113076
    "1980" 895.317486239834 846.317138260775 814.000907854755 12.3003462647501
    "1981" 883.322422115479 833.543738107952 814.786509076024 8.90209720803405
    "1982" 759.365610232439 714.090520142637 695.033591163977 8.43422504129108
    "1983" 533.659510591735 500.947295528265 483.394049428199  9.9052783518371
    "1984"    584.432790076 550.831166601927 534.510056497722 10.4122826901857
    "1985" 495.466484340692 465.573294246828 444.657679171418 9.60266817579404
    "1986" 395.949124094612 370.735126413806 350.832720685255 10.0954907233989
    "1987" 408.225920795585 376.992301020841 341.406183969205 14.6507595486822
    "1988" 400.573552249945 363.991903597674 324.008880140329 12.8519913744082
    "1989" 419.879984081588  385.82343827845 357.327039990317 9.66872036410687
    "1990" 563.863921940972 528.636657540521 485.570389834502 14.5067084884993
    "1991" 546.737510829842 513.981972839345 464.405146467759 13.2052742268214
    "1992" 482.978136748232 457.854628661239 414.090281479538 13.2596704570517
    "1993" 437.025421663089  418.59983891854 388.929859931922 9.71452364706735
    "1994" 385.604652660188  365.67949377423 316.804193403725 7.16653900104857
    "1995" 506.361482618778 486.392741085139 438.858799933078 9.91570730406677
    "1996" 534.164608180819 511.960835973455 446.881790915665 10.3896625121043
    "1997" 413.522231352292 396.807832148585 351.531171653619 8.27058550983761
    "1998"  393.62908124399 376.968391322962 323.987079340289  9.4633523134496
    "1999" 302.960938200106  291.03770151843 242.305157729034 10.6523128696524
    "2000" 337.140853598744 326.100067225963 275.261443467374 14.3795402551957
    "2001" 346.137305082496 331.610729602144 279.270892433107 8.03299700712313
    "2002"  373.53086603046   360.9320334875 301.324677101759 9.76703954871962
    "2003" 403.608041866262 390.131020594363 300.884640821627 10.5495353952695
    "2004" 392.348683165626 376.013171100633 260.509841016342  10.115136571513
    "2005" 387.824566283309 367.908243452877 205.361747358679 11.3043293261853
    "2006" 357.017444481361 334.653420391861 166.731722546372 7.87347747633133
    "2007" 454.614500234626  425.74007272965  289.93387302984 11.1063872689224
    "2008" 506.021758199696 476.629636930597 346.473228038767  10.129855797694
    "2009"  379.01894106702 357.428543656066 221.380838624052 11.0336506919995
    "2010" 422.868504717562 407.076157643286 260.276000878856 7.54702080441411
    "2011"  532.33687362881 516.141874286354 335.225999280615 8.73508747063099
    "2012" 442.183450698473 416.078377410966 232.345184139121 8.71108087466194
    "2013" 598.655032730868 569.010848500849 340.040617334909 5.67700812979672
    "2014" 704.419875800654 665.569458856273  364.45370647464 7.47298840585096
    "2015" 721.911024339495 684.872741936037 410.692098502934 7.00682387076601
    "2016" 677.263848159713 637.861622305507 348.638037618437 5.30732604741538
    end

    I put my first dta file to a different folder, opened it and tried to append the rest of the files. I used a code:

    Code:
    global dir "U:\CEPII CHELEM\CHELEM ISIC\All"
    clear all
       
      use "U:\CEPII CHELEM\CHELEM ISIC\AUS_AUT BEL CAN.xlsx.dta"
       
    local files : dir "$dir" files "*.dta", respectcase
      foreach j in `files' {
                   append using "$dir/`j'"
          }
    I typed:
    Code:
     set trace on
    to spot the mistake

    I saw: variable E is str17 in master but double in using data
    You could specify append's force option to ignore this string/numeric mismatch. The using variable
    would then be treated as if it contained ""


    The first file contains export data from Australia to Austria, Belgium and Canada. The second file contains data on export from Asutralia to Switzerland, China and Czech Republic. In the column E in the first file there is: Exports from Australia to Austria - Category - 0112 - Vegetables, horticultural. In the column E in the second file there is: Exports from Australia to Switzerland - Category - 0112 - Vegetables, horticultu

    My plan was to save the excel files as dta, append/merge them, change names and reshape.

  • #2
    For each of the files, you should run the code:

    Code:
    destring Exportsfrom* C D E, replace ignore("NA")
    to first convert those variables to numeric, before you append the files.

    Further, I am not clear what is the form you want your final data in. With these variables, if you append, you will end up with multiple columns of ExportfromXtoY, since each file has a different pair of countries. Do you want these data in "wide" form, with one row for each year, and data for each pair of countries as different columns? Or do you want them in long form, with a single variable for Exports, and the country pairs specified as variables? Either way, this will take a little bit of code to fix.
    Last edited by Hemanshu Kumar; 21 Oct 2022, 05:29.

    Comment


    • #3
      Thank you Hemanshu for your remark. Since in every file I have different columns with "NA" I used a code
      Code:
      destring, replace ignore("NA")
      when saving excel files as dta. As a result it appended the files but not in the way I wished. The problem is that in every file I have variables: period, Exportsfrom*-ADJ. E.g. in the first file I have ExportsfromAustraliatoAustria in the last file I have ExportsfromUnitedStatestoTur. When appending it appends only the first variables from the files. I tried to change their names in the loop saving excel files to dta. They are in the wide format. I wrote a code:



      Code:
      global dir "U:\CEPII CHELEM\CHELEM ISIC\All"
      
      local files: dir "$dir" files "*.xlsx", respectcase
        foreach file in `files' {
                     clear
                     import excel using "$dir/`file'", firstrow
                     destring, replace ignore("NA")
      foreach x of varlist Exportsfrom*-ADJ{
          local c:variable label `x'
          local word1: word 3 of `c'
          local word2: word 5 of `c'
          local word3: word 9 of `c'
          local myname `word1'`word2'`word3'
          rename `x' `=strtoname("`myname'")'
      }
                     save "$dir/`file'.dta", replace
         }
      I'm lucky that all variables are numbers. I have only data on periods and export. Therefore I didn't need to specify which variables Stata should destring. Unfortunately Stata doesn't take the specified columns and I have an error: - invalid name

      I don't know how to tell Stata to rename these variables for each file. Stata doesn't know what I mean with Exportsfrom*. I wanted Stata to take every column with this name since in every file the second variable starts from Exportsfrom. I think when I have different names I could append and have my panel data.

      Comment


      • #4
        Try this:

        Code:
        global dir "U:\CEPII CHELEM\CHELEM ISIC\All"
        
        local files: dir "$dir" files "*.xlsx", respectcase
        foreach file in `files' {
            clear
            import excel using "$dir/`file'", firstrow
            destring, replace ignore("NA")
            unab exp: Exportsfrom*
            foreach x of varlist `exp'-ADJ {
                local c:variable label `x'
                local word1: word 3 of `c'
                local word2: word 5 of `c'
                local word3: word 9 of `c'
                local myname `word1'`word2'`word3'
                rename `x' `=strtoname("`myname'")'
            }
            save "$dir/`file'.dta", replace
           }

        Comment


        • #5
          Thank you Hemanshu for your answer. I think it looks like a solution or at least one step towards a solution. However, I got an error: variable AustraliaCzech_ already defined. It turned out I was too optimistic how the labels have been constructed. E.g. I had in the label: Exports from Australia to Switzerland - Category - 01 - Agriculture, hunting and. I wanted to have in the variable name importer,exporter and number of category it means AsutraliaSwitzerland01. Therefore I took third, fifth and ninth word from label. E.g. for the Czech it looks differently like Exports from Australia to Czech Republic - Category - 930 so the ninth word is just dashed line _. I decided to modify my loop by command capture so that the loop won't break:

          Code:
          global dir "U:\CEPII CHELEM\CHELEM ISIC\All"
          
          local files: dir "$dir" files "*.xlsx", respectcase
            foreach file in `files' {
                         clear
                         import excel using "$dir/`file'", firstrow
                         destring, replace ignore("NA")
                         unab exp: Exportsfrom*
                         foreach x of varlist `exp'-ADJ{
                             capture{
              local c:variable label `x'
              local word1: word 3 of `c'
              local word2: word 5 of `c'
              local word3: word 9 of `c'
              local myname `word1'`word2'`word3'
              rename `x' `=strtoname("`myname'")'
                          }
          }
                         save "$dir/`file'.dta", replace
             }
          However, I encountered two problems. The first problem: I saw an error:variable ADJ not found. It turned out that for some files I have variables only till JD. I have no idea how to deal with it. Is it possible to tell Stata to rename all variables from Exportsfrom* till the last existing variable?

          The second problem: I don't know why but when it comes to renaming after label Stata doesn't rename variables in the middle and actually it should do it. For example I have: Exports from Australia to Austria - Category - 01, Stata renamed it as I wanted: AustraliaAustria01. The second variable label is Exports from Australia to Austria - Category - 011 - Crops; gardening; horticult. Stata left it as variable C. It left variables till column TG. And from export from Australia to Canada it renamed all variables. Actually it's very bizzare because here exporter, importer and category number are on third, fifth and ninth place. Then at the end for Canada it renamed all variables. I tried a code with local exp for an excel file with export from Australia to Austria, Belgium and Canada. It leaves the middle of variables as well. When I run a code with foreach x of varlist ExportsfromAustraliatoAustri-ADJ it renamed all of variables.

          Please help me. It is too advanced for my Stata knowledge. I really got stuck with it and looking for a solution since yesterday.

          Comment


          • #6
            For the naming problem, try something like this?

            Code:
            local c:variable label `x'
            local dummy = ustrregexm(`"`c'"',"from ([A-Za-z ]+) to ([A-Za-z ]+) - Category - ([0-9]+)")
            local from = ustrregexs(1)
            local to = ustrregexs(2)
            local cat =  ustrregexs(3)
            local myname = strtoname("`from'`to'`cat'")
            with this, if you have "Exports from Czech Republic to Austria - Category - 01", you will get myname as "Czech_RepublicAustria01".
            Last edited by Hemanshu Kumar; 22 Oct 2022, 07:05.

            Comment


            • #7
              For the problem with the changing last variable, consider this bit of code:

              Code:
              unab exp: Exportsfrom*
              unab allvars: *
              local lastvar_pos: list sizeof allvars
              local lastvar: word `lastvar_pos' of `allvars'
              
              foreach x of varlist `exp' - `lastvar' {
                  ...

              Comment


              • #8
                Thank you Hemanshu for your kind help. This time I didn't get any error. However, I still don't have changed names for all variables. I had 234 Excel files with data on export. I took the first dta file for export from Australia to Austria,Belgium and Canada. In column A in Excel file there is period. From column B there is data on export from Australia to partner countries. It changed column B to AustraliaAustria01. From C till TG it didn't take names after their labels. I have C-TG as names. And then I have changed names for export with Canada till AustraliaCanada9302. Then I have names ADI and ADJ with labels: Exports from Australia to Canada - Category - NV - Not specified, million US dol and Exports from Australia to Canada - Category - TT - Total, million US dollars (CE. I understand that Stata didn't take the part for TT and NV because there are letters. In macro the numbers are defined. Actually I don't think I need TT and NV for my analysis or if there is a way I could recode TT as 9999 and NV as 0. The problem is why a loop leaves these variables from C till TG unchanged.Here is the sample of generated dta file for export from Australia to Austria,Belgium and Canada:

                Code:
                * Example generated by -dataex-. For more info, type help dataex
                clear
                input int period double(AustraliaAustria01 C D)
                1967 1.68818291542382 .447193126169174  .254802440348146
                1968 1.37783875343941 .255840493746482 .0766597683580703
                1969 2.14021372461828 .328313422978159    .1290259660822
                1970 1.37254388532754 .192041412896288  .103648950063559
                1971 1.07272091518585 .376749961389412  .150304925135887
                end

                I wasn't able to take all variables because of the limit. It didn't change variables for Belgium but it changed almost all variables for Canada except from those which labels contained NV TT. When I'm taking an excel file and running a code for only one file:

                Code:
                import excel "U:\CEPII CHELEM\CHELEM ISIC\AUS\AUT BEL CAN.xlsx", sheet("annual") firstrow
                destring period-ADJ ,replace ignore("NA")
                
                foreach x of varlist ExportsfromAustraliatoAustri-ADJ{
                local c:variable label `x'
                local dummy = ustrregexm(`"`c'"',"from ([A-Za-z ]+) to ([A-Za-z ]+) - Category - ([0-9]+)")
                local from = ustrregexs(1)
                local to = ustrregexs(2)
                local cat =  ustrregexs(3)
                local myname = strtoname("`from'`to'`cat'")
                rename `x' `=strtoname("`myname'")'
                }
                I got a syntax error. I typed set trace on to see where the loop breaks. It broke when renaming JC because a label by JC is Exports from Australia to Austria - Category - NV - Not specified, million US do.

                What comes to my mind is to change label NV to 0 and TT to 9999 or just to get rid of variables which labels contain NV or TT. However, I'm still searching for a solution how to do it.

                Comment


                • #9
                  We can easily allow for categories NV and TT in the regular expression:

                  Code:
                  local dummy = ustrregexm(`"`c'"',"from ([A-Za-z ]+) to ([A-Za-z ]+) - Category - ([0-9]+|NV|TT)")
                  Last edited by Hemanshu Kumar; 22 Oct 2022, 21:11.

                  Comment


                  • #10
                    Thank you Hemanshu for helping me. Unortunately it is still the case that Stata didn't change variables in the middle. I don't know the reason for it. For example for export from Australia to Austria, Belgium and Canada in variabble B I had a label Exports from Australia to Austria - Category - 01 - Agriculture, hunting and rel. It changed variable B to AustraliaAustria01. In variable C I have in label Exports from Australia to Austria - Category - 011 - Crops; gardening; horticult. It didn't change the name till JD. Then I have only one variable for Belgium: AustraliaBelgium01. From JF till TG I don't have names.Then I have all variables changed for Canada. With categories NV and TT it worked. In the end I have AustraliaCanadaTT. It is bizzare to me that the middle is unchanged only with two exceptions (AustraliaAustria01 and AustraliaBelgium01).

                    I took again one of the excel files and run a code:

                    Code:
                     clear all
                     import excel "U:\CEPII CHELEM\CHELEM ISIC\All\AUS_AUT BEL CAN.xlsx", sheet("annual") firstrow
                    
                     log using test,text
                    set trace on
                    destring, replace ignore("NA")
                    unab exp: Exportsfrom*
                    unab allvars: *
                    local lastvar_pos: list sizeof allvars
                    local lastvar: word `lastvar_pos' of `allvars'           
                    foreach x of varlist `exp'-`lastvar'{
                    local c:variable label `x'
                    local dummy = ustrregexm(`"`c'"',"from ([A-Za-z ]+) to ([A-Za-z ]+) - Category - ([0-9]+|NV|TT)")
                    local from = ustrregexs(1)
                    local to = ustrregexs(2)
                    local cat =  ustrregexs(3)
                    local myname = strtoname("`from'`to'`cat'")
                        rename `x' `=strtoname("`myname'")'            
                    }
                    It didn't change variables in the middle as well. I looked in log file to find a hint why it is the case. I found for the first variable to rename in the log file:

                    foreach x of varlist `exp'-`lastvar'{
                    2. local c:variable label `x'
                    3. local dummy = ustrregexm(`"`c'"',"from ([A-Za-z ]+) to ([A-Za-z ]+) - Category -
                    > ([0-9]+|NV|TT)")
                    4. local from = ustrregexs(1)
                    5. local to = ustrregexs(2)
                    6. local cat = ustrregexs(3)
                    7. local myname = strtoname("`from'`to'`cat'")
                    8. rename `x' `=strtoname("`myname'")'
                    9. }
                    - foreach x of varlist `exp'-`lastvar'{
                    = foreach x of varlist ExportsfromAustraliatoAustri ExportsfromAustraliatoBelgiu Expo
                    > rtsfromAustraliatoCanada-ADJ{
                    - local c:variable label `x'
                    = local c:variable label ExportsfromAustraliatoAustri
                    - local dummy = ustrregexm(`"`c'"',"from ([A-Za-z ]+) to ([A-Za-z ]+) - Category - ([
                    > 0-9]+|NV|TT)")
                    = local dummy = ustrregexm(`"Exports from Australia to Austria - Category - 01 - Agri
                    > culture, hunting and rel"',"from ([A-Za-z ]+) to ([A-Za-z ]+) - Category - ([0-9]+|
                    > NV|TT)")
                    - local from = ustrregexs(1)
                    - local to = ustrregexs(2)
                    - local cat = ustrregexs(3)
                    - local myname = strtoname("`from'`to'`cat'")
                    = local myname = strtoname("AustraliaAustria01")
                    - rename `x' `=strtoname("`myname'")'
                    = rename ExportsfromAustraliatoAustri AustraliaAustria01
                    ------------------------------------------------------------------ begin rename ---
                    - version 12
                    - mata: rename_cmd("`3'")
                    = mata: rename_cmd("")
                    -------------------------------------------------------------------- end rename ---
                    - }

                    It indeed changed a variable. I'm confused I couldn't finde the second variable Exports from Australia to Austria - Category - 011 in the log file. Right after the code above there is:

                    local c:variable label `x'
                    = local c:variable label ExportsfromAustraliatoBelgiu
                    - local dummy = ustrregexm(`"`c'"',"from ([A-Za-z ]+) to ([A-Za-z ]+) - Category - ([
                    > 0-9]+|NV|TT)")
                    = local dummy = ustrregexm(`"Exports from Australia to Belgium - Category - 01 - Agri
                    > culture, hunting and rel"',"from ([A-Za-z ]+) to ([A-Za-z ]+) - Category - ([0-9]+|
                    > NV|TT)")
                    - local from = ustrregexs(1)
                    - local to = ustrregexs(2)
                    - local cat = ustrregexs(3)
                    - local myname = strtoname("`from'`to'`cat'")
                    = local myname = strtoname("AustraliaBelgium01")
                    - rename `x' `=strtoname("`myname'")'
                    = rename ExportsfromAustraliatoBelgiu AustraliaBelgium01
                    ------------------------------------------------------------------ begin rename ---
                    - version 12
                    - mata: rename_cmd("`3'")
                    = mata: rename_cmd("")
                    -------------------------------------------------------------------- end rename ---
                    - }


                    So it changed for the first for Belgium. And then it started to change variables for Canada. In this case it changed all of them


                    local c:variable label `x'
                    = local c:variable label ExportsfromAustraliatoCanada
                    - local dummy = ustrregexm(`"`c'"',"from ([A-Za-z ]+) to ([A-Za-z ]+) - Category - ([
                    > 0-9]+|NV|TT)")
                    = local dummy = ustrregexm(`"Exports from Australia to Canada - Category - 01 - Agric
                    > ulture, hunting and rela"',"from ([A-Za-z ]+) to ([A-Za-z ]+) - Category - ([0-9]+|
                    > NV|TT)")
                    - local from = ustrregexs(1)
                    - local to = ustrregexs(2)
                    - local cat = ustrregexs(3)
                    - local myname = strtoname("`from'`to'`cat'")
                    = local myname = strtoname("AustraliaCanada01")
                    - rename `x' `=strtoname("`myname'")'
                    = rename ExportsfromAustraliatoCanada AustraliaCanada01
                    ------------------------------------------------------------------ begin rename ---
                    - version 12
                    - mata: rename_cmd("`3'")
                    = mata: rename_cmd("")
                    -------------------------------------------------------------------- end rename ---
                    - }
                    - local c:variable label `x'
                    = local c:variable label TI
                    - local dummy = ustrregexm(`"`c'"',"from ([A-Za-z ]+) to ([A-Za-z ]+) - Category - ([
                    > 0-9]+|NV|TT)")
                    = local dummy = ustrregexm(`"Exports from Australia to Canada - Category - 011 - Crop
                    > s; gardening; horticultu"',"from ([A-Za-z ]+) to ([A-Za-z ]+) - Category - ([0-9]+|
                    > NV|TT)")
                    - local from = ustrregexs(1)
                    - local to = ustrregexs(2)
                    - local cat = ustrregexs(3)
                    - local myname = strtoname("`from'`to'`cat'")
                    = local myname = strtoname("AustraliaCanada011")
                    - rename `x' `=strtoname("`myname'")'
                    = rename TI AustraliaCanada011


                    and so on

                    Might it be the case that in local c only for Canada it takes the whole name?


                    local c:variable label `x'
                    = local c:variable label ExportsfromAustraliatoCanada


                    - foreach x of varlist `exp'-`lastvar'{
                    = foreach x of varlist ExportsfromAustraliatoAustri ExportsfromAustraliatoBelgiu Expo
                    > rtsfromAustraliatoCanada-ADJ{
                    - local c:variable label `x'
                    = local c:variable label ExportsfromAustraliatoAustri


                    - local c:variable label `x'
                    = local c:variable label ExportsfromAustraliatoBelgiu

                    It seems to me that this might be the reason that it changes only all variables for Canada.




                    When I use the code:

                    Code:
                    foreach x of varlist ExportsfromAustraliatoAustri-ADJ{
                    local c:variable label `x'
                    local dummy = ustrregexm(`"`c'"',"from ([A-Za-z ]+) to ([A-Za-z ]+) - Category - ([0-9]+|NV|TT)")
                    local from = ustrregexs(1)
                    local to = ustrregexs(2)
                    local cat =  ustrregexs(3)
                    local myname = strtoname("`from'`to'`cat'")
                        rename `x' `=strtoname("`myname'")'            
                    }
                    it changes all variables. However, I can't imagine I'd do it manually for each of 234 files

                    Comment


                    • #11
                      I'm not sure I follow for which variables exactly the code is not doing the job. Perhaps you can import one of the files that is creating a problem (e.g. U:\CEPII CHELEM\CHELEM ISIC\All\AUS_AUT BEL CAN.xlsx), list all the variables and their labels using

                      Code:
                      des, fullnames
                      and then highlight the ones which are not being renamed correctly?

                      Comment


                      • #12
                        Here is the table from the log file. Since the maximum of the charecters is exceeded I had to delete most of the variables. It renamed for Australia and Belgium only the first variables. For Canada all.

                        Code:
                        -------------------------------------------------------------------------------------
                        
                        .
                        end of do-file
                        
                        
                        . des,fullnames
                        
                        Contains data
                        Observations: 54
                        Variables: 790
                        -------------------------------------------------------------------------------------
                        Variable Storage Display Value
                        name type format label Variable label
                        -------------------------------------------------------------------------------------
                        period int %10.0g period
                        AustraliaAustria01
                        double %10.0g Exports from Australia to Austria -
                        Category - 01 - Agriculture, hunting
                        and rel
                        C double %10.0g Exports from Australia to Austria -
                        Category - 011 - Crops; gardening;
                        horticult
                        D double %10.0g Exports from Australia to Austria -
                        Category - 0111 - Cereals and other
                        crops n.
                        .................................................. .................................................. ..............
                        
                        JD double %10.0g Exports from Australia to Austria -
                        Category - TT - Total, million US
                        dollars (C
                        AustraliaBelgium01
                        double %10.0g Exports from Australia to Belgium -
                        Category - 01 - Agriculture, hunting
                        and rel
                        JF double %10.0g Exports from Australia to Belgium -
                        Category - 011 - Crops; gardening;
                        horticult
                        .................................................. .................................................. ...........
                        
                        
                        TG double %10.0g Exports from Australia to Belgium -
                        Category - TT - Total, million US
                        dollars (C
                        AustraliaCanada01
                        double %10.0g Exports from Australia to Canada -
                        Category - 01 - Agriculture, hunting
                        and rela
                        
                        .................................................. .................................................. ...
                        
                        AustraliaCanadaTT
                        double %10.0g Exports from Australia to Canada -
                        Category - TT - Total, million US
                        dollars (CE
                        -------------------------------------------------------------------------------------
                        Sorted by:
                        Note: Dataset has changed since last saved.
                        
                        .
                        end of do-file
                        
                        
                        -------------------------------------------------------------------------------------
                        I think the problem is that Stata doesn't take the whole name when renaming variables for Austria and Canada. When running a loop with set trace on I've noticed something like that:


                        Code:
                        local c:variable label `x'
                        = local c:variable label ExportsfromAustraliatoCanada
                        
                        
                        - foreach x of varlist `exp'-`lastvar'{
                        = foreach x of varlist ExportsfromAustraliatoAustri ExportsfromAustraliatoBelgiu Expo
                        > rtsfromAustraliatoCanada-ADJ{
                        - local c:variable label `x'
                        = local c:variable label ExportsfromAustraliatoAustri
                        
                        
                        - local c:variable label `x'
                        = local c:variable label ExportsfromAustraliatoBelgiu
                        Last edited by sladmin; 28 Aug 2023, 08:32. Reason: anonymize original poster

                        Comment


                        • #13
                          Guest,

                          thanks, this would be much easier to read if the output were enclosed within CODE tags
                          Last edited by sladmin; 28 Aug 2023, 08:32. Reason: anonymize original poster

                          Comment


                          • #14
                            I'm sorry. I just edited my post.

                            Comment


                            • #15
                              Here is one small additional thing you could try:
                              Code:
                              unab exp: Exportsfrom*
                              local exp: word 1 of `exp'
                              The rest of the code (both before and after the unab exp: Exportsfrom* line) all remains the same.

                              Comment

                              Working...
                              X