Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Creating a loop for generating multiple datasets

    Hi all,
    I am new here and somewhat new to Stata. I have a panel dataset for 31 countries with 14 indicators for each. I have a variable "countrycode" and want to create a new dataset for each "countrycode" keeping the 14 variables that are relevant to each using a loop. I tried to save each as a separate dta but want I really want is to save each country's file as a separate excel sheet in the same excel. Here is what I attempted to do but only got a blank dataset with just the correct headings. All help is welcomed! Huge thanks!

    import excel using "${output}/dataset.xlsx", clear first sheet("ALLCOUNTRIES") /*I already set a global "output"/

    #delimit ;
    local countrylist "
    ARG
    ATG
    BLZ
    BHS
    BOL
    BRA
    CHL
    COL
    CRI
    DMA
    DOM
    ECU
    GRD
    GTM
    GUY
    HTI
    HND
    JAM
    KNA
    LCA
    MEX
    NIC
    PAN
    PER
    PRY
    SLV
    SXM
    SUR
    TTO
    URY
    VCT


    " ;
    #delimit cr

    foreach var of varlist countrycode {
    forvalues i = 1/31 {
    keep if countrycode == "`i'"
    }


    }


    save "${output}/`i'_countrydata.dta", replace

  • #2
    I have a panel dataset for 31 countries with 14 indicators for each.
    So show example data from that (at least two different countries and at least a couple of the indicators) and we can see what you're actually working with. Just the description does not provide enough information to actually help you.

    Be sure to use the -dataex- command when you post back with the example data. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data. (Actually, in this particular situation, -dataex- is the only way to provide all of the information necessary for this task.)

    Comment


    • #3
      Appreciate the guidance! Here it goes Clyde Schechter :

      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input str3 countrycode str20 indicatorcode double(yy2017 yy2018 yy2019 lastvalue) int lastyear
      "ARG" "SE.TER.GRAD.FE.SI.ZS"                  .                 .                  .                  .    .
      "ARG" "SL.UEM.NEET.FE.ZS"      22.7800006866455  22.4899997711182   21.6900005340576 23.670000076293945 2020
      "ARG" "SL.UEM.NEET.MA.ZS"      15.7799997329712  15.5299997329712   16.8299999237061                 21 2020
      "ARG" "SE.SEC.CMPT.LO.FE.ZS"   93.5797500610352  94.3190536499023   96.7129592895508  96.71295928955078 2019
      "ARG" "SE.SEC.CMPT.LO.MA.ZS"   86.1833801269531   87.534782409668   90.4758605957031  90.47586059570313 2019
      "ARG" "SP.ADO.TFRT"                      62.782           62.5654            62.3488 62.132198333740234 2020
      "ARG" "SG.VAW.1549.ZS"                        .                 .                  .                  .    .
      "ARG" "SL.EMP.VULN.FE.ZS"    19.000000476837187 19.96000009775157 21.010000526905042 21.010000228881836 2019
      "ARG" "SL.EMP.VULN.MA.ZS"    23.320000231265986 22.83999931812287 23.980000615119973  23.98000144958496 2019
      "ARG" "IC.WEF.LLCO.FE.ZS"                     .                 .                  .                  .    .
      "ARG" "FX.OWN.TOTL.FE.ZS"       50.757251739502                 .                  .   50.7599983215332 2017
      "ARG" "FX.OWN.TOTL.MA.ZS"      46.4663772583008                 .                  . 46.470001220703125 2017
      "ARG" "SG.TIM.UWRK.FE"                        .                 .                  .                  .    .
      "ARG" "SG.TIM.UWRK.MA"                        .                 .                  .                  .    .
      end
      Last edited by Daniel McAdams; 15 Nov 2022, 07:47.

      Comment


      • #4
        Your command
        Code:
        save "${output}/`i'_countrydata.dta", replace
        needs to be inside the inner forvalues loop.

        I also don't see what your outer loop
        Code:
        foreach var of varlist countrycode {
        is achieving, since `var' would only take one value, countrycode, and you don't even use `var' anywhere. I would remove that outer loop.

        Comment


        • #5
          Hemanshu Kumar gives excellent advice in response to your question.

          As an aside, I think you will find the resulting wide-layout data sets difficult to work with. In Stata, almost everything you do with panel data requires that the data be in long layout. So I suggest that you make this transformation while you are looping through. After you import the full data set into Excel, and before you start processing it country by country, I suggest you do the following:
          Code:
          reshape long yy, i(countrycode indicatorcode) j(year)
          rename yy value
          This will convert the entire data set into a long layout, and the subsequent partition into country-specific data sets will also be long. You will find these much more amenable to Stata data management and analysis than the wide layout of the original spreadsheet.

          Comment


          • #6
            Clyde, your code has worked perfectly--thank you. I followed Hemanshu's code after changing the data layout and ran:
            #delimit ;
            local countries "
            ARG
            ATG
            BLZ
            BHS
            BOL
            BRA
            CHL
            COL
            CRI
            DMA
            DOM
            ECU
            GRD
            GTM
            GUY
            HTI
            HND
            JAM
            KNA
            LCA
            MEX
            NIC
            PAN
            PER
            PRY
            SLV
            SXM
            SUR
            TTO
            URY
            VCT


            " ;
            #delimit cr



            forvalues i = 1/31 {
            keep if countrycode == "`i'"
            save "${output}/`local'_LAC.dta", replace
            }

            But got only 1 file without any observations. Could anyone please point out what I am doing wrong?
            And if you could also help me find the code to save each "countrycode" as a separate sheet under ${output}/dataset.xlsx

            Comment


            • #7
              Your countrycode are actually three-alphabet codes, while when you do
              Code:
              keep if countrycode == "`i'"
              , you are asking Stata to retain observations for which countrycode is "1", "2", "3", etc. So you end up with no observations.

              You might need something like this:

              Code:
              forvalues i = 1/31 {
                  local ctry: word `i' of "`countries'"
                  keep if countrycode == "`ctry'"
                  save "${output}/`ctry'_LAC.dta", replace
              }
              where the local macro countries is as you defined it in #6.

              Comment


              • #8
                Oh I see the issue--thanks so much Hemanshu. I gave it another try with your code but I got:

                (10,878 observations deleted)
                (dataset contains 0 observations)
                (file C:\Users\name\Documents\stataoutput/
                ARG ATG BLZ BHS BOL BRA CHL COL CRI DMA
                DOM ECU GRD GTM GUY HTI HND JAM KNA LCA
                MEX NIC PAN PER PRY SLV SXM SUR TTO URY
                VCT _LAC.dta not found)

                Could it better if I export each file as an excel sheet under ${output}/dataset.xlsx?
                I gave this a try but was unsucessful:

                forvalues i = 1/31 {
                local ctry: word `i' of "`countries'"
                keep if countrycode == "`ctry'"
                export excel countrycode using ${output}/scorecard_data.xlsx if countrycode =="`ctry', ///
                sheet ("`ctry'") sheetreplace
                }

                Comment


                • #9
                  Ah, just remove the quotation marks from your definition of the local macro countries and also from my definition of ctry.

                  And no, exporting to Excel or saving as a Stata dataset is not the issue here. You can do whichever you prefer, but it is unrelated to the issue at hand. But I did realise another problem with your code -- you need a preserve / restore:

                  Code:
                  forvalues i = 1/31 {
                      local ctry: word `i' of `countries'
                      preserve
                          keep if countrycode == "`ctry'"
                          save "${output}/`ctry'_LAC.dta", replace
                      restore
                  }

                  Comment


                  • #10
                    Unfortunately I keep getting the same thing as in #8.

                    I ran
                    import excel using "${output}/scorecard_data.xlsx", clear first sheet("ALLCOUNTRIES")

                    reshape long yy, i(countrycode indicatorcode) j(year)
                    rename yy value

                    #delimit ;
                    local countries
                    ARG
                    ATG
                    BLZ
                    BHS
                    BOL
                    BRA
                    CHL
                    COL
                    CRI
                    DMA
                    DOM
                    ECU
                    GRD
                    GTM
                    GUY
                    HTI
                    HND
                    JAM
                    KNA
                    LCA
                    MEX
                    NIC
                    PAN
                    PER
                    PRY
                    SLV
                    SXM
                    SUR
                    TTO
                    URY
                    VCT

                    ;
                    #delimit cr





                    forvalues i = 1/31 {
                    local ctry: word `i' of "`countries'"
                    preserve
                    keep if countrycode == "`ctry'"
                    save "${output}/`ctry'_LAC.dta", replace
                    restore
                    }

                    Comment


                    • #11
                      I had some similar data. This worked for me.

                      Code:
                      levelsof country, local(levels) 
                        foreach c of local levels {
                          export excel * using countryexport if country=="`c'", sheet("`c'", modify) 
                      }

                      Comment


                      • #12
                        The problem is with
                        Code:
                        local ctry: word `i' of "`countries'"
                        which needs to be
                        Code:
                        local ctry: word `i' of `countries'
                        Note that when you put `countries' inside "", Stata's -:word `i' of- function treats it as a single word. So when you then check if countrycode == "`ctry'", it is always false because there is no value of country code that is all of the country codes glommed together. Moreover, because `ctry' ends up taking on this same value each time through the loop, the name of the file never changes either. So that's why you get one (misnamed) file with nothing in it.

                        Comment


                        • #13
                          Unfortunately I keep getting the one empty file after getting rid of the " ". I think it is because Stata is not recognizing my local countries in my forval loop.
                          To checked, I ran:
                          tab countrycode if countrycode == "`countries'"
                          I got:
                          no observations

                          Comment


                          • #14
                            I think it is because Stata is not recognizing my local countries in my forval loop.
                            To checked, I ran:
                            tab countrycode if countrycode == "`countries'"
                            I got:
                            no observations
                            This situation where a local macro isn't recognized usually arises when people run their code line by line or in blocks. When you do that, any local macro that is defined in one part of the run, goes out of existence once that part is finished. So if you are running the local macro definition separately from the loop that uses it, when you get to that loop, it no longer exists. When working with local macros you must run all the code from the definition of the local macro through any uses made of it in one fell swoop.

                            That said, your test of that by runing -tab countrycode if countrycode == "`countries'"- does not actually settle the issue. Remember that, assuming `countries' is still in existence, its value is "ARG ATZ BLG ..." There is no country whose countrycode value is "ARG ATZ BLG ..." so you would get the "no observations" result anyway. What would work better in this sitation is -tab countrycode if strpos("`countries'", countrycode)-, which should give you a tabulation that includes every value of countrycode.
                            Last edited by Clyde Schechter; 15 Nov 2022, 18:55.

                            Comment


                            • #15
                              Daniel McAdams there is a discrepancy between the solution I offered in #9 and what you implemented in #10 -- the pair of quotation marks in
                              Code:
                              local ctry: word `i' of "`countries'"
                              This is exactly the problem pointed out in #12 as well. Please use the exact solution in #9, and it should work fine.

                              Comment

                              Working...
                              X