Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Loop over string values of a varibale

    Hi all,

    I wonder if Sata has the capability for looping over the string values of a variable. For example,

    City
    ---------
    LA
    Berlin
    London
    New York


    Now, I want to create a loop which starts with LA and then moves to Berlin etc and performs certain calculations in each round of the loop.

    Thanks,
    David

  • #2
    Yes, certainly, and this is well documented. See (e.g.) the help for foreach and http://www.stata.com/support/faqs/data-management/try-all-values-with-foreach/

    P.S. We strongly prefer full real names here and encourage you to change your identifier away from "Str_researcher" to (e.g.) "David <surname>". Many active members here are reluctant to support people with anonymous or cryptic identifiers.

    Comment


    • #3
      Hey

      I have a similar issue, on which I`m vividly trying to find a solution (using STATA 13).

      I have a string variable that looks like the following (has overall 20 or so different labels it may take):

      input str213 Indicator
      "Domestic Material Input (Non-metallic minerals) (Tonnes)"
      "Domestic Material Input (Non-metallic minerals) (Tonnes)"
      "Domestic extraction (Total) (Tonnes)"
      "Domestic extraction (Total) (Tonnes)"


      What I now want to do is:

      Writing a loop which in each step only keeps those observations where the string variable has an identical "value".
      Then I want to save the dataset in which only these values are kept. Then save it naming it by the first letters of each word of the manifestation in Indication.

      For example the first two lines of the example should be saved as: "\\Desktop\DMINmT.dta"
      The second as "\\Desktop\DeTT.dta"

      Then I want to open again my full dataset to be able to run the loop again:

      use "\\Desktop\MF_all.dta"

      Examples on what I currently tried and failed with :

      foreach i in Indicator {
      keep if Indicator == `i'
      save "\\Desktop\`i'.dta"
      use "\\Desktop\MF_all_indicators.dta", clear
      }


      Now, the recent approach I am thinking about is to generate two loops:

      1. Generating a new variable which gets running numbers, which are the same for each unique label in the string variable
      2. Generate a forvalues loop using the numbers from the variable in the first loop


      I`ll continue to try to figure out solutions, and will let know as soon as I found one.
      For any hints or solutions on how to do this in a more efficient coding way I would be very happy

      Thanks and best regards
      Tobi

      Comment


      • #4
        Code:
        ren Indicator indlong
        gen indicator = strtrim(indlong)
        
        
        levelsof indicator, local(levels) 
        foreach l of local levels{
        preserve
        keep if indicator == "`l'"
        save "`l'.dta"
        restore
        }

        Comment


        • #5
          #4 gets most of what you want done, to get the file named the first letter of each word in the string you just need to add a little bit

          Code:
          clear
          
          input str213 Indicator
          "Domestic Material Input (Non-metallic minerals) (Tonnes)"
          "Domestic Material Input (Non-metallic minerals) (Tonnes)"
          "Domestic extraction (Total) (Tonnes)" 
          "Domestic extraction (Total) (Tonnes)" 
          "Another Example"
          end
          expand 20
          
          levelsof Indicator
          foreach i in `r(levels)' {
              preserve
              local sname
              foreach w in `i' {
                  local sname=`"`sname'`=substr(`"`=subinstr(`"`w'"',"(","",1)'"',1,1)'"'
              }
              keep if Indicator==`"`i'"'
              sa `"`sname'.dta"', replace
              restore
          }

          Comment


          • #6
            I don't see any obvious need to save lots of different datasets here. My advice is that you're just creating a burden for yourself. If you explain why you think you want or need that, the reason may be convincing, or you may get further advice on how to tackle the real problem.

            That said, your syntax is some way off and would be a lot simpler as something more like

            Code:
            egen group = group(Indicator) 
            su group, meanonly 
            
            forval i=1/`r(max)' {
               <do something> if group == `i'
            }
            This approach is documented e.g https://www.stata.com/support/faqs/d...-with-foreach/

            What's wrong with your code? I can't test it without example data but the following problems spring to mind.

            1. Testing for exact equality with literal strings requires " " around those values.

            2. foreach i in varname does not give you a loop over the distinct values of varname Perhaps that guess is based on knowing what some other software does with a loosely similar construct. The syntax you have gives you a loop that is legal in this case but will loop just once. Your code would be legal but yield only

            Code:
            keep if Indicator == Indicator
            which is true for the entire dataset. This error means that #1 doesn't bite, but it would with other code.

            3. Putting local macro references after backslashes is explicitly warned against as a source of bugs. See e.g.

            [U] 18.3.11 within https://www.stata.com/manuals/u18.pdf

            http://www.stata-journal.com/sjpdf.h...iclenum=pr0042

            Comment


            • #7
              Hey Jorrit,

              thank you so much for the Code, it worked perfectly
              Thanks also for the other suggestions and comments, I will take them into account, try out and reply asap.

              Thanks and best regards
              Tobi
              Last edited by Tobias Wendler; 08 Feb 2018, 10:51.

              Comment


              • #8
                Hey,

                first of all concerning Nicks argument that I am just creating a burden you are right. I tried to splice the big problem into many small ones, but I should look for the full solution.
                The actual problem looks like this, I have data that looks the following:

                input str74 indicator long(value year) str129 Country
                "Domestic extraction (Fossil fuels) (Tonnes)" 2256 48 "Austria"
                "Domestic extraction (Fossil fuels) (Tonnes)" 2137 49 "Austria"
                "Domestic extraction (Biomass) (Tonnes)" 3508 2 "Austria"
                "Domestic extraction (Biomass) (Tonnes)" 3592 3 "Austria"
                "Domestic extraction (Non-metallic minerals) (Tonnes)" 10759 48 "Finland"
                "Domestic extraction (Non-metallic minerals) (Tonnes)" 10777 49 "Finland"
                "Domestic extraction (Biomass) (Tonnes)" 3949 2 "Finland"
                "Domestic extraction (Biomass) (Tonnes)" 3982 3 "Finland"
                end

                What I actually want is to now get according to the different variations of indicator the value for these as its own variable for each country,
                so that the data would look like the following if we would name each variable by its first letters, e.g. Domestic extraction (Non-metallic minerals) = DeNm:

                Year Country DeNm DeMo DMIFf
                1970 Austria Value of Domestic extraction Non-metallic minerals Value of Domestic extraction Metal ores Value of Domestic Material Input Fossil fuels
                1 970 Finland
                . .
                .

                If I could get this handled within the dataset without having to create several, edit and merge them that would of course be great.
                Thanks in advance for any advice on this issue.



                Now, by trying the suggestions above based on my old approach the following things occurred:

                1) Danials Code
                I got the error code 603, which I have not yet been able to resolve
                Click image for larger version

Name:	Unbenannt.PNG
Views:	1
Size:	41.2 KB
ID:	1430255



                2) Nick`s code

                The code worked and saved me the appropriate files with the names 1 to 20.

                Code:
                egen group = group(Indicator)
                su group, meanonly
                forval i=1/`r(max)' {
                keep if group == `i'
                save "\\Desktop\\`i'.dta", replace
                }

                3) Comments to my old code

                In fact it was the problem I did not write "". However, the second problem also occured in fact meaning with my code I only saved one dataset.

                foreach i in indicator {
                keep if indicator == "`i'"
                save "\\Desktop\\`i'.dta", replace
                }


                I will keep trying and reporting as soon as I come to a new solution.

                Thanks and best regards
                Tobi

                Comment


                • #9
                  Like Nick, I don't quite understand why break-up the dataset. It kind of looks like Tobias wants the data in a wide layout and the contraction of the indicator variable is needed to have manageable variable names.

                  Anyway, here's another way to create the variable name contraction using moss (from SSC). This uses a regular expression pattern to match all cases of a letter "([A-Za-z])" that is preceded by a non-letter "[^A-Za-z-]". The advantage of creating this variable name contraction at the observation level is that you can easily check that each contraction matches only one indicator value. The example then shows the mapping of contraction to indicator value.

                  Finally, I reshape the data to a wide layout. Since each observation in the #8 example was uniquely identified by year and Country, I added an extra observation to show what happens when there's more than one indicator per year and Country.

                  Code:
                  * Example generated by -dataex-. To install: ssc install dataex
                  clear
                  input str52 indicator int value byte year str7 Country
                  "Domestic extraction (Fossil fuels) (Tonnes)"           2256 48 "Austria"
                  "Domestic extraction (Fossil fuels) (Tonnes)"           2222  2 "Austria"
                  "Domestic extraction (Fossil fuels) (Tonnes)"           2137 49 "Austria"
                  "Domestic extraction (Biomass) (Tonnes)"                3508  2 "Austria"
                  "Domestic extraction (Biomass) (Tonnes)"                3592  3 "Austria"
                  "Domestic extraction (Non-metallic minerals) (Tonnes)" 10759 48 "Finland"
                  "Domestic extraction (Non-metallic minerals) (Tonnes)" 10777 49 "Finland"
                  "Domestic extraction (Biomass) (Tonnes)"                3949  2 "Finland"
                  "Domestic extraction (Biomass) (Tonnes)"                3982  3 "Finland"
                  end
                  
                  * to install moss, type: ssc install moss
                  gen s = " " + indicator
                  moss s, match("[^A-Za-z-]([A-Za-z])") regex
                  egen vname = concat(_match*)
                  drop _* s
                  
                  * make sure that the variable name is matched to only 1 indicator
                  bysort vname (indicator): assert indicator[1] == indicator[_N]
                  
                  * show the mapping of vname to indicator
                  by vname: gen tag = _n == 1
                  leftalign    // to install, type: ssc install leftalign 
                  list vname indicator if tag
                  drop tag
                  
                  * convert to a wide layout
                  drop indicator
                  reshape wide value, i(year Country) j(vname) string
                  rename value* *
                  
                  list, sepby(year)
                  And the results from the first listing
                  Code:
                  . list vname indicator if tag
                  
                       +--------------------------------------------------------------+
                       | vname   indicator                                            |
                       |--------------------------------------------------------------|
                    1. | DeBT    Domestic extraction (Biomass) (Tonnes)               |
                    5. | DeFfT   Domestic extraction (Fossil fuels) (Tonnes)          |
                    8. | DeNmT   Domestic extraction (Non-metallic minerals) (Tonnes) |
                       +--------------------------------------------------------------+
                  and the final results
                  Code:
                  . list, sepby(year)
                  
                       +---------------------------------------+
                       | year   Country   DeBT   DeFfT   DeNmT |
                       |---------------------------------------|
                    1. |    2   Austria   3508    2222       . |
                    2. |    2   Finland   3949       .       . |
                       |---------------------------------------|
                    3. |    3   Austria   3592       .       . |
                    4. |    3   Finland   3982       .       . |
                       |---------------------------------------|
                    5. |   48   Austria      .    2256       . |
                    6. |   48   Finland      .       .   10759 |
                       |---------------------------------------|
                    7. |   49   Austria      .    2137       . |
                    8. |   49   Finland      .       .   10777 |
                       +---------------------------------------+
                  
                  .

                  Comment


                  • #10
                    Hey Robert,

                    thank you very much for the awesome suggestion, everything worked perfectly as you suggested and it did exactly what I wanted to do, saving me of all the unnecessary effort.
                    Thank you and Nick so much for this correction.

                    Best wishes
                    Tobi

                    Comment


                    • #11
                      Dear all,

                      I am working with the labor data from different cities where the total number of employees alongside their qualification levels is given for each year. The dataset contains 7 variables, namely; Year, City, Qualification_Level, Total_Number, Total_Men, Total_Women (Qualification_Level is a string variable, containg a gap for some cases) and the dataset contains the information from the 2000 to 2018 period.

                      The dataset looks like:

                      Year City Qualification_Level Total_Number Total_Men Total_Women
                      2018 A None … … …
                      2018 A Undergraduate … … …
                      2018 A Post Graduate … … …
                      2018 A Total … … …
                      2017 A None … … …
                      2017 A Undergraduate … … …
                      2017 A Post Graduate … … …
                      2017 A Total … … …
                      …..
                      .
                      .
                      .
                      2018 B None
                      2018 B Undergraduate
                      2018 B Post Graduate
                      2018 B Total
                      2017 B None
                      2017 B Undergraduate
                      2017 B Post Graduate
                      2017 B Total
                      ….
                      .
                      .
                      .
                      The values of the variables (Total_Number, Total_Men, Total_Women) are Numeric.
                      Now I would like to calculate the “Ratio” of Qualification Level for each City at each Year, by calculating the following ratios: (None/Total), (Undergraduate /Total) and (Post Graduate/Total).
                      I have already tried looping over String values and defining my dataset as a Panel Data, but both approaches were not successful.
                      Would appreciate your comments regarding looping over String Variables in a situation like this.

                      Best Regards

                      Comment


                      • #12
                        Hi All,

                        I am also trying to get Stata (18.5) to loop over the string values of a single variable and for the life of me, I cannot get it to work appropriately. I have reviewed the help for foreach and also the material on http://www.stata.com/support/faqs/da...-with-foreach/, but still not get this to work appropriately. Here is an example of my data:

                        input str5 hosp_name float(qtr glucose_done)
                        "HFMMC" 248 0
                        "HFMMC" 248 0
                        "HFMMC" 248 0
                        "HFMMC" 248 0
                        "PMC" 248 1
                        "HFMMC" 248 0
                        "ABHC" 248 0
                        "PMC" 248 1
                        "HFMMC" 248 1
                        "HFMMC" 248 0
                        end
                        format %tq qtr
                        label values glucose_done label_yn
                        label def label_yn 0 "No", modify
                        label def label_yn 1 "Yes", modify

                        and here is the looping code that I wrote:

                        foreach i in hosp_name {
                        tab glucose_done qtr if hosp_name==`i', col
                        }

                        The code is running, but it is essentially running "tab glucose_done qtr" once for the entire dataset, not one time for each value of hosp_name in the dataset.

                        Any suggestions for how to get this working appropriately would be greatly appreciated.

                        Bradley



                        Comment


                        • #13
                          Your code in #12 has a couple of problems. First is the way you have used -foreach ... in...-. It does not look inside a variable and extract the values. Rather it needs to be followed by a list of the specific values. To loop over the values taken on by the variable hosp_name requires first creating a list of those. That last can then be used in a -foreach ... in...- command, or, better, in a -foreach ... of local ...- loop.

                          Next, you will get into trouble with -if hosp_name == `i'- because this will cause Stata to look for a variable whose name is `i' and compare whether the value of hosp_name is the same as the value of variable `i'. But, of course, variable `i', if you have constructed the loop properly, will not exist, and you will get an error there. [Added: to get Stata to see whether the value of variable hosp_name is the same as the value in `i' requires that you surround `i' in quotes: `"`i'"'.]

                          The reason the loop runs at all is that, as you wrote it, the loop runs exactly once, with the value of i set to hosp_name. The -tab- command is then conditioned on -if hosp_name == hosp_name-, which, of course, is true in every observation. So in that one iteration of the loop, you will get -tab glucose_done qtr- executed for the entire data set.

                          The following code fixes these problems.
                          Code:
                          levelsof hosp_name, local(hosp_names)
                          foreach h of local hosp_names {
                              display "`h'"
                              tab glucose_done qtr if hosp_name == `"`h'"'
                          }
                          I have also added a command to show the name of the hospital before each cross tab, so you can tell which cross tab corresponds to which value of hosp_name.
                          Last edited by Clyde Schechter; 20 Mar 2025, 10:04.

                          Comment


                          • #14
                            Clyde Schechter specifies the correct way to loop, but if all you need is the tabulation above, you don't actually need a loop at all:

                            Code:
                            . bysort hosp_name: tab glucose_done qtr
                            
                            
                            -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                            -> hosp_name = ABHC
                            
                            glucose_do |    qtr
                                    ne |    2022q1 |     Total
                            -----------+-----------+----------
                                    No |         1 |         1
                            -----------+-----------+----------
                                 Total |         1 |         1
                            
                            -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                            -> hosp_name = HFMMC
                            
                            glucose_do |    qtr
                                    ne |    2022q1 |     Total
                            -----------+-----------+----------
                                    No |         6 |         6
                                   Yes |         1 |         1
                            -----------+-----------+----------
                                 Total |         7 |         7
                            
                            -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                            -> hosp_name = PMC
                            
                            glucose_do |    qtr
                                    ne |    2022q1 |     Total
                            -----------+-----------+----------
                                   Yes |         2 |         2
                            -----------+-----------+----------
                                 Total |         2 |         2
                            If you are using Stata 17 or later, you could also just do:

                            Code:
                            . table (glucose_done) (qtr) (hosp_name), nototal
                            
                            hosp_name = ABHC
                            ----------------------
                                         |    qtr 
                                         |  2022q1
                            -------------+--------
                            glucose_done |        
                              No         |       1
                            ----------------------
                            
                            hosp_name = HFMMC
                            ----------------------
                                         |    qtr 
                                         |  2022q1
                            -------------+--------
                            glucose_done |        
                              No         |       6
                              Yes        |       1
                            ----------------------
                            
                            hosp_name = PMC
                            ----------------------
                                         |    qtr 
                                         |  2022q1
                            -------------+--------
                            glucose_done |        
                              Yes        |       2
                            ----------------------
                            Of course, if this was just dummy code and you actually need to do a set of operations for each value of hosp_name, then the loop may be right way to proceed.
                            Last edited by Hemanshu Kumar; 20 Mar 2025, 10:09.

                            Comment


                            • #15
                              Thank you Clyde Schechter for the corrected code and detailed explanation of what was wrong with my original attempt. And Hemanshu Kumar thanks for the reminder on the value of bysort to accomplish this as well.

                              Comment

                              Working...
                              X