Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Trouble using commands in loop over

    Hello Statalist users,

    I am using Stata 13.1 and have two questions regarding using loop over for other commands.

    Background: The downloaded data was an Excel spreadsheet with each US State on a separate sheet. I was able to remove the first two rows which had unnecessary information by starting in a specific cell. I then used the firstrow command to import the variable names. However, the first column took the variable name based on which State the sheet was regarding since the first cell had the state name. Below is the current working code.

    Code:
    xls2dta, save ("C:\filelocation") sheets(2/52): import excel "C:\excelfile", cellrange(A3) firstrow clear
    
    
    *******Loop over all Files******************
    forvalues i=2/52 {
        use "C:\filename_`i'.dta"
        drop in 1/2
    **Dates to become labels**
        foreach var of varlist _all {
            local label : variable label `var'
            local new_name = upper(strtoname("`label'"))
            rename `var' `new_name'
    }    
        rename (_*) (y*)
        save, replace
        }
    Problem 1) I need to drop missing observations as there are blank rows.I originally used the code below to drop variables, but this would only work on one data file as the other variables are not named "Alabama". Is there another way to drop the missing observations while using the loop over code since the first variable name is different for every data file?
    The other current columns or variables are years and some states have different years available. For instance, Alabama has 1924-2014 while Alaska has 1960-2014. I am hesitant to use a drop missing observation with the year variables as sometimes years are randomly not available.
    Code:
    drop if missing( ALABAMA)

    Problem 2) I want to generate an additional variable "STATE" in order to transpose or reshape the data files so they will append appropriately. However, the only place each file has the name of the state is the first variable name. Is there a way to use the variable name as a value in a newly generated variable so I can put it into the loop over command as well?

    My only progress with this problem was using the information from http://www.statalist.org/forums/foru...abels-in-xpose where I tried the following code on one data file:
    Code:
    encode ALABAMA, gen(_CATEGORY)
    tempfile labeler
    label save _CATEGORY using 'labeler'
    Unfortunately, that did not even work with one single file as the "invalid 'labeler'" error appeared.


    Thank you in advance for any information and advice.

    Amie

  • #2
    I cannot answer your first question. The invalid 'labeler' error in your second problem is caused by a wrong quotation mark in the label save command.
    Code:
    label save _CATEGORY using `labeler'

    Comment


    • #3
      Friedrich,

      Great! That makes sense! Now if only I can figure out a way to include it in the loop with different variable names on each data set!
      Thank you for the clarification!

      Amie

      Comment


      • #4
        It would be easier to help if you could show us an excerpt from your data. You can use dataex from SSC for this. Verbal descriptions of someone's data are not always easy to understand.

        Comment


        • #5
          Great! Thank you for introducing me to dataex command.

          Here is an excerpt of the data from one file. The problem is there are multiple files like this, but the first variable is based on which state that file represents.



          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input str500 ALABAMA str7(y1960 y1961 y1962)
          "Cotton"                                          "743326"  "632695"  "698271" 
          "Feed crops"                                      "108073"  "94093"   "85971"  
          "Food grains"                                     "10025"   "12502"   "7787"   
          "Fruits and nuts"                                 "49024"   "71884"   "28538"  
          "Oil crops"                                       "145564"  "154904"  "153578" 
          "Tobacco"                                         "2164"    "2417"    "2449"   
          "Vegetables and melons"                           "88844"   "80688"   "92068"  
          "All other crops"                                 "62560"   "146677"  "141446" 
          "Home consumption"                                "108130"  "102581"  "88325"  
          "Inventory adjustment"                            "2275"    "24389"   "-86119" 
          ""                                                ""        ""        ""       
          "Value of animals and products production"        "1878668" "1908306" "2016487"
          "Animals and products cash receipts"              "1722785" "1734412" "1848329"
          "Dairy products, Milk"                            "217647"  "217014"  "213577" 
          "Meat animals"                                    "706246"  "717454"  "697451" 
          "Miscellaneous livestock"                         "9580"    "9458"    "8836"   
          "Poultry and eggs"                                "789313"  "790485"  "928465" 
          "Home consumption"                                "173202"  "144644"  "121599" 
          "Inventory adjustment"                            "-17318"  "29250"   "46559"  
          ""                                                ""        ""        ""       
          "Farm-related income"                             "180875"  "206782"  "225297" 
          "Forest products sold"                            "94908"   "84770"   "80025"  
          "Gross imputed rental value of farm dwellings"    "163388"  "189429"  "206967" 
          "Machine hire and customwork"                     "17487"   "17353"   "18330"  
          "Other farm income"                               "NA"      "NA"      "NA"     
          "Total commodity insurance indemnities"           "NA"      "NA"      "NA"     
          "Net cash rent received by operator landlords 2/" "NA"      "NA"      "NA"     
          ""                                                ""        ""        ""       
          "Value of agricultural sector production"         "3379527" "3437919" "3454097"
          ""                                                ""        ""        ""       
          "Intermediate product expenses 1/"                "1569126" "1615306" "1722039"
          end

          Comment


          • #6
            What's the name of the first variable for states with a name that consists of more than one word, e.g. New Mexico?

            Comment


            • #7
              All variables are the states name in uppercase. Those states with multiple words are conjoined by an underscore.

              Comment


              • #8
                If the first variable is always the name of the state you can identify the name of that variable with the commands below.
                Code:
                ds
                local statevar = word("`r(varlist)'",1)
                drop if missing(`statevar')

                Comment


                • #9
                  Congratulations Amie on being the first to use the new version of dataex. What you are trying to do is a bit tricky because you want to transpose data and combine multiple datasets at the same time. Starting from your data example, I created data for two states as follows (note that some variables appear only in one or the other):

                  Code:
                  * Example generated by -dataex-. To install: ssc install dataex
                  clear
                  input str500 ALABAMA str7(y1960 y1961 y1962)
                  "Cotton"                                          "743326"  "632695"  "698271" 
                  "Feed crops"                                      "108073"  "94093"   "85971"  
                  "Food grains"                                     "10025"   "12502"   "7787"   
                  "Fruits and nuts"                                 "49024"   "71884"   "28538"  
                  ""                                                ""        ""        ""       
                  "Farm-related income"                             "180875"  "206782"  "225297" 
                  "Machine hire and customwork"                     "17487"   "17353"   "18330"  
                  "Other farm income"                               "NA"      "NA"      "NA"     
                  ""                                                ""        ""        ""       
                  "Value of agricultural sector production"         "3379527" "3437919" "3454097"
                  end
                  save "data_1.dta", replace
                  
                  clear
                  input str500 ALASKA str9(y1960 y1961 y1962)
                  "Cotton"                                          "743326.1"  "632695.1"  "698271.1" 
                  "Feed crops"                                      "108073.1"  "94093.1"   "85971.1"  
                  "Food grains"                                     "10025.1"   "12502.1"   "7787.1"   
                  "Fruits and nuts"                                 "49024.1"   "71884.1"   "28538.1"  
                  ""                                                ""        ""        ""       
                  "Farm-related income"                             "180875.1"  "206782.1"  "225297.1" 
                  "Machine hire and customwork"                     "17487.1"   "17353.1"   "18330.1"  
                  "Other farm income"                               "9991"      "9992"      "9993"     
                  "This is another one"                              "1"        "2"        "3"       
                  end
                  save "data_2.dta", replace
                  After you run the code above, you will have two datasets saved in the current directory named "data_1.dta" and "data_2.dta". The following code will then convert each dataset to long form and append them to create a new dataset called "`main'". From this main dataset, all observations for a given variable are extracted, the y variable that contains the value is renamed using a translated version of the variable label and the label itself is attached to the newly renamed variable. The observations are then merged by state and year to form a final "results.dta" dataset.

                  Code:
                  * convert each dataset to long form and append them all
                  local nstate 2
                  forvalues i = 1/`nstate' {
                      use "data_`i'.dta", clear
                      ds
                      local statename : word 1 of `r(varlist)'
                      rename `statename' myvar
                      drop if trim(myvar) == ""
                      compress
                      reshape long y, i(myvar) j(year)
                      gen state = "`statename'"
                      tempfile f`i'
                      save "`f`i''"
                  }
                  clear
                  forvalues i = 1/`nstate' {
                      append using "`f`i''"
                  }
                  
                  * make a list of variables to cycle through
                  isid myvar state year, sort
                  levelsof myvar, local(vlabels)
                  tempfile main
                  save "`main'"
                  list, sepby(myvar)
                  
                  * prepare an empty dataset to merge each variable
                  clear
                  gen state = ""
                  gen year = .
                  save "results.dta", replace
                  
                  * cycle over each variable, rename and label and then merge
                  local i 0
                  foreach vlabel of local vlabels {
                      use "`main'", clear
                      keep if myvar == `"`vlabel'"'
                      local newname = strtoname(`"`vlabel'"')
                      rename y `newname'
                      label var `newname' `"`vlabel'"'
                      drop myvar
                      merge 1:1 state year using "results.dta", nogen
                      save "results.dta", replace
                  }
                  
                  * reorganize variables and sort
                  order _all, alpha
                  order state year
                  isid state year, sort
                  
                  * batch convert to numeric
                  ds state year, not
                  destring `r(varlist)', replace force
                  list, sepby(state)

                  Comment


                  • #10
                    Robert,

                    Thank you so much! This is wonderful. I ran your test run of the code and it works!

                    Thank you again for all of your assistance.

                    Kind Regards,
                    Amie

                    Comment


                    • #11
                      Hello Robert and other Statalist Users,

                      I was able to combine and turn all the data into a functional format thanks to you all! However, I am struggling again with a another form of data.

                      It is Extremely similar so I have used similar excerpts Robert provided above. I found a way to do it last time but cannot seem to rework with this data set.
                      There are three things I need to do to the original data set before I can begin merging.

                      1. Variables are not labeled. The label is the year so I am trying to add the label to additional characters such as "y". (Variable "B" labeled "1960" should actually be labeled "y1960"). This will allow me to use the date in the future. BUT I do not want the first variable which is the state and changes each file to have "y" added. I thought about using a ds command, but the variable name is different in each file.

                      2. Drop missing values. I tried dropmiss but it doesn't seem to work in a loop. Is this true? I could just drop each individual variable, but I want it to automatically do that in case that variable isn't missing data in other files.

                      3. Drop first two observations. For some reason this isn't working in my loop either.

                      Ultimately, I am trying to take the data set provided below and get it to the form that Robert provided previously so I can run the code he provides to do merge all these datasets.

                      *Please note: this data is the same as the one Robert has above with a little different form. There are 50 states (files) though so I am trying to make this as dynamic as possible. Still using Stata 13.1


                      Code:
                      * Example generated by -dataex-. To install: ssc install dataex
                      clear
                      input str39 ALABAMA str7(B C D E F)
                      ""                                        "." ""        "." ""        ""      
                      ""                                        "." ""        "." ""        ""      
                      "Cotton"                                  "." "743326"  "." "632695"  "698271"
                      "Feed crops"                              "." "108073"  "." "94093"   "85971" 
                      "Food grains"                             "." "10025"   "." "12502"   "7787"  
                      "Fruits and nuts"                         "." "49024"   "." "71884"   "28538" 
                      ""                                        "." ""        "." ""        ""      
                      "Farm-related income"                     "." "180875"  "." "206782"  "225297"
                      "Machine hire and customwork"             "." "17487"   "." "17353"   "18330" 
                      "Other farm income"                       "." "NA"      "." "NA"      "NA"    
                      ""                                        "." ""        "." ""        ""      
                      "Value of agricultural sector production" "." "3379527" "." "3437919" "3454097"
                      label var ALABAMA "ALABAMA"
                      label var C "1960"
                      label var E "1961"
                      label var F "1962" end
                      end
                      save "test_2.dta", replace
                       
                      clear
                      input str27 ALASKA byte B float C byte D float(E F)
                      ""                            .        . .        .        .
                      ""                            .        . .        .        .
                      "Cotton"                      . 743326.1 . 632695.1 698271.1
                      "Feed crops"                  . 108073.1 .  94093.1  85971.1
                      "Food grains"                 .  10025.1 .  12502.1   7787.1
                      "Fruits and nuts"             .  49024.1 .  71884.1  28538.1
                      ""                            .        . .        .        .
                      "Farm-related income"         . 180875.1 . 206782.1 225297.1
                      "Machine hire and customwork" .  17487.1 .  17353.1  18330.1
                      "Other farm income"           .     9991 .     9992     9993
                      "This is another one"         .        1 .        2        3
                      label var ALASKA "ALASKA"
                      label var C "1960"
                      label var E "1961"
                      label var F "1962"end
                      end
                      save "test_3.dta", replace

                      So far, this is what I have been trying in order to get it to loop through these two test files.

                      Code:
                      forvalues i = 2/`=_N'{
                          use "test_`i'.dta", clear
                          drop in 1/2
                          dropmiss
                          save,replace
                          **Dates to become labels**
                          foreach var of varlist _all {
                              local label : variable label `var'
                              local new_name = upper(strtoname("`label'"))
                              rename `var' y`new_name'
                      }    
                              save , replace
                              }

                      Any help would be great!! I then will have to merge, but with the code Robert provided earlier, I am hoping that will work with my new dataset as well.

                      Thank you in advance.

                      Amie

                      Comment


                      • #12
                        Your input code got slightly mangled somehow. I think this was intended:

                        Code:
                        * Example generated by -dataex-. To install: ssc install dataex
                        clear
                        input str39 ALABAMA str7(B C D E F)
                        ""                                        "." ""        "." ""        ""      
                        ""                                        "." ""        "." ""        ""      
                        "Cotton"                                  "." "743326"  "." "632695"  "698271"
                        "Feed crops"                              "." "108073"  "." "94093"   "85971" 
                        "Food grains"                             "." "10025"   "." "12502"   "7787"  
                        "Fruits and nuts"                         "." "49024"   "." "71884"   "28538" 
                        ""                                        "." ""        "." ""        ""      
                        "Farm-related income"                     "." "180875"  "." "206782"  "225297"
                        "Machine hire and customwork"             "." "17487"   "." "17353"   "18330" 
                        "Other farm income"                       "." "NA"      "." "NA"      "NA"    
                        ""                                        "." ""        "." ""        ""      
                        "Value of agricultural sector production" "." "3379527" "." "3437919" "3454097"
                        end
                        label var ALABAMA "ALABAMA"
                        label var C "1960"
                        label var E "1961"
                        label var F "1962" 
                        save "test_2.dta", replace
                         
                        clear
                        input str27 ALASKA byte B float C byte D float(E F)
                        ""                            .        . .        .        .
                        ""                            .        . .        .        .
                        "Cotton"                      . 743326.1 . 632695.1 698271.1
                        "Feed crops"                  . 108073.1 .  94093.1  85971.1
                        "Food grains"                 .  10025.1 .  12502.1   7787.1
                        "Fruits and nuts"             .  49024.1 .  71884.1  28538.1
                        ""                            .        . .        .        .
                        "Farm-related income"         . 180875.1 . 206782.1 225297.1
                        "Machine hire and customwork" .  17487.1 .  17353.1  18330.1
                        "Other farm income"           .     9991 .     9992     9993
                        "This is another one"         .        1 .        2        3
                        end 
                        label var ALASKA "ALASKA"
                        label var C "1960"
                        label var E "1961"
                        label var F "1962"
                        save "test_3.dta", replace
                        For the rest, I see that you are using dropmiss from the Stata Journal.

                        Following http://www.statalist.org/forums/foru...aging-missings I would encourage you to use missings (SSC) instead.

                        Having installed that, your manipulation of missing values would become

                        Code:
                         
                        missings dropvars 
                        missings dropobs
                        I like the new syntax much better. I can't test the rest of your code right now. but I know no reason why dropmiss should not work in a loop.

                        Comment


                        • #13
                          As Nick noted, your example datasets have some problems. You have complicated the problem by having to rename all your variables and by having a mix of numeric and string variables. I use the variable labels to rename the variables, with the first variable being the state name. I assume that variables without labels are not important and can be dropped. I also destring variables at the state level to avoid conflicts when the data is appended. Otherwise, the rest is pretty much as before.

                          Code:
                          * Example generated by -dataex-. To install: ssc install dataex
                          clear
                          input str39 ALABAMA str7(B C D E F)
                          ""                                        "." ""        "." ""        ""      
                          ""                                        "." ""        "." ""        ""      
                          "Cotton"                                  "." "743326"  "." "632695"  "698271"
                          "Feed crops"                              "." "108073"  "." "94093"   "85971" 
                          "Food grains"                             "." "10025"   "." "12502"   "7787"  
                          "Fruits and nuts"                         "." "49024"   "." "71884"   "28538" 
                          ""                                        "." ""        "." ""        ""      
                          "Farm-related income"                     "." "180875"  "." "206782"  "225297"
                          "Machine hire and customwork"             "." "17487"   "." "17353"   "18330" 
                          "Other farm income"                       "." "NA"      "." "NA"      "NA"    
                          ""                                        "." ""        "." ""        ""      
                          "Value of agricultural sector production" "." "3379527" "." "3437919" "3454097"
                          end
                          label var ALABAMA "ALABAMA"
                          label var C "1960"
                          label var E "1961"
                          label var F "1962"
                          save "test_2.dta", replace
                           
                          clear
                          input str27 ALASKA byte B float C byte D float(E F)
                          ""                            .        . .        .        .
                          ""                            .        . .        .        .
                          "Cotton"                      . 743326.1 . 632695.1 698271.1
                          "Feed crops"                  . 108073.1 .  94093.1  85971.1
                          "Food grains"                 .  10025.1 .  12502.1   7787.1
                          "Fruits and nuts"             .  49024.1 .  71884.1  28538.1
                          ""                            .        . .        .        .
                          "Farm-related income"         . 180875.1 . 206782.1 225297.1
                          "Machine hire and customwork" .  17487.1 .  17353.1  18330.1
                          "Other farm income"           .     9991 .     9992     9993
                          "This is another one"         .        1 .        2        3
                          end
                          label var ALASKA "ALASKA"
                          label var C "1960"
                          label var E "1961"
                          label var F "1962"
                          save "test_3.dta", replace
                          
                          
                          * convert each dataset to long form
                          forvalues i = 2/3 {
                              use "test_`i'.dta", clear
                              ds, has(varlabel)
                              keep `r(varlist)'
                              local j 0
                              foreach v of varlist `r(varlist)' {
                                  local j = `j' + 1
                                  if `j' == 1 {
                                      rename `v' myvar
                                      local statename `v'
                                  }
                                  else {
                                      local ylabel : variable label `v'
                                      rename `v' y`ylabel'
                                      destring y`ylabel', replace ignore(NA)
                                  }
                              }
                              drop if trim(myvar) == ""
                              compress
                              reshape long y, i(myvar) j(year)
                              gen state = "`statename'"
                              save "test_`i'_long.dta", replace
                          }
                          
                          
                          * append all the long form datasets
                          clear
                          forvalues i =  2/3 {
                              append using "test_`i'_long.dta"
                          }
                          save "combo_long.dta", replace
                          list, sepby(myvar)
                          
                          
                          * make a list of variables to cycle through
                          isid myvar state year, sort
                          levelsof myvar, local(vlabels)
                          
                          
                          * prepare an empty dataset to merge each variable
                          clear
                          gen state = ""
                          gen year = .
                          save "results.dta", replace
                          
                          
                          * cycle over each variable, rename and label and then merge
                          local i 0
                          foreach vlabel of local vlabels {
                              use "combo_long.dta", clear
                              keep if myvar == `"`vlabel'"'
                              local newname = strtoname(`"`vlabel'"')
                              rename y `newname'
                              label var `newname' `"`vlabel'"'
                              drop myvar
                              merge 1:1 state year using "results.dta", nogen
                              save "results.dta", replace
                          }
                          
                          
                          * reorganize variables and sort
                          order _all, alpha
                          order state year
                          isid state year, sort
                          list, sepby(state)

                          Comment


                          • #14
                            Robert,

                            As always, you have helped tremendously. It worked perfectly with all 52 files. I need to remember to convert to a long form first. I think that is my main issue, I keep trying to jump the gun.

                            Nick,

                            Thank you for the introduction to missings program. I will be using that in my syntax from now on.

                            Thank you both!

                            Amie

                            Comment

                            Working...
                            X