Trouble using commands in loop over

Amie Osborn

Join Date: Sep 2015

Posts: 27
#1

Trouble using commands in loop over

16 Sep 2015, 10:24

Hello Statalist users,

I am using Stata 13.1 and have two questions regarding using loop over for other commands.

Background: The downloaded data was an Excel spreadsheet with each US State on a separate sheet. I was able to remove the first two rows which had unnecessary information by starting in a specific cell. I then used the firstrow command to import the variable names. However, the first column took the variable name based on which State the sheet was regarding since the first cell had the state name. Below is the current working code.

Code:

xls2dta, save ("C:\filelocation") sheets(2/52): import excel "C:\excelfile", cellrange(A3) firstrow clear *******Loop over all Files****************** forvalues i=2/52 { use "C:\filename_`i'.dta" drop in 1/2 **Dates to become labels** foreach var of varlist _all { local label : variable label `var' local new_name = upper(strtoname("`label'")) rename `var' `new_name' } rename (_*) (y*) save, replace }

Problem 1) I need to drop missing observations as there are blank rows.I originally used the code below to drop variables, but this would only work on one data file as the other variables are not named "Alabama". Is there another way to drop the missing observations while using the loop over code since the first variable name is different for every data file?
The other current columns or variables are years and some states have different years available. For instance, Alabama has 1924-2014 while Alaska has 1960-2014. I am hesitant to use a drop missing observation with the year variables as sometimes years are randomly not available.

Code:

drop if missing( ALABAMA)

Problem 2) I want to generate an additional variable "STATE" in order to transpose or reshape the data files so they will append appropriately. However, the only place each file has the name of the state is the first variable name. Is there a way to use the variable name as a value in a newly generated variable so I can put it into the loop over command as well?

My only progress with this problem was using the information from http://www.statalist.org/forums/foru...abels-in-xpose where I tried the following code on one data file:

Code:

encode ALABAMA, gen(_CATEGORY) tempfile labeler label save _CATEGORY using 'labeler'

Unfortunately, that did not even work with one single file as the "invalid 'labeler'" error appeared.

Thank you in advance for any information and advice.

Amie
Tags: None
Friedrich Huebler

Join Date: Apr 2014

Posts: 1053
#2

16 Sep 2015, 11:12

I cannot answer your first question. The invalid 'labeler' error in your second problem is caused by a wrong quotation mark in the label save command.

Code:

label save _CATEGORY using `labeler'
Comment
Amie Osborn

Join Date: Sep 2015

Posts: 27
#3

16 Sep 2015, 11:35

Friedrich,

Great! That makes sense! Now if only I can figure out a way to include it in the loop with different variable names on each data set!
Thank you for the clarification!

Amie
Comment
Friedrich Huebler

Join Date: Apr 2014

Posts: 1053
#4

16 Sep 2015, 12:03

It would be easier to help if you could show us an excerpt from your data. You can use dataex from SSC for this. Verbal descriptions of someone's data are not always easy to understand.
Comment

Amie Osborn

Join Date: Sep 2015
Posts: 27

16 Sep 2015, 12:20

Great! Thank you for introducing me to dataex command.

Here is an excerpt of the data from one file. The problem is there are multiple files like this, but the first variable is based on which state that file represents.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str500 ALABAMA str7(y1960 y1961 y1962)
"Cotton"                                          "743326"  "632695"  "698271" 
"Feed crops"                                      "108073"  "94093"   "85971"  
"Food grains"                                     "10025"   "12502"   "7787"   
"Fruits and nuts"                                 "49024"   "71884"   "28538"  
"Oil crops"                                       "145564"  "154904"  "153578" 
"Tobacco"                                         "2164"    "2417"    "2449"   
"Vegetables and melons"                           "88844"   "80688"   "92068"  
"All other crops"                                 "62560"   "146677"  "141446" 
"Home consumption"                                "108130"  "102581"  "88325"  
"Inventory adjustment"                            "2275"    "24389"   "-86119" 
""                                                ""        ""        ""       
"Value of animals and products production"        "1878668" "1908306" "2016487"
"Animals and products cash receipts"              "1722785" "1734412" "1848329"
"Dairy products, Milk"                            "217647"  "217014"  "213577" 
"Meat animals"                                    "706246"  "717454"  "697451" 
"Miscellaneous livestock"                         "9580"    "9458"    "8836"   
"Poultry and eggs"                                "789313"  "790485"  "928465" 
"Home consumption"                                "173202"  "144644"  "121599" 
"Inventory adjustment"                            "-17318"  "29250"   "46559"  
""                                                ""        ""        ""       
"Farm-related income"                             "180875"  "206782"  "225297" 
"Forest products sold"                            "94908"   "84770"   "80025"  
"Gross imputed rental value of farm dwellings"    "163388"  "189429"  "206967" 
"Machine hire and customwork"                     "17487"   "17353"   "18330"  
"Other farm income"                               "NA"      "NA"      "NA"     
"Total commodity insurance indemnities"           "NA"      "NA"      "NA"     
"Net cash rent received by operator landlords 2/" "NA"      "NA"      "NA"     
""                                                ""        ""        ""       
"Value of agricultural sector production"         "3379527" "3437919" "3454097"
""                                                ""        ""        ""       
"Intermediate product expenses 1/"                "1569126" "1615306" "1722039"
end

Comment

Friedrich Huebler

Join Date: Apr 2014

Posts: 1053
#6

16 Sep 2015, 12:28

What's the name of the first variable for states with a name that consists of more than one word, e.g. New Mexico?
Comment
Amie Osborn

Join Date: Sep 2015

Posts: 27
#7

16 Sep 2015, 12:34

All variables are the states name in uppercase. Those states with multiple words are conjoined by an underscore.
Comment
Friedrich Huebler

Join Date: Apr 2014

Posts: 1053
#8

16 Sep 2015, 13:59

If the first variable is always the name of the state you can identify the name of that variable with the commands below.

Code:

ds local statevar = word("`r(varlist)'",1) drop if missing(`statevar')
Comment

Robert Picard

Join Date: Mar 2014
Posts: 1536

16 Sep 2015, 16:08

Congratulations Amie on being the first to use the new version of dataex. What you are trying to do is a bit tricky because you want to transpose data and combine multiple datasets at the same time. Starting from your data example, I created data for two states as follows (note that some variables appear only in one or the other):

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str500 ALABAMA str7(y1960 y1961 y1962)
"Cotton"                                          "743326"  "632695"  "698271" 
"Feed crops"                                      "108073"  "94093"   "85971"  
"Food grains"                                     "10025"   "12502"   "7787"   
"Fruits and nuts"                                 "49024"   "71884"   "28538"  
""                                                ""        ""        ""       
"Farm-related income"                             "180875"  "206782"  "225297" 
"Machine hire and customwork"                     "17487"   "17353"   "18330"  
"Other farm income"                               "NA"      "NA"      "NA"     
""                                                ""        ""        ""       
"Value of agricultural sector production"         "3379527" "3437919" "3454097"
end
save "data_1.dta", replace

clear
input str500 ALASKA str9(y1960 y1961 y1962)
"Cotton"                                          "743326.1"  "632695.1"  "698271.1" 
"Feed crops"                                      "108073.1"  "94093.1"   "85971.1"  
"Food grains"                                     "10025.1"   "12502.1"   "7787.1"   
"Fruits and nuts"                                 "49024.1"   "71884.1"   "28538.1"  
""                                                ""        ""        ""       
"Farm-related income"                             "180875.1"  "206782.1"  "225297.1" 
"Machine hire and customwork"                     "17487.1"   "17353.1"   "18330.1"  
"Other farm income"                               "9991"      "9992"      "9993"     
"This is another one"                              "1"        "2"        "3"       
end
save "data_2.dta", replace

After you run the code above, you will have two datasets saved in the current directory named "data_1.dta" and "data_2.dta". The following code will then convert each dataset to long form and append them to create a new dataset called "`main'". From this main dataset, all observations for a given variable are extracted, the y variable that contains the value is renamed using a translated version of the variable label and the label itself is attached to the newly renamed variable. The observations are then merged by state and year to form a final "results.dta" dataset.

Code:

* convert each dataset to long form and append them all
local nstate 2
forvalues i = 1/`nstate' {
    use "data_`i'.dta", clear
    ds
    local statename : word 1 of `r(varlist)'
    rename `statename' myvar
    drop if trim(myvar) == ""
    compress
    reshape long y, i(myvar) j(year)
    gen state = "`statename'"
    tempfile f`i'
    save "`f`i''"
}
clear
forvalues i = 1/`nstate' {
    append using "`f`i''"
}

* make a list of variables to cycle through
isid myvar state year, sort
levelsof myvar, local(vlabels)
tempfile main
save "`main'"
list, sepby(myvar)

* prepare an empty dataset to merge each variable
clear
gen state = ""
gen year = .
save "results.dta", replace

* cycle over each variable, rename and label and then merge
local i 0
foreach vlabel of local vlabels {
    use "`main'", clear
    keep if myvar == `"`vlabel'"'
    local newname = strtoname(`"`vlabel'"')
    rename y `newname'
    label var `newname' `"`vlabel'"'
    drop myvar
    merge 1:1 state year using "results.dta", nogen
    save "results.dta", replace
}

* reorganize variables and sort
order _all, alpha
order state year
isid state year, sort

* batch convert to numeric
ds state year, not
destring `r(varlist)', replace force
list, sepby(state)

Comment

Amie Osborn

Join Date: Sep 2015

Posts: 27
#10

17 Sep 2015, 11:26

Robert,

Thank you so much! This is wonderful. I ran your test run of the code and it works!

Thank you again for all of your assistance.

Kind Regards,
Amie
Comment

Amie Osborn

Join Date: Sep 2015
Posts: 27

#11

04 Nov 2015, 14:54

Hello Robert and other Statalist Users,

I was able to combine and turn all the data into a functional format thanks to you all! However, I am struggling again with a another form of data.

It is Extremely similar so I have used similar excerpts Robert provided above. I found a way to do it last time but cannot seem to rework with this data set.
There are three things I need to do to the original data set before I can begin merging.

1. Variables are not labeled. The label is the year so I am trying to add the label to additional characters such as "y". (Variable "B" labeled "1960" should actually be labeled "y1960"). This will allow me to use the date in the future. BUT I do not want the first variable which is the state and changes each file to have "y" added. I thought about using a ds command, but the variable name is different in each file.

2. Drop missing values. I tried dropmiss but it doesn't seem to work in a loop. Is this true? I could just drop each individual variable, but I want it to automatically do that in case that variable isn't missing data in other files.

3. Drop first two observations. For some reason this isn't working in my loop either.

Ultimately, I am trying to take the data set provided below and get it to the form that Robert provided previously so I can run the code he provides to do merge all these datasets.

*Please note: this data is the same as the one Robert has above with a little different form. There are 50 states (files) though so I am trying to make this as dynamic as possible. Still using Stata 13.1

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str39 ALABAMA str7(B C D E F)
""                                        "." ""        "." ""        ""      
""                                        "." ""        "." ""        ""      
"Cotton"                                  "." "743326"  "." "632695"  "698271"
"Feed crops"                              "." "108073"  "." "94093"   "85971" 
"Food grains"                             "." "10025"   "." "12502"   "7787"  
"Fruits and nuts"                         "." "49024"   "." "71884"   "28538" 
""                                        "." ""        "." ""        ""      
"Farm-related income"                     "." "180875"  "." "206782"  "225297"
"Machine hire and customwork"             "." "17487"   "." "17353"   "18330" 
"Other farm income"                       "." "NA"      "." "NA"      "NA"    
""                                        "." ""        "." ""        ""      
"Value of agricultural sector production" "." "3379527" "." "3437919" "3454097"
label var ALABAMA "ALABAMA"
label var C "1960"
label var E "1961"
label var F "1962" end
end
save "test_2.dta", replace
 
clear
input str27 ALASKA byte B float C byte D float(E F)
""                            .        . .        .        .
""                            .        . .        .        .
"Cotton"                      . 743326.1 . 632695.1 698271.1
"Feed crops"                  . 108073.1 .  94093.1  85971.1
"Food grains"                 .  10025.1 .  12502.1   7787.1
"Fruits and nuts"             .  49024.1 .  71884.1  28538.1
""                            .        . .        .        .
"Farm-related income"         . 180875.1 . 206782.1 225297.1
"Machine hire and customwork" .  17487.1 .  17353.1  18330.1
"Other farm income"           .     9991 .     9992     9993
"This is another one"         .        1 .        2        3
label var ALASKA "ALASKA"
label var C "1960"
label var E "1961"
label var F "1962"end
end
save "test_3.dta", replace

So far, this is what I have been trying in order to get it to loop through these two test files.

Code:

forvalues i = 2/`=_N'{
    use "test_`i'.dta", clear
    drop in 1/2
    dropmiss
    save,replace
    **Dates to become labels**
    foreach var of varlist _all {
        local label : variable label `var'
        local new_name = upper(strtoname("`label'"))
        rename `var' y`new_name'
}    
        save , replace
        }

Any help would be great!! I then will have to merge, but with the code Robert provided earlier, I am hoping that will work with my new dataset as well.

Thank you in advance.

Amie

Comment

Nick Cox

Join Date: Mar 2014
Posts: 35734

#12

04 Nov 2015, 15:56

Your input code got slightly mangled somehow. I think this was intended:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str39 ALABAMA str7(B C D E F)
""                                        "." ""        "." ""        ""      
""                                        "." ""        "." ""        ""      
"Cotton"                                  "." "743326"  "." "632695"  "698271"
"Feed crops"                              "." "108073"  "." "94093"   "85971" 
"Food grains"                             "." "10025"   "." "12502"   "7787"  
"Fruits and nuts"                         "." "49024"   "." "71884"   "28538" 
""                                        "." ""        "." ""        ""      
"Farm-related income"                     "." "180875"  "." "206782"  "225297"
"Machine hire and customwork"             "." "17487"   "." "17353"   "18330" 
"Other farm income"                       "." "NA"      "." "NA"      "NA"    
""                                        "." ""        "." ""        ""      
"Value of agricultural sector production" "." "3379527" "." "3437919" "3454097"
end
label var ALABAMA "ALABAMA"
label var C "1960"
label var E "1961"
label var F "1962" 
save "test_2.dta", replace
 
clear
input str27 ALASKA byte B float C byte D float(E F)
""                            .        . .        .        .
""                            .        . .        .        .
"Cotton"                      . 743326.1 . 632695.1 698271.1
"Feed crops"                  . 108073.1 .  94093.1  85971.1
"Food grains"                 .  10025.1 .  12502.1   7787.1
"Fruits and nuts"             .  49024.1 .  71884.1  28538.1
""                            .        . .        .        .
"Farm-related income"         . 180875.1 . 206782.1 225297.1
"Machine hire and customwork" .  17487.1 .  17353.1  18330.1
"Other farm income"           .     9991 .     9992     9993
"This is another one"         .        1 .        2        3
end 
label var ALASKA "ALASKA"
label var C "1960"
label var E "1961"
label var F "1962"
save "test_3.dta", replace

For the rest, I see that you are using dropmiss from the Stata Journal.

Following http://www.statalist.org/forums/foru...aging-missings I would encourage you to use missings (SSC) instead.

Having installed that, your manipulation of missing values would become

Code:

 
missings dropvars 
missings dropobs

I like the new syntax much better. I can't test the rest of your code right now. but I know no reason why dropmiss should not work in a loop.

Comment

Robert Picard

Join Date: Mar 2014
Posts: 1536

#13

04 Nov 2015, 20:25

As Nick noted, your example datasets have some problems. You have complicated the problem by having to rename all your variables and by having a mix of numeric and string variables. I use the variable labels to rename the variables, with the first variable being the state name. I assume that variables without labels are not important and can be dropped. I also destring variables at the state level to avoid conflicts when the data is appended. Otherwise, the rest is pretty much as before.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str39 ALABAMA str7(B C D E F)
""                                        "." ""        "." ""        ""      
""                                        "." ""        "." ""        ""      
"Cotton"                                  "." "743326"  "." "632695"  "698271"
"Feed crops"                              "." "108073"  "." "94093"   "85971" 
"Food grains"                             "." "10025"   "." "12502"   "7787"  
"Fruits and nuts"                         "." "49024"   "." "71884"   "28538" 
""                                        "." ""        "." ""        ""      
"Farm-related income"                     "." "180875"  "." "206782"  "225297"
"Machine hire and customwork"             "." "17487"   "." "17353"   "18330" 
"Other farm income"                       "." "NA"      "." "NA"      "NA"    
""                                        "." ""        "." ""        ""      
"Value of agricultural sector production" "." "3379527" "." "3437919" "3454097"
end
label var ALABAMA "ALABAMA"
label var C "1960"
label var E "1961"
label var F "1962"
save "test_2.dta", replace
 
clear
input str27 ALASKA byte B float C byte D float(E F)
""                            .        . .        .        .
""                            .        . .        .        .
"Cotton"                      . 743326.1 . 632695.1 698271.1
"Feed crops"                  . 108073.1 .  94093.1  85971.1
"Food grains"                 .  10025.1 .  12502.1   7787.1
"Fruits and nuts"             .  49024.1 .  71884.1  28538.1
""                            .        . .        .        .
"Farm-related income"         . 180875.1 . 206782.1 225297.1
"Machine hire and customwork" .  17487.1 .  17353.1  18330.1
"Other farm income"           .     9991 .     9992     9993
"This is another one"         .        1 .        2        3
end
label var ALASKA "ALASKA"
label var C "1960"
label var E "1961"
label var F "1962"
save "test_3.dta", replace


* convert each dataset to long form
forvalues i = 2/3 {
    use "test_`i'.dta", clear
    ds, has(varlabel)
    keep `r(varlist)'
    local j 0
    foreach v of varlist `r(varlist)' {
        local j = `j' + 1
        if `j' == 1 {
            rename `v' myvar
            local statename `v'
        }
        else {
            local ylabel : variable label `v'
            rename `v' y`ylabel'
            destring y`ylabel', replace ignore(NA)
        }
    }
    drop if trim(myvar) == ""
    compress
    reshape long y, i(myvar) j(year)
    gen state = "`statename'"
    save "test_`i'_long.dta", replace
}


* append all the long form datasets
clear
forvalues i =  2/3 {
    append using "test_`i'_long.dta"
}
save "combo_long.dta", replace
list, sepby(myvar)


* make a list of variables to cycle through
isid myvar state year, sort
levelsof myvar, local(vlabels)


* prepare an empty dataset to merge each variable
clear
gen state = ""
gen year = .
save "results.dta", replace


* cycle over each variable, rename and label and then merge
local i 0
foreach vlabel of local vlabels {
    use "combo_long.dta", clear
    keep if myvar == `"`vlabel'"'
    local newname = strtoname(`"`vlabel'"')
    rename y `newname'
    label var `newname' `"`vlabel'"'
    drop myvar
    merge 1:1 state year using "results.dta", nogen
    save "results.dta", replace
}


* reorganize variables and sort
order _all, alpha
order state year
isid state year, sort
list, sepby(state)

Comment

Amie Osborn

Join Date: Sep 2015

Posts: 27
#14

06 Nov 2015, 09:14

Robert,

As always, you have helped tremendously. It worked perfectly with all 52 files. I need to remember to convert to a long form first. I think that is my main issue, I keep trying to jump the gun.

Nick,

Thank you for the introduction to missings program. I will be using that in my syntax from now on.

Thank you both!

Amie
Comment

Announcement