loop over several dta files

Sara Karel

Join Date: May 2014

Posts: 23
#1

29 May 2014, 18:08

Hi. I am new to looping over files, so my question may be trivial, but I have searched for answers and not found much help. I appreciate your help and feedback.

I have 60 country .dta files with names like these:

Angola_2006_2010_Panel.dta
Colombia_2006_2010_Panel.dta
India_2007_2012_Panel.dta
.
.

I want to run the levpet command on each of them, and I know how to do it for each country individually. This is the code I use after cleaning the data, replacing zeros, creating logs, and so on.

tsset id year
keep id year lnr lnv lnm lnl lnk lni lne lna M R V L K I E A

levpet lnv , free( lnl ) proxy( lne ) capital( lnk ) valueadded

I am really lost on how to write simple code that reads all these files in a loop, so that I do not have to process each country individually. I have tried to append the files, but for some countries I get error messages. Is there a way to do this?

Thanks for your help.

-Sara
Sarah Edgington

Join Date: Apr 2014

Posts: 284
#2

29 May 2014, 18:27

What error message do you get when you append files?
Comment
Sara Karel

Join Date: May 2014

Posts: 23
#3

29 May 2014, 18:31

The message says that a variable (a1) is str10 in the using data, or that a variable (d1a1x) is byte in the using data. Appending works fine for some files, but not for all.
Comment
Sarah Edgington

Join Date: Apr 2014

Posts: 284
#4

29 May 2014, 19:38

You're going to need to do some work on your datasets to make sure that all the common variables are of the same type across all the datasets.
That is, a1 needs to be string in all the datasets (or numeric in all of them) before you can append them. See help destring and help tostring for more information on changing variable types.

Alternatively, you may be able to get away with just keeping the variables you need for the analysis.

That would look something like

Code:

use Angola_2006_2010_Panel.dta
keep id year lnr lnv lnm lnl lnk lni lne lna M R V L K I E A
append using Colombia_2006_2010_Panel.dta
keep id year lnr lnv lnm lnl lnk lni lne lna M R V L K I E A
append using India_2007_2012_Panel.dta

As long as your main variables of interest are all of the same type you shouldn't have any problem. Of course if you decide to do additional analyses you may find there are other variables you needed.
Comment
Sara Karel

Join Date: May 2014

Posts: 23
#5

29 May 2014, 20:39

Thanks. I will try to destring the main variables of interest and see if that works. However, is there a way to use a forvalues or foreach loop here? Otherwise, it will take me a long time to get results for 60 datasets.
Comment

Sergiy Radyakin

Join Date: Apr 2014
Posts: 1867

29 May 2014, 22:36

You don't have to append the files to run the regressions individually by country, but if you want to append them, you could do it with code similar to the example below.

Best, Sergiy Radyakin

Code:

clear all

local folder "C:/temp/"
local vars "price weight length"
local files "`c(Mons)'" // lazy list

// data preparation
foreach f in `files' {
  sysuse auto, clear
  save "`folder'`f'.dta", replace
}

//simulate problem with different data types for irrelevant variables
generate mstr=string(mpg)
drop mpg
rename mstr mpg
save, replace
// data preparation complete. TS should have written the code above.
count  // 74

local w1 `"`: word 1 of `files''"'
use "`folder'`w1'.dta", clear
foreach f in `files' {
  if (`"`f'"'==`"`w1'"') continue
  append using `"`folder'`f'"'
  keep `vars'
}

count  //74*12=888
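
For the original question (running levpet separately for each country without appending anything), a minimal sketch of a loop over the files could look like the code below. It assumes that all the files sit in the current working directory, that they follow the *_Panel.dta naming pattern shown in #1, and that each file already contains the cleaned variables. The header line printed by display and the capture noisily prefix are additions for illustration, so that each block of output is labeled and one failing country does not stop the loop.

```stata
* Sketch only: assumes the files are in the current working directory
* and match the *_Panel.dta pattern from post #1.
local files : dir "." files "*_Panel.dta"

foreach f of local files {
    display as text _newline "==== `f' ===="
    use "`f'", clear
    tsset id year
    * capture noisily: show output and errors, then continue with the next file
    capture noisily levpet lnv, free(lnl) proxy(lne) capital(lnk) valueadded
}
```

Because each file is loaded and estimated on its own, type mismatches between files in variables such as a1 do not matter for this approach.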

Comment

Richard Williams

Join Date: Apr 2014

Posts: 5024
#7

30 May 2014, 06:50

I agree with Sarah E. that it sounds like part of the problem is that the files aren't clean, e.g. in some files variables are strings while in others they aren't. Personally, I would try to get the 60 files cleaned up first. Then it would probably be easy to do what you want. Barring that, you could probably tweak Sergiy's code to fix things as needed, e.g. you could check to see if a variable is string and if so convert it to numeric.
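
A minimal sketch of such a check, assuming the variable is a1 (the one named in the error message in #3) and that its string values are really numbers:

```stata
* Sketch only: a1 is the variable named in the error message in #3.
* confirm returns _rc == 0 only if a1 exists and is stored as a string.
capture confirm string variable a1
if _rc == 0 {
    * convert a1 to numeric in place; destring leaves it unchanged
    * if it contains nonnumeric characters
    destring a1, replace
}
```

Placed inside a loop right after each file is loaded, a check like this would bring a1 to the same type in every file before the append step.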

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 19.5 MP (2 processor)
WWW: https://academicweb.nd.edu/~rwilliam/
Comment
Sara Karel

Join Date: May 2014

Posts: 23
#8

30 May 2014, 10:25

Thanks. I will try the suggestions and see how far I get.
Comment
Juan Miranda

Join Date: Mar 2018

Posts: 31
#9

09 Feb 2019, 05:34

Hi all. Good morning.
I have been using a loop to generate frequency tables with the fre command over 14 datasets.
Everything works. Sometimes a dataset does not have the variable and fre reports an error for it, but that is not a problem.
What I want is to label each output with the dataset it belongs to, so that I can identify the outputs faster.
Below is the loop I use. I would appreciate suggestions on what else to include to get what I need.
Thanks in advance.
Juan.

Code:

cd "C:\statadatasets"
local i : dir "C:\statadatasets" files "*.dta"
foreach file in `i' {
    use `file', clear
    capture noisily fre mt
}
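
One possible way to label each table, sketched under the assumption that printing the file name before each fre call is enough, is to add a display line inside the loop. The folder and the variable mt are taken from the code above; the macro name files and the "Dataset:" header text are only illustrative.

```stata
* Sketch only: same folder and variable (mt) as in the loop above.
cd "C:\statadatasets"
local files : dir "." files "*.dta"

foreach file of local files {
    use "`file'", clear
    * print the file name so the table that follows can be identified
    display as result _newline "Dataset: `file'"
    capture noisily fre mt
}
```

The header is printed even when fre fails for a dataset, so the error message is also attributed to the right file.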
Comment
