  • Import multiple .txt files and create a .dta file with all of them

    Dear all

    I am working with a dataset that is spread across several .txt files. The files are named like AC2002ID.txt, BA2002ID.txt, etc., where the first two letters are a state identifier, followed by the year (years run from 2002 to 2014). How can I load all of these .txt files for each state and year, and end up with a single .dta file that combines everything?

    Thank you

  • #2
    Perhaps something along the lines of:

    http://dataservices.gmu.edu/files/St...pend_files.pdf

    Best, Sergiy Radyakin


    • #3
      I'll just add this advice to Sergiy's. Of the two approaches outlined in the PDF he links to, I strongly recommend the "Step-by-Step" process. The reason is that in most real-world compendia of data, you will find that there are inconsistencies among the files you are sent. The same variable can have different names in each file. Worse, different variables can have the same name in each file. Coding of variables can differ from one file to the next. And particularly with .csv or Excel files, what is a string variable in one file can be numeric in another. The step-by-step approach enables you to stop after each file has been separately brought into Stata, explore each file, and run data-cleaning scripts to make all of the files conform to identical variable naming, coding, data storage types, and formatting. Then you can run the loop to put them all together.
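
      A minimal sketch of what that two-stage workflow might look like, not a tested solution for these particular files. It assumes the .txt files are delimited text with variable names in the first row, that they sit in the current working directory, and that every name follows the two-letter-state plus four-digit-year pattern described in #1. The variable names src_state and src_year and the output name combined.dta are made up for illustration.

      ```stata
      * Stage 1: bring each .txt file into Stata and save it as its own .dta
      * (assumes delimited text with a header row; adjust import options if not)
      local files : dir "." files "*ID.txt"
      foreach f of local files {
          import delimited using "`f'", varnames(1) clear
          * Tag each observation with the state and year taken from the file name
          * (upper() because dir can return lowercased names on Windows)
          generate str2 src_state = upper(substr("`f'", 1, 2))
          generate int src_year = real(substr("`f'", 3, 4))
          * Drop the ".txt" extension to build the .dta file name
          local stub = substr("`f'", 1, strlen("`f'") - 4)
          save "`stub'.dta", replace
      }

      * ...stop here: open each .dta, run describe and codebook, and fix
      * names, coding, and storage types before going any further...

      * Stage 2: once the files agree with each other, stack them
      clear
      local dtas : dir "." files "*ID.dta"
      foreach d of local dtas {
          append using "`d'"
      }
      save combined.dta, replace
      ```

      Note that append will stop with an error if a variable is string in one file and numeric in another, which is exactly the kind of mismatch the inspection between the two stages is meant to catch.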

      If you use the all-at-once approach and there are problems of the type I've mentioned, cleaning up the combined file can be a nightmare, because the fixes needed for data from different files are different and often contradictory.

      The only circumstance under which I would use the all-at-once approach is if I knew for certain that the data in the files are entirely consistent with each other in all of these respects. I would only feel comfortable assuming that if the source of the data were a very high quality data curator whose data sets I had previously worked with and found to meet these standards. It's pretty uncommon in practice, at least in my field.
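
      For that rare fully consistent case, the all-at-once version could be sketched roughly as follows. Again this assumes delimited text files with a header row in the current directory, and src_state, src_year, and combined.dta are illustrative names only.

      ```stata
      * Start from an empty dataset held in a temporary file
      clear
      tempfile building
      save `building', emptyok

      local files : dir "." files "*ID.txt"
      foreach f of local files {
          import delimited using "`f'", varnames(1) clear
          * State and year come from the file name, e.g. AC2002ID.txt
          generate str2 src_state = upper(substr("`f'", 1, 2))
          generate int src_year = real(substr("`f'", 3, 4))
          * Add everything accumulated so far, then store the running result
          append using `building'
          save `building', replace
      }
      save combined.dta, replace
      ```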
