Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • import all files from a folder and generate a variable based on each filename

    Hi, I have a folder (D:\myname\Data\) containing files for each year, file1990.csv, file1991.csv, ...file2015.csv. I want to import them all at once and create a year variable based on the file name, so that for file1990.csv, year=1990. I have a following codes that work for importing all files, but I don't know how to create a year variable. Could anyone please help me? Thank you!

    cd "D:\myname\Data"
    clear
    local allfiles: dir . files "*.csv"

    foreach file of local allfiles {
    preserve
    import delimited using `file', clear
    save temp, replace
    restore
    append using temp, force
    }

  • #2
    There are cleaner ways to do what you're doing, but since you have working code, I just added the little piece you wanted, rather than re-write the whole thing.

    Code:
    cd "D:\myname\Data"
    clear
    local allfiles: dir . files "*.csv"
    
    foreach file of local allfiles {
    preserve
    import delimited using `file', clear
    
    *===============added line
    gen year=substr(`file',1,length(var1)-4)
    *===============
    
    save temp, replace
    restore
    append using temp, force
    }

    Comment


    • #3
      Hi Ben. Thanks for code. Could you please explain the added line a bit? What does var1 stand for? Would `file' get the file or the filename?

      Comment


      • #4
        Also, if you have a cleaner way to do this, please share the codes. I would love to learn it. Thanks.

        Comment


        • #5
          Oh, shoot. I left "var1" in there by accident left over from my testing. Yes, as you probably have figured out by now, it should be `file'. The substr and length functions chop off the ".csv" from the end of the filename, which should leave the year. Sorry for the confusion.

          And the rest of your code is fine. Some people like to avoid preserve/restore, since they can be slow with really large datasets, but for anything reasonable-sized, what you have is great.

          Comment


          • #6
            oh, and now I realized it's not 1990.csv, 1991.csv, but rather, file1990.csv, file 1991.csv. If which case, to strip off the first four characters, use:

            Code:
            gen year=substr(substr(`file',1,length(`file')-4),5,.)
            or shorter:
            Code:
            gen year=substr(`file',5,length(`file')-8)
            Last edited by ben earnhart; 04 Jan 2016, 19:41.

            Comment


            • #7
              Thanks Ben! The codes worked! I would only add that I used "`file'" to extract the year from filename.

              Comment

              Working...
              X