  • Creating filename as a column in multiple .dta files before appending

    Hi,

    I am trying to include the name of each file as a variable before appending. Please see the code below, which works well for importing and appending.

    Ideally, the variable should be created after importing each delimited file but before saving it as a .dta file, so that it is present in every file to be appended.

    Thanks


    clear
    cd "C:\Users\aaaa"

    local files: dir "C:\Users\aaaa" files "*.txt"
    foreach file in `files' {
    clear
    import delimited `file'
    save `file'.dta, replace
    }



    local files: dir "C:\Users\aaaa" files "*.dta"

    foreach file in `files' {
    append using `file'
    }

  • #2
    Code:
    clear
    cd "C:\Users\aaaa"
    
    local files: dir "C:\Users\aaaa" files "*.txt"
    foreach file in `files' {
    clear
    import delimited `file'
    save `file'.dta, replace
    }
    
    
    
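    local files: dir "C:\Users\aaaa" files "*.dta"
    * The next two lines are essential: start from an empty dataset and
    * create the empty string variable that the loop below fills in.
    clear
    gen filename = ""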
    
    foreach file in `files' {
        append using `file'
        replace filename = "`file'" if missing(filename)
    }
    Note: Before you attempt this, you should check carefully to make sure that the files you are trying to append are actually compatible with each other. It is seldom the case that a large collection of files will be; usually one has to do some pre-cleaning to harmonize the data storage types, data labels, etc. To simplify the task, I recommend Mark Chatfield's -precombine- command, available from SSC. It will alert you to incompatibilities that will cause your -append- command to break with error messages or, worse, proceed to combine data from files with conflicting value labels, resulting, without warning, in a garbage data set. Then you can fix them to avoid trouble.
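
    The original question asked for the variable to be created after importing but before saving. A minimal sketch of that ordering follows, assuming the same folder as above and that none of the imported files already contains a variable named filename (if one does, -gen- will stop with an error). The "*.txt.dta" pattern is an assumption based on how the first loop names its output; it is used so that unrelated .dta files in the folder are not appended.

    ```stata
    clear
    cd "C:\Users\aaaa"

    * Step 1: import each text file, tag it with its own name, save as .dta
    local files: dir "." files "*.txt"
    foreach file of local files {
        import delimited using "`file'", clear
        gen filename = "`file'"
        save "`file'.dta", replace
    }

    * Step 2: start from an empty dataset and append the tagged files
    clear
    local files: dir "." files "*.txt.dta"
    foreach file of local files {
        append using "`file'"
    }
    ```

    Because every saved file already carries its own filename value, the append loop needs no -replace- step. The compatibility caveats in the note above still apply.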



    • #3
      Thank you very much for your prompt response. I really appreciate it.

      When I execute the command I get the error "variable filename not found".

      Please take a look below.

      Thanks


      . clear

      . cd "C:\Users\aaaa"
      C:\Users\aaaa
      .
      . local files: dir "C:\Users\aaaa" files "*.txt"

      . foreach file in `files' {
      2. clear
      3. import delimited `file'
      4. save `file'.dta, replace
      5. }
      (242 vars, 4,757 obs)
      (note: file hello.txt.dta not found)
      file hello.txt.dta saved
      (54 vars, 4,757 obs)
      (note: file hhh.txt.dta not found)
      file hhh.txt.dta saved

      .
      .
      .
      . local files: dir "C:\Users\aaaa" files "*.dta"

      .
      . foreach file in `files' {
      2. append using `file'
      3. replace filename = "`file'" if missing(filename)
      4. }
      variable filename not found
      r(111);

      end of do-file

      r(111);



      • #4
        Yes. Go back and use the code I suggested in #2. You skipped two very important lines, even though they were in bold face.



        • #5
          Thank you very much. I don't know how I missed those lines. It works well now.

          I will be paying attention to the structure as you suggested.

          Very grateful sir. Enjoy your day.



          • #6
            I'm a bit confused about why you want things in this order. Note that the -append- command has a "gen(newvar)" option that basically does this for you; it appears to be much simpler and gets you to the same place. See
            Code:
            h append



            • #7
              The problem with -append-'s -gen- option is that the variable it creates is just a numerical sequence starting from 0. To make it useful, you then have to create a value label showing the actual filenames that correspond. To me, at least, that's a bigger nuisance than the approach in #2.



              • #8
                Well, actually, I prefer the "gen()" option because I run things like this (complicated data issues) from do-files, and (1) I can put a "label define" command and a "label val" command in the do-file, and (2) no matter how hard I try, the comments I add to my do-files are generally insufficient when I come back to them months, or even years, later. So I want things to be very clear, and using "label" with the "gen()" option is, to me, much clearer than, e.g., the code shown in #2 above.
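
                For comparison, the "gen()" approach discussed in #6 to #8 might look like the sketch below. It assumes the two files produced in the log in #3 (hello.txt.dta and hhh.txt.dta) and that the working directory is the folder containing them. The names source and source_lbl are hypothetical. -append- codes observations from the data already in memory as 0 and those from the first using file as 1, so the value label maps those codes back to filenames.

                ```stata
                * Load the first file as the master dataset
                use "hello.txt.dta", clear

                * Append the second file; source = 0 for master, 1 for the using file
                append using "hhh.txt.dta", generate(source)

                * Attach the filenames to the numeric codes
                label define source_lbl 0 "hello.txt.dta" 1 "hhh.txt.dta"
                label values source source_lbl

                * Check how many observations came from each file
                tabulate source
                ```

                With many files, the label has to be extended by hand or built in a loop, which is the nuisance mentioned in #7. With a handful of files, the explicit "label define" line documents the mapping, as argued in #8.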
