Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Adding file names and folder names as variables then appending multiple csv files

    Hello Everyone,

    I hope you are well!

    I have multiple .csv files (from multiple folders) who are missing a couple of different variables. Information on these variables are contained in file names and folder names. After I generate the variables that I need I want to append all these .csv files and make sure the additional variables are merged to them.

    I followed the solutions that Robert Picard mentioned on the following post

    Below is the code that I used:

    Part 1 of my code runs perfectly and creates exactly what I expected (note you can find filelist on ssc). I was able to generate the variables that I needed

    but when I get to part 2 Stata shows me invalid syntax r(198)

    I've tried to identify my mistake but haven't succeeded. Do you folks have any suggestions? I don't entirely understand each section in part 2 codes. I tried to follow/ adapt Robert's code from the post mentioned above. Note: the first row that I want to keep in .csv files is row 11 and it contains variable names.


    Code:
    **PART 1
    pwd // Shows the current directory
    
    cd "/Users/Patrizio/Desktop/AccelerometerData/" //specifying new directory
    
    filelist , dir(.) pattern(*5_15sec.csv) //getting files that i need for 15sec epoch
    
    gen student_id=substr(filename,1,8) // generating student id from file name taking the first 8 characters
    
    gen grade_pa=regexs(0) if regexm(dirname,"[0-9]") //generating grade variable by taking the number included in the directory
    
    destring grade_pa, replace //destringing grade variable
    
    *saving a a temporary file
    tempfile hpeacc
    save "`hpeacc'"
    
    *trying all this on all file
    local obs=_N
    *directory for stata file
    cap mkdir hpeacc_stata
    
    **PART 2
    forvalues i=1/`obs' {
    
        use "`hpeacc'" in `i', clear
        local source = student_id
        local f = dirname + "/" + filename
        
        clear
        import delimited using "`f'", varnames(11) stringcols(_all)
        gen source = "`source'"
        
        tempfile save`i'
        qui save "`save`i''"
        
    }
    Code:
     forvalues i=1/`obs' {
      2. 
    .     use "`hpeacc'" in `i', clear
      3.     local source = student_id
      4.     local f = dirname + "/" + filename
      5.     
    .     clear
      6.     import delimited using "`f'", varnames(11) stringcols(_all)
      7.     gen source = "`source'"
      8.     
    .     tempfile save`i'
      9.     qui save "`save`i''"
     10.     
    . }
    invalid syntax
    r(198);
    
    end of do-file
    
    r(198);


  • #2
    You need to run both parts of the code together (since `obs' is a local). Please retry and let us know if Stata still throws up an error.

    Comment


    • #3
      There's nothing wrong with the code you posted and it works fine with a set of toy cvs files I put together. My guess is that you are running this from the editor in parts. Since you are defining local macros, the whole code has to run at the same time.

      For those who come across this post and are a bit puzzled as to where this is going, it's missing a third part where all the files are combined:
      Code:
      clear
      forvalues i=1/`obs' {
          append using "`save`i''"
      }

      Comment


      • #4
        Okay this makes way more sense now thank you. Both of you were right (running the below code at once) works perfectly.

        One more question though
        Code:
        pwd // Shows the current directory
        cd "/Users/Patrizio/Desktop/AccelerometerData/" //specifying new directory
        filelist , dir(.) pattern(*5_15sec.csv) //getting files that i need for 15sec epoch
        gen student_id=substr(filename,1,8) // generating student id from file name taking the first 8 characters
        gen grade_pa=regexs(0) if regexm(dirname,"[0-9]") //generating grade variable by taking the number included in the directory
        destring grade_pa, replace //destringing grade variable
        *saving a a temporary file
        tempfile hpeacc
        save "`hpeacc'"
        *trying all this on all file
        local obs=_N
        *directory for stata file
        cap mkdir hpeacc_stata
        forvalues i=1/`obs' {
        
            use "`hpeacc'" in `i', clear
            local source = student_id
            local source1 = grade_pa
            local f = dirname + "/" + filename
            
            clear
            import delimited using "`f'", varnames(11) stringcols(_all)
            gen student_id = "`source'"
            gen grade_pa = "`source1'"
            
            tempfile save`i'
            qui save "`save`i''"
            
        }
        
        clear
        forvalues i=1/`obs' {
            append using "`save`i''"
        }
        destring, replace

        Comment


        • #5
          okay one more question.
          Can filelist identify .csv with different specified names within the same line of code? In my code, I identified all the file names that ended with 5_15sec, I want to also identify 6_15sec, 10_15sec and three more. If I only use .csv the code will identify 4 times as many .csv files that I don't need.

          I am referring to the following line
          Code:
           
           filelist , dir(.) pattern(*5_15sec.csv)

          Comment


          • #6
            If you can't come up with a single pattern to pick-up all the files you are targeting without undesirable matches, then just find all csv files and prune the list down using standard Stata data management techniques. Something like:

            Code:
            filelist , dir(.) pattern(*.csv)
            drop if strpos(filename, "not_this_one")
            drop if strpos(dirname, "testing")
            gen tokeep = 0
            replace tokeep = 1 if strpos(filename, "5_15sec.csv")
            replace tokeep = 1 if strpos(filename, "6_15sec.csv")
            replace tokeep = 1 if strpos(filename, "10_15sec.csv")
            keep if tokeep

            Comment


            • #7
              Thank you Robert!

              Comment

              Working...
              X