Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Use loop macro to edit, convert and append files

    I have about 600 csv files. I would like to drop some variables & observations in those files, then convert them into Stata format, and eventually combine (append) them into one big file (hopefully the size is not too big. Maybe around 5 or 6GB).

    Specifically, I run the following codes for the first file and would like to repeat such procedures for all the other files:


    import delimited "E:\Research\0001.csv", encoding(UTF-8)
    drop V1 V2 V3
    keep if country=="United States"
    save "E:\Research\0001_n.dta"

    After converting them into Stata files, I would like to combine(append) them into one single file. Does anyone know how to do it? Any help or guidance will be greatly appreciated!! Thank you in advance.


    BTW, when I tried to import this csv file into Stata, it shows
    "Note: Unmatched quote while processing row 605334; this can be due to a formatting problem in the file or because a quoted data element spans multiple lines. You should carefully inspect your data after importing. Consider using option bindquote(strict) if quoted data spans multiple lines or option bindquote(nobind) if quotes are not used for binding data".

    But when I looked at the data, it looks okay. Should I do anything about it?

  • #2
    I have performed something like this before and here are my codes:

    Code:
    global csvfolder "/myfolder_with_all_csv"
    
    * loop for all files & xform to dta
    local files: dir "$csvfolder" files "*.csv"
    foreach file of local files {
        local pref = substr("`file'", 1, 1)
        if "`pref'" != "." {
           dis "Working on `file'"
           import delimited "$csvfolder/`file'", encoding(UTF-8)
          
           // add your codes here
          
           local newfilename = subinstr("`file'", ".csv", "",.)
           save "$csvfolder/`newfilename'.dta", replace
        }
    }
    
    * append all dta files
    use "$csvfolder/0001_n.dta", clear
        local files: dir "$csvfolder" files "*.dta"
        foreach file of local files {
            if "`file'" != "0001_n.dta" {
                local pref = substr("`file'", 1, 1)
                if "`pref'" != "." {
                   dis "Appending `file'"
                   qui append using `file'
                }
            }
        }
    save "all.dta", replace
    For the warnings of the -import- command, it also occurs to me when your csv files have many string variables or long strings included. It should be okay if you have cautiously checked the imported files are actually correct.

    Comment


    • #3
      Have you tried to look at the csv file in a plain text editor? Could you copy and paste line 605534 (the one which is giving an error) here so we can examine it? Please paste it from the text editor into this space using CODE delimiters (the # button on the edit toolbar)

      Comment


      • #4
        Originally posted by Xinya Hao View Post
        I have performed something like this before and here are my codes:

        Code:
        global csvfolder "/myfolder_with_all_csv"
        
        * loop for all files & xform to dta
        local files: dir "$csvfolder" files "*.csv"
        foreach file of local files {
        local pref = substr("`file'", 1, 1)
        if "`pref'" != "." {
        dis "Working on `file'"
        import delimited "$csvfolder/`file'", encoding(UTF-8)
        
        // add your codes here
        
        local newfilename = subinstr("`file'", ".csv", "",.)
        save "$csvfolder/`newfilename'.dta", replace
        }
        }
        
        * append all dta files
        use "$csvfolder/0001_n.dta", clear
        local files: dir "$csvfolder" files "*.dta"
        foreach file of local files {
        if "`file'" != "0001_n.dta" {
        local pref = substr("`file'", 1, 1)
        if "`pref'" != "." {
        dis "Appending `file'"
        qui append using `file'
        }
        }
        }
        save "all.dta", replace
        For the warnings of the -import- command, it also occurs to me when your csv files have many string variables or long strings included. It should be okay if you have cautiously checked the imported files are actually correct.

        Thanks a lot for the suggestion! I tried to modify and run the codes (I realized that I have to check the validity of those files first and want to only process the files with correct content), however it returned with an error r(133) unknown function varname(). Please see my codes below. I am not sure where I did wrong.

        global csvfolder "E:\Research\sample"
        local files: dir "$csvfolder" files "*.csv"
        foreach file of local files {
        import delimited "$csvfolder/`file'", encoding(UTF-8) rowrange(1:1)
        local firstvar = varname(1)
        local secondvar = varname(2)
        if "`firstvar'" == "user_id" & "`secondvar'" == "position_id" {
        import delimited "$csvfolder/`file'", encoding(UTF-8)
        drop V1 V2 V3
        keep if country=="United States"
        local newfilename = subinstr("`file'", ".csv", "",.)
        save "$csvfolder/`newfilename'.dta", replace
        }
        else {
        di "Skipping file `file' - does not meet variable naming requirements."
        }
        }

        Comment


        • #5
          Originally posted by Xinya Hao View Post
          I have performed something like this before and here are my codes:

          Code:
          global csvfolder "/myfolder_with_all_csv"
          
          * loop for all files & xform to dta
          local files: dir "$csvfolder" files "*.csv"
          foreach file of local files {
          local pref = substr("`file'", 1, 1)
          if "`pref'" != "." {
          dis "Working on `file'"
          import delimited "$csvfolder/`file'", encoding(UTF-8)
          
          // add your codes here
          
          local newfilename = subinstr("`file'", ".csv", "",.)
          save "$csvfolder/`newfilename'.dta", replace
          }
          }
          
          * append all dta files
          use "$csvfolder/0001_n.dta", clear
          local files: dir "$csvfolder" files "*.dta"
          foreach file of local files {
          if "`file'" != "0001_n.dta" {
          local pref = substr("`file'", 1, 1)
          if "`pref'" != "." {
          dis "Appending `file'"
          qui append using `file'
          }
          }
          }
          save "all.dta", replace
          For the warnings of the -import- command, it also occurs to me when your csv files have many string variables or long strings included. It should be okay if you have cautiously checked the imported files are actually correct.

          Also, when I run the codes you provided (without modification), it returned an error r(4) data in memory would be lost. It can only successfully convert one csv file to Stata file.

          Comment


          • #6
            Originally posted by Helen Chang View Post


            Also, when I run the codes you provided (without modification), it returned an error r(4) data in memory would be lost. It can only successfully convert one csv file to Stata file.
            You will need to use -import , clear- option to allow the new import to replace the current data in memory.
            Code:
            import delimited ........ , clear

            Comment

            Working...
            X