Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Combine many .csv files

    Hello all,
    I want to combine many .csv files. Each file has the name NYSEyyyymmdd.csv, where yyyymmdd is the date. For example yyyy= 1996, mm is month, e.g 01 for january, and dd is day, e.g. 01 for the first day of the month. The data are stock prices each day when the exchange is open, so even though the file names increases, there will be gaps for weekends, holidays, etc.
    The following code does not work due to the gaps in file names:


    local j = 1
    forval i = 1/50 {
    clear

    insheet using "D:\NYSE\NYSE_1996010`i'.csv"
    gen id = "`i'"
    save file`j++', replace

    }

    local J = `j' - 2
    forval j = `J'(-1)1 {
    append using fole`j'
    }

    For example, this code then tries to import the file NYSE_19960106.csv, but because that was a Saturday, the file does not exist and Stata stops reading the code when a file is not found.

    How can I make Stata continue reading even if one file is not in the folder?



    Suggestions are appreciated!

    Thomas Leirvik


    Thomas Leirvik

  • #2
    Thomas: You have supplied a signature, which is fine, but note that if you forget to do that, then your identifier "Leir Th" is less than a full real name, so you may get asked to change it. In fact re-registering now as "Thomas Leirvik" would solve all such problems. See FAQ Advice #6.

    Your code is more problematic than you say as 41 of your calls substituting 10(1)50

    Code:
    insheet using "D:\NYSE\NYSE_199601010.csv
    to

    Code:
    insheet using "D:\NYSE\NYSE_199601050.csv
    wouldn't match any files, regardless of the working day problem.

    A better strategy is to loop over the files that exist. Also, make your identifier variable meaningful.

    This code uses fs which must be installed before use:

    Code:
    ssc inst fs 
    
    cd D:\NYSE 
    fs NYSE_*.csv 
    
    foreach f in `r(files)' { 
    
        insheet using "`f'", clear 
        local ID : subinstr local f "NYSE_" "", all 
        gen id = "`ID'"
        save file`i', replace
    
    }

    Comment


    • #3
      The local macro, i, in the last command -save file`i'- is not defined, so this will end up repeatedly overwriting a single file named file.dta. I think Nick meant something like -save file`ID', replace-.

      Comment


      • #4
        Quite so. Thanks for the catch.

        Comment


        • #5
          I have a similar issue.
          I renamed 23 irregularly named files as file1 file2 etc. using a windows trick described in http://www.stata.com/statalist/archi.../msg00896.html
          and a couple of !rename subinstr commands described in http://www.stata.com/statalist/archi.../msg00295.html.
          Now I want to append them to each other, starting with yet another file which I renamed file0 So I tried:

          forval n = 1/23 {
          use file`n', clear
          append using file`n-1', force
          save file`n', replace
          }

          which runs, but yields the same 23 files, with duplicates (ie, the original files appended to themselves)
          What am I doing wrong?

          Thanks!

          Comment


          • #6
            I highly recommend using filelist (from SSC) to create a dataset of files names you want to work with. You can then use plain ol' Stata data management techniques to prune down the list of files and adjust file names as needed using string functions. With filelist, there's no need to use operating system tricks that can't be replicated or to shell out to perform file name manipulations, you can do it all in Stata.

            Having said that, if the task is to append 24 files named "file0.dta" to "file23.dta", then the following should work:

            Code:
            * start with an empty dataset in memory
            clear
            
            * loop over each file to append, never use the -force- option in a loop
            forvalues n = 0/23 {
                append using "file`n'.dta"
            }
            
            * now you save save the combined data
            save "data_combo.dta"
            You should never use the force option with append, it will result in data loss. You must first fix the variable type incompatibility before trying to append the data. If the data was imported from a text file, you can use the stringcols(_all)option for import delimited. If the data was imported from Excel, use the allstring option for import excel. Once all the data are imported into string variables, you can append all datasets and only then destring variables using:

            Code:
            destring, replace
            See an example of the whole process here.

            Comment


            • #7
              Thanks so much! Solved.
              The force option in the loop was not the problem, the non-blank file and the saving apparently were. I suppose I just did not understand how the append command operates well enough to use it in a loop. I also learned of fs and filelist user commands, which I will use henceforth to rename etc.
              Thanks!

              Comment


              • #8
                Originally posted by Nick Cox View Post
                A better strategy is to loop over the files that exist. Also, make your identifier variable meaningful.

                This code uses fs which must be installed before use:

                Code:
                ssc inst fs
                
                cd D:\NYSE
                fs NYSE_*.csv
                
                foreach f in `r(files)' {
                
                insheet using "`f'", clear
                local ID : subinstr local f "NYSE_" "", all
                gen id = "`ID'"
                save file`i', replace
                
                }
                Dear Nick,

                Suppose that I have three (many) CSV files in the "E:\Stata\manycsv" directory as shown below. How can I modify your procedure above to import many CSV files in a directory into Stata? Note that the filenames are not in any particular order.
                Code:
                webuse grunfeld, clear
                
                preserve
                keep if company == 1
                export delimited using "E:\Stata\manycsv\a1.csv", replace
                restore
                
                preserve
                keep if company == 2
                export delimited using "E:\Stata\manycsv\b2.csv", replace
                restore
                
                preserve
                keep if company == 3
                export delimited using "E:\Stata\manycsv\c3.csv", replace
                restore
                Ho-Chuan (River) Huang
                Stata 17.0, MP(4)

                Comment

                Working...
                X