  • Looping over folders and files weird behaviour

    Hello everyone,

    I have a problem with my code and I don't know how to solve it. I have several folders, and each folder contains one or more files in one of two formats: ".txt" in folders 2004 and 2009-2017, and ".trs" in folders 2005-2008. For example:
    Main_path\
        MCVL2004
            MCVL2004_affiliation1.txt
            MCVL2004_affiliation2.txt
        MCVL2005
            MCVL2005_affiliation1.trs
            MCVL2005_affiliation2.trs
        .
        .
        .
        MCVL2009
            MCVL2009_affiliation1.txt
            MCVL2009_affiliation2.txt

    However, I am utterly puzzled by the fact that when I run the code as it appears below, it only reads and saves files with the ".trs" extension, but when I comment out the second chunk and uncomment the third, it only reads and saves ".txt" files.
    Of course I would like to read and save all of them at once and not have to change the code manually. I have also tried adding an "else if" for folders 2005 to 2008, but that didn't work either. So I would appreciate it if somebody could point out what I am doing wrong and propose an alternative that runs straight through.

    Kind Regards


    Code:
    cd "${rawdata_path}"
    
    ***** ***** ***** ***** ***** DATA EXTRACTION ***** ***** ***** ***** ***** *****
    // BE CAREFUL!!! there are two else, comment one
    
    local folder: dir . dirs "*"
    foreach i of local folder {
    di "`i'"
        if         "`i'" == "MCVL2004" {
            local filename     : dir "${rawdata_path}/`i'" files "*.txt"
            foreach f in `filename'{
                di "`f'"
                cd "${rawdata_path}/`i'"
                import delimited "`f'", delimiter(space) bindquote(nobind) case(preserve) asfloat clear
                local name "`f'"
                local name = subinstr("`name'", ".txt", "", 1)
                save "${datasets_path}/`i'/`name'.dta", replace
            }
        }
        else {
            local filename : dir "${rawdata_path}/`i'" files "*.trs"
            foreach f in `filename'{
                di "`f'"
                cd "${rawdata_path}/`i'"
                import delimited "`f'", delimiter(";", asstring) bindquote(nobind) case(preserve) asfloat clear
                local name "`f'"
                local name = subinstr("`name'", ".trs", "", 1)
                save "${datasets_path}/`i'/`name'.dta", replace
            }
        }
        /*
        else {
            local filename     : dir "${rawdata_path}/`i'" files "*.txt"
            foreach f in `filename'{
                di "`f'"
                cd "${rawdata_path}/`i'"
                import delimited "`f'", delimiter(";", asstring) bindquote(nobind) case(preserve) asfloat clear
                local name "`f'"
                local name = subinstr("`name'", ".txt", "", 1)
                save "${datasets_path}/`i'/`name'.dta", replace
            }
        }
        */
    }
    Last edited by Ruben Perez; 17 Apr 2019, 09:14.

  • #2
    It isn't obvious to me why your code does not work. But let me just show you another way to import and save all of these files. The code is shorter and simpler, and the execution time may be noticeably faster if there are many of them.

    Code:
    cd "${rawdata_path}"
    
    clear
    
    tempfile txt
    filelist, pattern("*.txt") save(`txt')
    filelist, pattern("*.trs")
    append using `txt'
    
    capture program drop one_file
    program define one_file
        local dirname = dirname[1]
        local filename = filename[1]
        local filestub: subinstr local filename ".txt" ""
        local filestub: subinstr local filestub ".trs" ""
        * NOTE: the original post uses delimiter(";", asstring) for folders other than MCVL2004
        import delimited `"`dirname'/`filename'"', delimiter(space) ///
            bindquote(nobind) case(preserve) asfloat clear
        local dirname: subinstr local dirname "./" ""
        save `"${datasets_path}/`dirname'/`filestub'.dta"', replace
        exit
    end
    
    runby one_file, by(dirname filename) status
    -filelist- is by Robert Picard and is available from SSC. -runby- is by Robert Picard and me, also available from SSC.

    This code is not tested, and given the typographic complexity of the various paths and filenames, there may be errors in it. But this is the gist of it.

    I want to point out one potential glitch in both this code and your original code. If you have a file named ABC.txt and another named ABC.trs in the same directory, the code will send both to the same destination, ABC.dta, which means that whichever gets processed last will survive and the other will be lost. So you would be well advised to verify before proceeding that there are no such filename clashes.
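
    One way to run that check is sketched below. It is untested and rests on some assumptions: -filelist- is installed (ssc install filelist), the working directory is the raw-data root, and every file name ends in a three-letter extension. The variable names stub and n_same are made up for illustration.

    ```stata
    * Build one list of all .txt and .trs files, as in the code above
    clear
    tempfile txt
    filelist, pattern("*.txt") save(`txt')
    filelist, pattern("*.trs")
    append using `txt'

    * Drop the last four characters (".txt" or ".trs") to get the name
    * that the saved .dta file would receive
    gen stub = substr(filename, 1, strlen(filename) - 4)

    * Count the files that share a directory and a stub;
    * any count above one is a clash
    bysort dirname stub: gen n_same = _N
    list dirname filename if n_same > 1, sepby(dirname stub)

    * Stop with an error if any clash exists
    assert n_same == 1
    ```

    If the assert fails, the listing just before it shows which files would overwrite each other, so they can be renamed before running the import.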
    Last edited by Clyde Schechter; 17 Apr 2019, 21:16.
