Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Using strings in a Stata dataset as filenames

    I have a dataset with 1 column. This one column contains file names with their proper directory prefix like so (I am only including the first few elements) -

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str215 V1
    "C:/Users/nilim/Csv Files/file_04_06_on_2020_12_09/file_04_06_on_2020_05_31.csv"
    "C:/Users/nilim/Csv Files/file_04_07_on_2020_12_09/file_04_07_on_2020_05_31.csv"
    "C:/Users/nilim/Csv Files/file_04_08_on_2020_12_09/file_04_08_on_2020_05_31.csv"
    "C:/Users/nilim/Csv Files/file_04_09_on_2020_12_09/file_04_09_on_2020_05_31.csv"
    "C:/Users/nilim/Csv Files/file_04_10_on_2020_12_09/file_04_10_on_2020_05_31.csv"
    "C:/Users/nilim/Csv Files/file_04_11_on_2020_12_09/file_04_11_on_2020_05_31.csv"
    "C:/Users/nilim/Csv Files/file_04_12_on_2020_12_09/file_04_12_on_2020_05_31.csv"
    "C:/Users/nilim/Csv Files/file_04_13_on_2020_12_09/file_04_13_on_2020_05_31.csv"
    "C:/Users/nilim/Csv Files/file_04_14_on_2020_12_09/file_04_14_on_2020_05_31.csv"
    "C:/Users/nilim/Csv Files/file_04_15_on_2020_12_09/file_04_15_on_2020_05_31.csv"
    "C:/Users/nilim/Csv Files/file_04_16_on_2020_12_09/file_04_16_on_2020_05_31.csv"
    "C:/Users/nilim/Csv Files/file_04_17_on_2020_12_09/file_04_17_on_2020_05_31.csv"
    "C:/Users/nilim/Csv Files/file_04_18_on_2020_12_09/file_04_18_on_2020_05_31.csv"
    "C:/Users/nilim/Csv Files/file_04_19_on_2020_12_09/file_04_19_on_2020_05_31.csv"
    end
    I want to open each file whose name is one datapoint in this dataset and save it as .dta. This is the code I have tried to write till now.

    Code:
    use filenames.dta, clear
    
    forval i = 1/`=_N' {
        use filenames.dta, clear
        local bla = cond(_n==`i', V1, .)
        import delimited `bla', clear (??????)
        save ???????
    }
    The code above has 2 places where there are ????.
    1. The first ???? - local bla = cond(_n==`i', V1, .) & import delimited `bla', clear commands do not seem to be working/
    2. The second ???? - I want to save the files with the same name as they are saved in the csv. For example - the file "C:/Users/nilim/Csv Files/file_04_06_on_2020_12_09/file_04_06_on_2020_05_31.csv" should be saved as "C:/Users/nilim/Csv Files/file_04_06_on_2020_12_09/file_04_06_on_2020_05_31.dta"

    Even if someone can help with only a portion of this problem I'll be happy to take it. I have been stuck at this problem for quite some time and I can't seem to find a workaround.

    P.S. I tried doing this in R - which handles datapoints 'individually' within a set much simply but some columns in these csvs have too many characters and R can't open them. (Stata 17 / SE (the version I am on) can - I checked for a few csvs without using the loop)

    Do help
    Last edited by Nilima Pisharody; 02 Jun 2021, 03:01. Reason: File Save

  • #2
    How about this:

    Code:
    forv i = 6/19 {
        local j : display %02.0f `i'
        import delimited "C:/Users/nilim/Csv Files/file_04_`j'_on_2020_12_09/file_04_`j'_on_2020_05_31.csv", clear 
        save "C:/Users/nilim/Csv Files/file_04_`j'_on_2020_12_09/file_04_`j'_on_2020_05_31.dta", replace
    }

    Comment


    • #3
      You could try something along the lines of:
      Code:
      use filenames.dta, clear
      
      forval i = 1/`=_N' {
          use filenames.dta, clear
          local bla = V1[`i'] // Access the i-th observation in variable V1
          import delimited `bla', clear
          save `=subinstr("`bla'","csv","dta",1)' // Replace "csv" with "dta" in the string and use the resulting string as new filename.
      }
      This code is not fully tested, but it should give an idea where to go from there

      Comment


      • #4
        Scott Merryman While I said that I have only showed the first few elements in my dataset, I probably should have clarified that in my full list, the names have a pattern - file_X_on_Y.csv. Both X and Y change. X and Y are both dates. And I have daily data for multiple years.

        In fact, your solution was one of the first things I tried myself - using forv 1/31; forv 1/30; forv 1/28 and forv 1/29 (dates) but then I had to also include forv 1/12 (months) and forv 2000/2021 (years) and managing all the if conditions for months (with their corresponding dates) and all the loops was getting a bit crazy for me. The major problem with running these loops is not even their cumbersome nature but the fact that they are non scalable to other filenames.

        I hoped (still am hoping) that Stata can open strings passed to it via a local variable, if it is a valid path - which is why I wrote code as given in the Q.

        Comment


        • #5
          Code:
          putmata files = V1  
          
          local nfiles = _N 
          
          forvalues i = 1/`nfiles' { 
          
              mata : st_local("fn", files[`i''] )
          
              di "`fn'" // any code using "`fn'" 
          }

          Comment


          • #6
            The basic question here is "How do I execute data?" Because Stata is interpreted, this is fairly straightforward. You could use -file write- and -include- but a better way is to define a macro in terms of data, for example -pwd- is a Stata command that can be executed from a string variable:
            Code:
            set obs 1
            gen c="pwd"
            local m=c[1]
            `m'
            will Print Working Directory.. I won't present a complete solution.

            Comment


            • #7
              If you are on Stata 17 it seems to me that using frames solves your problem. Here is (untested) code that may point in a useful direction.
              Code:
              frame create work
              use filenames.dta, clear
              
              forvalues i = 1/`=_N' {
                  local csv = V1[`i']
                  local dta : subinstr local csv ".csv" ".dta"
                  frame work {
                      display `"import delimited `"`csv'"', clear"'
                      display `"save `"`dta'"', replace"'
                  }
              }
              When run, this executes commands like the following
              Code:
              import delimited `"C:/Users/nilim/Csv Files/file_04_06_on_2020_12_09/file_04_06_on_2020_05_31.csv"', clear
              save `"C:/Users/nilim/Csv Files/file_04_06_on_2020_12_09/file_04_06_on_2020_05_31.dta"', replace
              import delimited `"C:/Users/nilim/Csv Files/file_04_07_on_2020_12_09/file_04_07_on_2020_05_31.csv"', clear
              save `"C:/Users/nilim/Csv Files/file_04_07_on_2020_12_09/file_04_07_on_2020_05_31.dta"', replace
              Last edited by William Lisowski; 02 Jun 2021, 10:55.

              Comment


              • #8
                William Lisowski I know it is a bit late, but thank you so much! editing your code to suit the problem worked for me.

                Comment

                Working...
                X