
  • Looping over datasets and creating year for each dta.

    Hi everyone! I'm trying to create a year variable for each .dta file that I have. Thankfully, the .dta files themselves indicate the year, but I'm failing to develop a foreach command which takes advantage of this.

    My data is structured in the following manner:
    "A 1986 B C D.dta"
    "A 1988 B C D.dta"
    "A 1993 B C D.dta"
    "A 2001 B C D.dta"
    "A 2006 B C D.dta"
    "A 2015 B C D.dta"

    Here's the command I tried:
    Code:
    local filelist: dir . files "A `x' B C D.dta"
    foreach file of local filelist {
        use "file", clear
        generate year = `x'
        save, replace
        }
    After creating a year variable for each .dta file, I intend to append these datasets.

    I hope that was a clear description of my problem. Thank you very much.
    Last edited by Kai Shen Lim; 17 Aug 2015, 12:03.

  • #2
    Not very clear, but here is what I understand you need:

    Code:
    forvalues y=1986/1987 {
        use "A `y' B C D.dta", clear
        generate int year = `y'
        // do something with the resulting file, perhaps save
    }



    • #3
      Hi Sergiy, firstly, thank you for taking the time to look at my question! I changed the wording of my post; perhaps it is clearer now?

      Unfortunately my dta files are not evenly distributed in terms of year and have several gaps of years between them.

      Thank you once again



      • #4
        Version with gaps:

        Code:
        foreach y in 1986 1988 1993 2001 2006 2015 {
            display "`y'"
            use "A `y' B C D.dta", clear
            generate int year = `y'
            // do something with the resulting file, perhaps save
        }
        What you wanted to do with a dir function is also possible, but I avoid that unless justified.
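
        For reference, the dir-based route might look like the sketch below. It assumes the files sit in Stata's current directory and that the year is always the second word of the file name, as in the first post. Note that save, replace overwrites the original files.

        ```stata
        * sketch only: assumes file names of the form "A <year> B C D.dta"
        * in the current directory
        local filelist : dir . files "A * B C D.dta"
        foreach file of local filelist {
            * the year is the second space-delimited word of the file name
            local y : word 2 of `file'
            use "`file'", clear
            generate int year = `y'
            save, replace
        }
        ```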



        • #5
          Thank you very much! This solves my problem. However, for the sake of discussion, what would you do if you had 60 years, with gaps? I'm wondering if there would be a more efficient way of coding this?

          Nevertheless Sergiy, thank you very much.



          • #6
            Fundamentally, I see two possibilities:
            1. you know how many years there are in your data and what they are;
            2. you don't (a very generic program of the find-everything/process-everything type).
            In case 1, I recommend proceeding as I've shown; in case 2, you proceed with dir.
            You are lucky there is some consistency in file names over 60 years; in many cases we don't have that luxury.
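
            A possible middle ground when the file names are consistent but the years have gaps, offered here only as a sketch and not as something proposed in this thread: loop over every year in the range and skip the years for which no file exists. The range 1986/2015 is taken from the first post; confirm file sets a nonzero return code when the file is missing.

            ```stata
            * sketch only: try every year in the range, skip the gaps
            forvalues y = 1986/2015 {
                capture confirm file "A `y' B C D.dta"
                if _rc continue              // no file for this year
                use "A `y' B C D.dta", clear
                generate int year = `y'
                save, replace                // overwrites the original file
            }
            ```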

            Best, Sergiy



            • #7
              Here's a general approach to these types of problems. It uses filelist (from SSC), and the code closely follows the example in the help file. The only trick here is how to extract the year from the file names. The example assumes the structure presented in the first post. There are many ways to do it; substr() and the regex functions are also good choices. The example assumes that all the datasets are in a directory called "mydata" within Stata's current directory (help cd).

              Code:
              * -filelist- is from SSC
              filelist , dir("mydata") pattern(*.dta)
              
              * extract the year from the filename; many many ways of doing this
              split filename
              rename filename2 year
              
              tempfile myfiles
              save "`myfiles'"
              
              * loop over the list of files, create the year variable and save
              * a temporary copy of each dataset
              local obs = _N
              forvalues i=1/`obs' {
                 use "`myfiles'" in `i', clear
                 local f = dirname + "/" + filename
                 local fyear = year
                 use "`f'", clear
                 gen year = "`fyear'"
                 tempfile save`i'
                 save "`save`i''"
               }
              
              * combine all the temp files
              clear
              forvalues i=1/`obs' {
                 append using "`save`i''"
              }
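
              The other ways of extracting the year mentioned above could be sketched as follows. The local fname is a hypothetical stand-in for one file name from the first post; each line assumes the year occupies characters 3 to 6 and is the second word of the name.

              ```stata
              * hypothetical file name following the pattern in the first post
              local fname "A 1986 B C D.dta"

              * by position: four characters starting at character 3
              local y1 = substr("`fname'", 3, 4)

              * by word: the second space-delimited word
              local y2 = word("`fname'", 2)

              * by regular expression: the first run of four digits
              if regexm("`fname'", "([0-9][0-9][0-9][0-9])") local y3 = regexs(1)

              display "`y1' `y2' `y3'"
              ```

              The same functions work on a string variable; for example, generate int year = real(word(filename, 2)) would produce a numeric year in the dataset of file names.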



              • #8
                Thank you Robert, that was a very lucid description of a fairly complex (at least for me) process.
