
  • Looping a command over specific files within a folder

    Hello all,

    I need to manipulate the date variable in a few specific .dta files within a folder, and they are ordered by year. All of the file names begin with "extract_" and are followed by the respective year.

    (e.g., extract_2010
    extract_2011
    extract_2012, etc.)

    What I would like to do is run a loop of commands for some of those files, and a slightly different one for the rest. The code I am currently using grabs all of the files in the folder, but I would like to run the code pasted below on only the first 7 years (2010-2016).

    I know that I can select files with a certain prefix by entering the beginning of the file name, plus a wildcard, when I specify which files the local datasets should contain (for instance "extract_*.dta"). Can I modify this in some way so that only the first 7 files (2010-2016) go into the local datasets?

    Please see below.


    cd "E:\file_location"

    * Currently picks up every .dta file in the folder
    local datasets : dir . files "*.dta"

    foreach file of local datasets {
        use "`file'", clear
        * Convert incdate to a string, then to a Stata daily date
        tostring incdate, gen(date)
        gen date2 = date(date, "YMD")
        format %tdnn/dd/CCYY date2
        drop date incdate
        rename date2 incident_date
        order incident_date, after(incnum)
        * Build an incident ID from ori and incnum, separated by a space
        egen unique_incident_id = concat(ori incnum), punct(" ")
        order unique_incident_id, after(incnum)
        save "`file'", replace
    }

    Thank you kindly for your time and attention to this question.
    Last edited by Danye Medhin; 15 Mar 2023, 08:15.

  • #2
    Code:
    local datasets
    forvalues y = 2010/2016 {
        local datasets `datasets' extract_`y'
    }
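Dropped into the loop from #1 in place of the dir line, that list restricts the run to 2010-2016 (use and save add the .dta extension themselves). A minimal sketch of the two-group version follows. The last year, 2022, is a hypothetical placeholder because the thread doesn't say which years exist, and the confirm file check simply skips any year that is missing from the folder.

```stata
* Group 1: 2010-2016
local early
forvalues y = 2010/2016 {
    local early `early' extract_`y'
}
* Group 2: 2017 through a hypothetical last year of 2022
local late
forvalues y = 2017/2022 {
    local late `late' extract_`y'
}

* First group: the commands from #1
foreach file of local early {
    capture confirm file "`file'.dta"
    if _rc {
        display as text "`file'.dta not found; skipped"
        continue
    }
    use "`file'", clear
    * ... the date and ID commands from #1 go here ...
    save "`file'", replace
}

* Second group: the slightly different commands
foreach file of local late {
    capture confirm file "`file'.dta"
    if _rc {
        display as text "`file'.dta not found; skipped"
        continue
    }
    use "`file'", clear
    * ... the modified commands go here ...
    save "`file'", replace
}
```

Run it after the cd line from #1 so the files are found in the working directory.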



    • #3
      This is fine, but why convert the date to a string and then create a new date variable? If it's already a Stata daily date (a number of days since 01jan1960), it's just a matter of reformatting it into human-readable form.

      Either way, I don't recommend saving over the file unless you have a copy of the original. I would append _new to the file name, just so I know I'm not overwriting the original file.
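One way to do that inside the loop, sketched under the assumption that the list may hold names with .dta (from dir, as in #1) or without it (as built in #2). The subinstr extended macro function strips the extension so that _new can be added in front of it; the two names in the local are hypothetical examples.

```stata
* Hypothetical list mixing names with and without .dta
local datasets extract_2010.dta extract_2011

foreach file of local datasets {
    * Strip a .dta extension if present, leaving the stem
    local stem : subinstr local file ".dta" ""
    capture confirm file "`stem'.dta"
    if _rc continue                      // skip files that are not in the folder
    use "`stem'", clear
    * ... edits from #1 go here ...
    save "`stem'_new", replace           // writes <stem>_new.dta; the original is untouched
}
```

One caveat: a later run of #1's loop with "*.dta" (or "extract_*.dta") would also pick up the _new files, which is one more reason to build the list explicitly by year as in #2.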



      • #4
        I should also mention that we could do more with this, depending on how wild you wanna go with your code. You wrote:
        What I would like to do is run a loop of commands for some of those files, and a slightly different one for the rest.
        That's fine. You could also add some programmer's if commands so you don't need multiple loops. I'm not at my computer, so I'll write it in pseudocode:

        Code:
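        foreach x of numlist 1/20 {
            * capture stores the return code of assert in _rc (0 if the assertion holds)
            capture assert `x' < 10
            if _rc != 0 {
                * placeholder for the "do x y x" branch (x >= 10)
                display "`x': second set of commands"
            }
            else {
                * placeholder for the "do x y" branch (x < 10)
                display "`x': first set of commands"
            }
        }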
        Again, improperly formatted, but that's the idea. It's what I do when I'm forced to work with URLs/datasets of only slightly different flavors.
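Applied to the extract_ files, the same single-loop idea might look like this. It's a sketch that assumes every file is named extract_YYYY.dta, and it tests the year parsed from the file name with the if command directly rather than through capture assert; the display lines stand in for the real commands.

```stata
* List every extract_*.dta file in the current folder
local datasets : dir . files "extract_*.dta"

foreach file of local datasets {
    * "extract_" is 8 characters, so the year sits at positions 9-12
    local year = real(substr("`file'", 9, 4))
    if missing(`year') continue          // name does not fit extract_YYYY.dta

    use "`file'", clear
    if `year' <= 2016 {
        display "`file': commands for 2010-2016"
        * ... the commands from #1 ...
    }
    else {
        display "`file': commands for later years"
        * ... the slightly different commands ...
    }
    save "`file'", replace
}
```

A name such as extract_2010_new.dta would also parse as 2010, so keep output files out of the folder or out of the pattern.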



        • #5
          Thank you Clyde and Jared for your timely and helpful responses!



          • #6
            Jared, to your question: when I simply reformat the variable, nothing happens. I'm sure there's something I'm missing, but the only way I can get the date reformatted is by adding these seemingly unnecessary steps.
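A likely explanation, assuming incdate stores dates as YYYYMMDD integers such as 20100315 (the "YMD" mask in #1 suggests so, but dataex output would confirm it): Stata daily dates count days since 01jan1960, so applying a %td format only changes how that count is displayed and does not reinterpret 20100315 as 15 March 2010. The value has to be converted. Below are two conversions on two made-up rows; the second avoids the string step entirely.

```stata
* Stand-in data: incdate as YYYYMMDD integers (an assumption; check with dataex)
clear
input long incdate
20100315
20161231
end

* Option 1: through a string, as in #1 but without the extra variable
gen incident_date = date(string(incdate, "%8.0f"), "YMD")

* Option 2: purely numeric, splitting the integer into year, month, and day
gen incident_date2 = mdy(mod(floor(incdate/100), 100), mod(incdate, 100), floor(incdate/10000))

format incident_date incident_date2 %tdnn/dd/CCYY
list
```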



            • #7
              Use dataex to show us your original data. Only then can I answer how to do what you seek.
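For anyone new to it, dataex prints a copy-and-paste block of the data for a post. A minimal sketch on two stand-in rows so it runs on its own (the values and the string type of incnum are hypothetical); with the real data you would start from use "extract_2010", clear instead. describe is worth running too, because the storage type matters here.

```stata
* Stand-in rows; with real data: use "extract_2010", clear
clear
input long incdate str4 incnum
20100315 "A001"
20161231 "A002"
end

* Storage type (long, float, str#) and display format of the date variable
describe incdate

* Paste the block this prints into the post
dataex incdate incnum
```

If describe reports float for incdate, values like 20100315 are above 16,777,216 (2^24), beyond which a float cannot store every integer exactly, so some dates may already be off by a day before any conversion.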



              • #8
                To give a full worked example of this, I create 3 csv datasets in the working directory. Note that if you have any files dear to you named "testfileloop..." you may wanna rename them. Unlikely, but either way, here's how to do what you seek.
                Code:
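                * Delete leftovers from earlier runs; capture sits inside the loop so one missing file does not stop the others
                forvalues i = 1/3 {
                    capture erase testfileloop`i'.csv
                }
                cls

                * Create three small csv files, each with x equal to the file number
                forvalues i = 1/3 {
                    clear
                    set obs 5
                    gen x = `i'
                    export delimited using "testfileloop`i'", replace
                }

                * List only the files to process: 2 and 3
                local datasets
                forvalues i = 2/3 {
                    local datasets `datasets' testfileloop`i'
                }

                * Import each csv, add a variable, and save it as .dta
                foreach file of local datasets {
                    import delimited "`file'", clear
                    gen y = x + 3
                    save "`file'", replace
                }

                * Stack the processed .dta files into one dataset
                clear
                append using `datasets'

                * Remove the three test csv files
                forvalues i = 1/3 {
                    erase testfileloop`i'.csv
                }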
                Here, I make up a few short datasets and generate another variable (y = x + 3) in only datasets 2 and 3 (we could do this with 100 datasets if we had to). Then I append those into one dataset and get rid of the original csv files. Whether you should erase them is another matter, but that's how you'd do this in context. For a real-world example of when we'd want this, say we want quarterly COVID-19 deaths in 2020 and 2021 for Gwinnett County, GA. The NYT county files report cumulative deaths, so the maximum within each quarter is the running total at the end of that quarter. We could do this:
                Code:
                clear *

                * One frame per year, each cut down to Gwinnett County (FIPS 13135)
                forvalues x = 2020/2023 {
                    mkf gwinett`x'
                    cwf gwinett`x'

                    import delimited "https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-counties-`x'.csv", clear

                    gen date2 = date(date, "YMD")
                    keep if fips == 13135

                    gen qdate = qofd(date2)
                    format qdate %tq

                    keep qdate deaths county
                    * deaths is cumulative, so the max within a quarter is the total at quarter's end
                    collapse (max) deaths, by(qdate county)

                    * Save this year to a temporary file; we are already in the right frame
                    tempfile gwinett`x'
                    save "`gwinett`x''"
                }

                * Collect the 2020 and 2021 tempfiles; compound quotes protect paths containing spaces
                local datasets
                forvalues y = 2020/2021 {
                    local datasets `"`datasets' "`gwinett`y''""'
                }

                mkf appended
                cwf appended
                append using `datasets'

                * Drop the per-year frames and the empty default frame
                quietly frames dir
                local framelist `r(frames)'
                foreach x of local framelist {
                    if strpos("`x'", "gwinett") == 1 {
                        frame drop `x'
                    }
                }
                frame drop default
                browse
                We import data for all four years (2020 through 2023), but we only need the first two in this example. So we append only the 2020 and 2021 files, then drop the per-year frames we no longer need, and boom, we have a quarterly dataset we can work with if we need to.
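Because the NYT county files report cumulative deaths, the collapsed deaths variable is a running total at the end of each quarter. If you want the deaths that occurred within each quarter, difference it within county. A minimal sketch on made-up numbers (the county name and counts are placeholders, not real data); in the appended frame above you would run just the sort and by lines. It assumes the series starts at the first quarter of the epidemic, so the first difference is simply the first total.

```stata
* Made-up cumulative counts standing in for the appended frame
clear
input str7 county deaths
"Example" 10
"Example" 25
"Example" 40
end
gen qdate = tq(2020q1) + _n - 1
format qdate %tq

* Deaths within a quarter = this quarter's cumulative total minus last quarter's
sort county qdate
by county: gen quarterly_deaths = deaths - cond(_n == 1, 0, deaths[_n-1])
list county qdate deaths quarterly_deaths
```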

