  • Using Stata to find a select list of PDF files in a folder, copy and place copies in a new folder

    Hi all,

    I'm using Stata 13.1. I have a folder with a large number of PDFs in it. I have a Stata file that contains the names of around 22 PDFs that I want to copy to a different directory, which will contain only those 22 files. I've tried to do this in Stata without any joy, possibly because I'm using 'save' and the PDFs are not in a Stata format. Any help appreciated.

    This is my sample code:

    **Begin code**
    local os = "`c(os)' "
    display "<`os'>"

    if trim("`os'")=="Windows" {
    global basedir = "N:\Data"
    }
    else if trim("`os'")=="MacOSX" {
    global basedir = "/Volumes/data/"
    }


    global PDFs = "${basedir}PDFs"
    global WM_PDFs = "${basedir}WM"

    cd "$PDFs"
    cap nois: mkdir "WM"

    import excel "${basedir}Archive\WM_lookup.xlsx", sheet("Sheet1") firstrow clear
    keep if WM ==1

    g Name = CCG_Code+".pdf"

    levelsof Name, local(levels)
    foreach l of local levels {
    cd "$PDFs"
    copy `l'
    cd "$WM_PDFs"
    save `l', replace
    }

    **End Code**
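One way the setup part of this code might be tidied is sketched here, assuming the folder layout it implies (N:\Data on Windows, /Volumes/data on a Mac) and the target folder N:\PDFs\WM-style layout described later in the thread, i.e. WM inside the PDF folder. As posted, "N:\Data" has no trailing separator, so "${basedir}PDFs" would expand to N:\DataPDFs; the sketch adds one and uses forward slashes, which Stata also accepts on Windows. The final else branch is an addition.

```stata
* Sketch only: base folders taken from the post above
local os = trim("`c(os)'")
display "<`os'>"

if "`os'" == "Windows" {
    * forward slashes work on Windows too; the trailing "/" matters
    global basedir "N:/Data/"
}
else if "`os'" == "MacOSX" {
    global basedir "/Volumes/data/"
}
else {
    display as error "unexpected operating system: `os'"
    exit 198
}

* assumption: the target folder sits inside the PDF folder
global PDFs    "${basedir}PDFs"
global WM_PDFs "${PDFs}/WM"

* capture lets the do-file carry on if the folder already exists
capture mkdir "$WM_PDFs"
display "copying from $PDFs to $WM_PDFs"
```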




  • #2
    Note that the copy call will fail before you even get to save. copy requires two filenames.

    I have formatted your code using CODE delimiters. Many people find this easier to read. Please follow suit in your threads. This is explained in the FAQ Advice. (Indenting within loops would help too.)

    I got a bit lost in the middle of your code so can't follow all that you are trying to do.

    Note that you don't need to change a directory to create a file in it. You just specify the directory.

    Code:
    **Begin code**
    local os = "`c(os)' "
    display "<`os'>"
    
    if trim("`os'")=="Windows" {
    global basedir = "N:\Data"
    }
    else if trim("`os'")=="MacOSX" {
    global basedir = "/Volumes/data/"
    }
    
    
    global PDFs = "${basedir}PDFs"
    global WM_PDFs = "${basedir}WM"
    
    cd "$PDFs"
    cap nois: mkdir "WM"
    
    import excel "${basedir}Archive\WM_lookup.xlsx", sheet("Sheet1") firstrow clear
    keep if WM ==1
    
    g Name = CCG_Code+".pdf"
    
    levelsof Name, local(levels)
    foreach l of local levels {
    cd "$PDFs"
    copy `l'
    cd "$WM_PDFs"
    save `l', replace
    }
    
    **End Code**
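A minimal sketch of both points for a single file: copy takes a source and a destination, neither of which needs a cd first, and it works on any file type. The file name 04X.pdf is one of those listed later in the thread and the folders are the ones described there; paths are written with forward slashes, which Stata accepts on Windows.

```stata
* one file, named in full: no cd needed, and copy takes two filenames
local from "N:/PDFs/04X.pdf"
local to   "N:/PDFs/WM/04X.pdf"

* copy duplicates the file whatever its type, so a PDF is fine;
* save would only write the data in memory out as a .dta file
copy "`from'" "`to'", replace

* stops with an error if the copy is not there
confirm file "`to'"
```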



    • #3
      Thanks Nick for your reply.

      Your amendments to my code with CODE delimiters are so subtle that they escape me...?

      I'll try again to explain.

      I have a folder of PDF files: "N:\PDFs"

      This folder holds hundreds of PDFs, but I only want to copy about 20 of them into a new folder, "N:\PDFs\WM".

      I generate a Stata dataset with about 20 rows, each holding the name of a PDF I want to copy, including the .pdf extension. I want to go through each row, take the filename, find the matching file in the "N:\PDFs" folder and put a copy of it in "N:\PDFs\WM". If these were .csv or .dta files, I could load each one into Stata and then save it into the appropriate directory. As they are PDFs I can't do that, so I need a way of copying each file without loading it into Stata.
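The row-by-row procedure just described might be sketched directly, looping over observations instead of using levelsof. This assumes the variable Name holds bare file names such as 04X.pdf and uses the folders named above; the confirm file check, which reports and skips any listed file that is missing, is an addition.

```stata
* Sketch: assumes Name holds file names such as 04X.pdf
local src  "N:/PDFs"
local dest "N:/PDFs/WM"
capture mkdir "`dest'"

forvalues i = 1/`=_N' {
    local f = Name[`i']
    * report, but skip, any listed file that is not in the source folder
    capture confirm file "`src'/`f'"
    if _rc {
        display as error "not found: `src'/`f'"
        continue
    }
    copy "`src'/`f'" "`dest'/`f'", replace
}
```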

      I agree that in the loop I do not need to change directory each time. I've attempted to clean up my code as originally posted. Hope this explains what I am trying to achieve. I've just tried to indent my loops but they get 'corrected' back to no indenting.


      #delim ;

      import excel "N:\Archive\WM_lookup.xlsx", sheet("Sheet1") firstrow clear
      keep if WM ==1

      g Name = CCG_Code+".pdf"

      levelsof Name, local(levels)

      foreach l of local levels {
      cd N:\PDFs
      copy `l'
      save N:\PDFs\WM\`l', replace
      } ;



      • #4
        I was not amending your code at all. I was showing by example as well as exhortation what all posters are asked to do -- and you are still not doing.

        You want to encourage people to answer your posts, and small encouragements, such as making your code readable, help too.

        Thanks for the simplification.

        As I noted already, copy needs two filenames, a source and a destination. Fixing that also gets round the problem you already know about: save writes only the data in memory, as a .dta file, so it cannot copy a PDF.

        I can't test this but I think it's closer to what you seek. I've edited out the inconsistent use of delimiters. Your code declares that lines are delimited by semi-colons, but most lines aren't.
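For reference, a sketch of consistent use in a do-file: once #delimit ; is in force, every command must end with a semicolon (a command may then span several lines), and #delimit cr switches back to ordinary line endings. The import line is the one from #3, with forward slashes.

```stata
* in a do-file: with #delimit ; every command ends at a semicolon
#delimit ;
import excel "N:/Archive/WM_lookup.xlsx", sheet("Sheet1")
    firstrow clear ;
keep if WM == 1 ;
#delimit cr

* back to the default: each line is one command
generate Name = CCG_Code + ".pdf"
```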


        Code:
        import excel "N:\Archive\WM_lookup.xlsx", sheet("Sheet1") firstrow clear
        keep if WM ==1
        
        cd N:\PDFs 
        
        g Name = CCG_Code+".pdf"
        
        levelsof Name, local(levels)
        
        foreach l of local levels {
            copy `l'  N:\PDFs\WM\`l' 
        }
        The code assumes that there are no spaces in filenames.
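If some file names did contain spaces, a sketch like this should cope: levelsof already wraps each string value in compound double quotes, so foreach hands back each whole name, and quoting both paths keeps copy from splitting it. Forward slashes are used because a backslash placed just before a left quote stops the macro from being expanded.

```stata
* quoting both paths keeps names with spaces in one piece
levelsof Name, local(levels)
foreach l of local levels {
    copy "N:/PDFs/`l'" "N:/PDFs/WM/`l'", replace
}
```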



        • #5
          Hi Nick,

          Thanks, I've tried this and am nearly there. However, I find that I can only create a copy if I prefix its name with another character, e.g. a 1 or a W.

          Code:
          cd N:\PDFs
          g Name = CCG_Code+".pdf"
          
          levelsof Name, local(levels)
          
          foreach l of local levels {
              copy `l' "N:\PDFs\WM\`l'", replace
          }
          This doesn't work, but this does:


          Code:
          cd N:\PDFs
          g Name = CCG_Code+".pdf"
          
          levelsof Name, local(levels)
          
          foreach l of local levels {
              copy `l' "N:\PDFs\WM\WM_`l'", replace
          }

          This is what the variable Name contains:

          Name
          04X.pdf
          04Y.pdf
          05A.pdf
          05C.pdf



          • #6
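            Beware the backstabbing backslash. It is a documented problem that backslashes act as escape characters and mess up local macros: a backslash placed just before a left single quote (`) tells Stata to take the quote literally, so the local macro that follows is not expanded. That is why the WM_ prefix helped: the backslash was then followed by W rather than by the quote.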

            It is flagged in [U] 18.3.11 Constructing Windows filenames using macros, discussed in the literature (http://www.stata-journal.com/sjpdf.h...iclenum=pr0042), and much posted here. (A search for pr0042 reveals 19 mentions.)
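A small illustration of the problem, assuming the local l holds 04X.pdf as in #5. In the first display the backslash just before the left quote is read as an escape, so l is not substituted; the other two forms avoid this, the last by keeping the final backslash at the end of a separate string, as in #8.

```stata
local l "04X.pdf"

* a backslash right before the left quote is read as an escape,
* so the macro l is not expanded on this line
display "N:\PDFs\WM\`l'"

* forward slashes avoid the escape altogether
display "N:/PDFs/WM/`l'"

* or keep the final backslash in a string of its own
local new = "N:\PDFs\WM\" + "`l'"
display "`new'"
```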



            • #7
              Thanks Nick. As my OS is Windows, I routinely use "\" in file paths. Changing those to "/" resolves the problem. Thanks for your help.

              This works:
              Code:
              cd N:/PDFs
              g Name = CCG_Code+".pdf"
              
              levelsof Name, local(levels)
              
              foreach l of local levels {
                  copy `l' "N:/PDFs/WM/`l'", replace
              }
              Last edited by Tim Evans; 18 Oct 2016, 06:56.
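Once the loop has run, a quick check is to compare the number of PDFs in the target folder with the number of names requested. This sketch assumes the variable Name and the folders above, and that WM held no other PDFs beforehand; it uses the dir extended macro function to list the files.

```stata
* names requested (distinct values of Name)
levelsof Name, local(levels)
local n_wanted : word count `levels'

* PDFs actually present in the target folder
local copied : dir "N:/PDFs/WM" files "*.pdf"
local n_copied : word count `copied'

display "requested: `n_wanted'  copied: `n_copied'"
assert `n_copied' == `n_wanted'
```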



              • #8
                This should work too:

                Code:
                cd N:/PDFs
                g Name = CCG_Code+".pdf"  
                levelsof Name, local(levels)  
                foreach l of local levels {      
                    local new = "N:\PDFs\WM\" + "`l'"      
                    copy `l' "`new'", replace
                }



                • #9
                  Nick,

                  Thank you, it does.
