Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Appending file with loop

    Hi,

    I tried appending some files using the following code:

    Code:
    forvalues i=2/10 {
    
    use all in `i', clear
    
    levelsof Name, local (Name)
    
    use analysis_ra.dta, clear
    
    foreach n of local Name{
    
    append using "analysis_`n'.dta"
    
    save new, replace
    
    }
    
    }
    The idea is to append all files for which I have part of the file names stored in another file (all.dta). Concretely, the files I want to append all start with "analysis_ " and are then followed by the names which can be found in the file all.dta. The file analysis_ra.dta stands for the name ra which is first in the all.dta. Based on this file I wanted to append all the other files. My code only appends one single file but not all of them. Can you tell me where my problem lies?

    Thank you!

  • #2
    The problem is that each time through the loop, you read in a single file, analysis_ra, and then you append a new file to it. Then you save that in file new. You are not building up new each time through the loop, you are just clobbering it with the last version.

    I could fix this code, but let me give you a different approach that is, I think, cleaner.

    Code:
    clear
    tempfile building
    save `building', emptyok
    
    forvalues i = 1/10 {
        use all in `i', clear
        local n = Name[1]
        use analysis_`n', clear
        append using `building'
        save `"`building'"', replace
    }
    
    save new, replace
    Note: not tested. Beware of typos.

    Comment


    • #3
      Yes, I understand. Thank you for the solution.

      Comment


      • #4
        I have three dta files for three sectors of an economy (and these are the only three sectors that constitutes the country): rural, urban, and semi-urban. Each of the files has 3000 observation. I have to run this OLS regression:
        log(wage)= a+ summation b(sectors)+c(hhsize)+errors

        where summation implies summation of sectors running from 1 to 3 and b represents their respective coefficients

        Since it is a categorical variable the final results will have only 2 categories.

        The catch is I have to run this equation at the national level but without loading the data for all sectors at one go, that means I cannot have more than 3000 observations when I am trying to run this equation at the national level which means appending is not to be done explicitly. This is a programming task that I have to accomplish. Please help!

        Comment


        • #5
          Dear Clyde Schechter
          I want to append most of the files using a loop, but I want to leave some from appending. I tried running
          Code:
          local files : dir "" files "*.dta"
              foreach file in `files' {
                  append using `file'
          
              }
          This appends all the files, but I do not want to append all the files, How can I tweak the code to leave out some files from appending

          Thanks,
          (Ridwan)

          Comment


          • #6
            To exclude some files from being appended, you could create a local containing those files' names, and then append only files whose names are not present in that local.
            Code:
            local leave_out  thisfile.dta thatfile.dta otherfile.dta
            local files : dir "" files "*.dta"
            foreach file in `files' {
               if (strpos("`leave_out'", "`file'") == 0 ) {
                  append using `file'
               }
            }
            Constructing the local of files to be left out might be tedious, depending on how many such files there are, and what (if anything) their names might have in common. If constructing this local is is a problem for you, you'll need to tell us something about the rules for deciding which files are to be excluded.


            Comment


            • #7
              Thank you Mike Lacy .

              I have ~50 data files for each industry created from Master.dta & New_master.dta respectively. I want to append all the industries together leaving out Master.dta & New_master.dta, therefore i tried running the following code:
              Code:
              local leave_out Master.dta New_master.dta
              local files : dir "" files "*.dta"
              foreach file in `files' {
                 if (strpos("`leave_out'", "`file'") == 0 ) {
                    append using `file'
                 }
              }
              This code appends all the files in the folder including Master.dta & New_master.dta. I want to append all the files except these two.
              Please get back to me, I shall be very thankful.

              Comment


              • #8
                Code:
                local leave_out Master.dta New_master.dta
                local files : dir "" files "*.dta"
                local files: list files - leave_out
                foreach file in `files' {
                  append using `file'
                }

                Comment


                • #9
                  Thank you Clyde Schechter .
                  The code you sent appends all the files including the Master.dta & New_master.dta. When I run the code you sent, It throws a small error variable product_code is str5 in master but long in using data. However, product_code is a variable existing only in Master.dta & New_master.dta not in the files I want to append. Which means it is appending Master.dta & New_master.dta also. I use the force option in append to avoid this error
                  Code:
                  local leave_out Master.dta New_master.dta
                  local files : dir "" files "*.dta"
                  local files: list files - leave_out
                  foreach file in `files' {
                    append using `file', force
                  }

                  The total number of observations in 51 data files are 4438041. The Master.dta include 16,821,921 observations, where as New_master include 15,848,551 observations. The number of observations after using this command is 37108513 ( 4438041+16,821,921+15,848,551). It is difficult to identify the observations belonging to Master.dta & New_master.dta and then drop those.

                  Please get back to me, if there is any other way to define the local and leave out the data files not to be appended.


                  Thanks,

                  Comment


                  • #10
                    Yet another example of why using -force- usually ends badly. When you use -force- with append, it goes ahead an smashes the files together, substituting missing values for the data in the offending variable(s) of the using file. So there is no way this would possibly have achieved your goal. All it does is suppress the error message--which doesn't mean that things are going fine. It just means that whatever is going wrong, you will be unaware of. Of course, eventually the hidden problems resurface. You are lucky that, in this case, it resurfaced early.

                    As for how to get the exclusion, I'm going to guess now that you are running on Windows. In all of the other operating systems that Stata supports, filenames are case sensitive. But in Windows they are not. This can lead to problems, and I got bitten by it when I wrote the code in #8. In particular, although we have been referring to the filenames as Master.dta and New_Master.dta, because Windows doesn't recognize case in filenames, when you run the -local files : dir "" files "*.dta"- command, Windows gives the names to Stata as master.dta and new_master.dta, in lower case. In defining local macro leave_out, however, I specified them with upper case initial letters. So when we get to -local files: list files - leave_out-, nothing gets left out, because master.dta does not match Master.dta, and new_master.dta does not match New_master.dta.

                    The solution to the problem is a small modification:
                    Code:
                    local leave_out Master.dta New_master.dta
                    local files : dir "" files "*.dta", respectcase
                    local files: list files - leave_out
                    foreach file in `files' {
                      append using `file' // DON'T USE FORCE
                    }
                    The -respectcase- option will cause the local macro files to retain the actual case of the filenames, so they will now match properly with local macro -leave_out-. (Added: This solution will still work if you ever port the code to a different OS.)

                    More generally, I cannot emphasize enough the dangers of using -force- options. -append- is not the only command that has one. There are a few others. In all cases they provide a "quick and dirty" way to suppress certain error conditions. But in all cases where there is a -force- option, its use always entails loss of data. So unless you are 100% certain you know exactly what data will be lost, and are 100% certain that you do not now, and never will, need that lost data, you should not use -force-. Even under those situations, there are usually other simple enough ways to resolve the situation more safely than using -force-. If you are a daily user of Stata and you are using -force- options more than a couple of times a year, you are probably unwittingly mangling your data analyses and producing unreliable results with it.
                    Last edited by Clyde Schechter; 15 Mar 2023, 09:37.

                    Comment


                    • #11
                      Thank you very much Clyde Schechter .
                      Specifying the option [, respectcase] worked fine, and it successfully leaves out Master.dta and New_master.dta from append.
                      Moreover, thank you for this elaborated explanation, there was lot of learning from it. I shall be careful in future. I am a regular STATA user, and I was using force option only in this case.

                      Thank you very much,
                      Regards,
                      (Ridwan)

                      Comment

                      Working...
                      X