Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • insheet and append large file with extension in numbers (eg. As30898798.87)

    Hello I got problem to insheet and append all the files using forvalue loop....here is what I have done


    cd "F:\climate data\New folder"
    insheet using "F:\climate data\New folder\Rainfall Vulnerable\0306\AS030619.88"
    save "F:\climate data\New folder\masterdata", replace
    forvalue i=19.88/20.09 {
    insheet using "F:\climate data\New folder\Rainfall Vulnerable\0306\AS0306`i'", clear
    forvalue i= 19.89/20.09 {
    append using "F:\climate data\New folder\masterdata"

    }
    }

    Let me know if there is any alternate way....like subfolder 306 there are other 54 subfolders and data set for almost 50 years.....how can I insheet and append them all once.

    Thanks in advance.
    Last edited by Alvin Ghimire; 22 Apr 2017, 11:12.

  • #2
    So there are several problems here. First, you don't need two loops: you're only iterating over the sequence of filenames once. Second, you neglected to -save- the building masterdata file each time, and it gets wiped out each time you -insheet, clear- the next one.

    Then there is the problem of using a -forvalues- loop with floating point numbers. The problem here is twofold. One is a matter of the increment each time through the loop being 1, so after i = 19.88, the next thing Stata will try is 20.88, which is already beyond the end value of 20.09. The other is the issue of precision. Even if you refined it to -forvalues i = 19.88(0.01)20.09-, you might find out that when things are rounded to floating point precision in binary, that 19.88 + 0.01 is not exactly 19.89, because neither 19.88 nor 0.01 has an exact representation in binary. So you need to run your loop from 1988 to 2009 and divided by 100 inside the loop and then use the result displayed to 2 decimal places.

    So I think this will work:

    Code:
    cd "F:\climate data\New folder"
    insheet using "F:\climate data\New folder\Rainfall Vulnerable\0306\AS030619.88"
    save "F:\climate data\New folder\masterdata", replace
    forvalue i=1988/2009 {
       local ii: display %3.2f `=`i'/100'
        insheet using "F:\climate data\New folder\Rainfall Vulnerable\0306\AS0306`i'", clear
        append using "F:\climate data\New folder\masterdata"
       save "F:\climate data\New folder\masterdata", replace
    }
    A couple of other points. You don't need to refer to F:\climate data\New folder\ in any of those pathnames (except the first -cd- command). Once you are in that working directory, any other filename or pathname will be interpreted relative to that, so you can simplify and shorten the code that way.

    Also, even though you are working in Windows, and \ is the conventional path separator, you should form the habit, when working in Stata, of using / instead. The reason is that \ sometimes interacts with other things in the pathname and blocks the proper interpretation of macros. It doesn't happen in this particular code, but this is likely to bite you sooner or later.

    Also, if you are using current Stata, the -import delmited- command has superseded the older -insheet- command. See -help import delimited-.

    Finally, when doing a large series of appends like this, I recommend making sure that all of the individual files have been properly cleaned, and disagreements between them about formating, labeling, etc., be reconciled before undertaking this big series of appends. (Perhaps you have already done this.) Otherwise the combined data set can be a mess (or the appends may even fail to run if, for example, some variable is a string in one data set and numeric in another). While it is possible to clean the combined data set, my experience has been that it is usually simpler to clean them separately first.

    Comment


    • #3
      Thank you so much for your detail input. As I am a novice learner, I am just learning things. But after executing the code an error was there. file F:\climate data\New folder\Rainfall Vulnerable\0306\AS03061988.raw not found

      Comment


      • #4
        Oh, sorry. Typo. Should be:
        Code:
           insheet using "F:\climate data\New folder\Rainfall Vulnerable\0306\AS0306`ii'", clear

        Comment


        • #5
          I figured out the error....in insheet loop there must be `ii' instead of `i', I don't know either it is correct but I can see all the obs.

          Comment


          • #6
            Thank you for your quick turn around. It is the file containing days and rainfall data (string) which is delimited with space. Although the code executed perfectly, some rainfall data has value "DNA", which in original file there is rainfall amount. As there are another 54 folders like this how can I write code for all those folders, I would appreciate if you can figure it out for me. Thanks again.

            Comment


            • #7
              You have not explained your problem all that clearly. I'm guessing you're saying that there is a variable, called rainfall, which is numeric in some of the files, and is string (because of the value "DNA") in others. Do I have that right?

              My first thought is to say that you should step back and carefully scrutinize the files for other problems. Where one problem is obvious, other problems often lurk. Indeed, it would be somewhat surprising if there aren't other inconsistencies across the files. It is best to resolve those separately first, in my opinion.

              That said, if we want to imagine that this is the only problem that arises from putting these files together, you can do something like this:

              Code:
              forvalue i=1988/2009 {
                 local ii: display %3.2f `=`i'/100'
                  insheet using "F:\climate data\New folder\Rainfall Vulnerable\0306\AS0306`ii'", clear
                  capture confirm numeric var rainfall
                  if c(rc) == 7 {    // STRING VARIABLE FOUND
                      replace rainfall = "" if trim(rainfall) == "DNA"
                      capture assert missing(real(rainfall)) == missing(rainfall)
                      if c(rc) == 9 {
                          display as error "File AS0306`ii': Non-numeric rainfall"
                          exit 9
                      }
                      else if c(rc) != 0 { // SOME OTHER UNEXPECTED PROBLEM
                          display as error "File AS0306`ii'L Unexpected error"
                          exit c(rc)
                      }
                      destring rainfall, replace
                  }
                  else if c(rc) != 0 {    // SOME OTHER, UNEXPECTED PROBLEM
                      display as result "Unexpected problem in file AS0306`ii'"
                      exit c(rc)
                  }
                  append using "F:\climate data\New folder\masterdata"
                  save "F:\climate data\New folder\masterdata", replace
              }
              Note: Not tested. Beware of typos.

              The added code will look for a numeric variable called rainfall. If it finds, instead, a string variable by that name, it will remove the offending "DNA" observations and then re-check whether the variable is now entirely numbers. If so, it will convert it to a numeric variable and the proceed on to the -append- command. If, after removing "DNA" observations there is still non-numeric content in variable rainfall, Stata will display an error message and halt. If the rainfall variable is numeric in the first place, no special actions are taken: Stata just proceeds with -append-ing.

              Let me just re-emphasize my advice that this is unlikely to be the only problem in these data sets. If you attempt to fix them all during the append process, the code will become incomprehensibly complicated as each file is likely to have idioisyncratic problems, each of which will require additional code here. I urge you to step back from this and carefully inspect and clean each of these data sets separately first. Write a do-file which is a general cleaning script for these files. For each file, make a custom do-file based on the cleaning script that incorporates the specific problems encountered in that file. After each file has been cleaned, the simpler append code in #2 (as corrected in #4) will serve you will.

              Comment


              • #8
                Thank you. I will try my best and back to you.

                Comment


                • #9
                  As I looked the data once again carefully, I found both the numeric as well as text. But the value is "T" in almost all files, can't we replace T with 0.0 as in future we must do that? The code #7 doesn't seem to be working.

                  Comment


                  • #10
                    doesn't seem to be working
                    Specifically, what does "doesn't seem to be working" mean. What is it doing? Show the code you ran and Stata's output, and explain.

                    If you are encountering T in addition do "DNA", then the code in #7 will, as promised, halt and give you an error message because the code is looking only for DNA as an exception to numeric values. Look, this thread could run to hundreds of posts if we fix each glitch in the data one at a time. That's why I've suggested you carefully review all of the datasets and find all of the data problems and fix them there. It's pretty clear you haven't done that.

                    As for replacing "T" with 0.0, why would you do that? Do you actually know that the value T is some kind of code for 0.0? If not, then it makes no sense at all to replace it with 0.0. If T is a code for something, you should replace it with whatever number it is a code for. If T is a code for something that can't be expressed numerically, then you have to replace it with "", just as with "DNA" and perhaps note something about this in another variable.

                    Comment


                    • #11
                      It says....File AS030619.88: Non-numeric rainfall

                      Comment


                      • #12
                        So that is exactly what, in #7, I said would happen if it encountered something that is non-numeric but is not "DNA." The code works as promised. You didn't anticipate finding "T." Who knows what else is in there that you haven't anticipated? It makes no sense to try to deal with these things one at a time in this Forum. Please re-read my advice in the final paragraph of #7 and in #10.

                        Comment

                        Working...
                        X