  • appending using forvalues leaves out half of the data

    Dear List,
    I use a short and sweet loop to append a dataset divided by year.
    The problem is that in 1995 one of the variables changes from str5 to str6.
    For some reason Stata then removes all data from before 1995.
    If I append the years one by one, Stata manages to "reformat" the string so the dataset fits and all data is included.

    Code:
    forvalues year =1977(1)2012 {
        use file_year_`x'.dta
        if `x' !=1977 append using file_year_`x'.dta
        save file_all_years.dta, replace
    }
    Can you see what I am doing wrong?

    lars

  • #2
    It seems to boil down to one variable, var1. This is formatted as str5 prior to 1994 and str7 from 1995.
    So I tried to make a loop to change all the diag to str7:

    Code:
    forvalues year =1977(1)2012 {
        use file_year_`x'.dta
        format diag %7s
        save file_year_`x'.dta, replace
    }
    That did not help one bit.

    I hope you can help; it annoys me a fair bit.

    lars

    • #3
      You should probably post your actual code (what you show doesn't make sense: you're appending files to themselves and saving them over the previous results) and post or attach a log file that illustrates the problem.

      Stata will automatically "promote" string lengths during append, even in loops, so that isn't the problem.

      . version 14.0

      .
      . clear *

      . set more off

      .
      . foreach i in 1977 1994 1995 1996 {
        2.         drop _all
        3.         quietly set obs 1
        4.         if `i' < 1995 generate str5 my_string_variable = "`i'0"
        5.         else generate str6 my_string_variable = "`i'00"
        6.         tempfile `i'
        7.         quietly save ``i''
        8. }

      .
      . drop _all

      . tempfile file_all_years

      .
      . *
      . * Begin here
      . *
      . foreach x in 1977 1994 1995 1996 {
        2.         use ``x''
        3.         if `x' != 1977 append using `file_all_years'
        4.         quietly save `file_all_years', replace
        5. }

      .
      . list, noobs abbreviate(20)

        +--------------------+
        | my_string_variable |
        |--------------------|
        |             199600 |
        |             199500 |
        |              19940 |
        |              19770 |
        +--------------------+

      .
      . exit

      end of do-file


      .


      • #4
        Several errors here, although I can't make sense of your problem report in #1, as it seems to me that your code should not work at all.

        Changing the display format will not change the storage type.
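        A minimal sketch of that distinction, using a hypothetical str5 variable diag with made-up contents. It only illustrates the point above: as #3 shows, append promotes string lengths by itself, so no recast is needed before appending.

```stata
* Sketch: display format vs. storage type (diag is a hypothetical variable)
clear
quietly set obs 1
generate str5 diag = "A1234"

* -format- only changes how diag is displayed; it is still stored as str5
format diag %7s
display "`: type diag'"

* -recast- is what actually changes the storage type, here to str7
recast str7 diag
display "`: type diag'"
```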

        More importantly, consider what you are trying to do for (e.g.) 2012.

        Code:
        use file_year_2012.dta
        append using file_year_2012.dta
        save file_all_years.dta, replace
        You (are writing as if you want to) read in 2012, append the same data and then overwrite your global dataset. So the global dataset ends the loop containing two copies of the 2012 data. Only!

        But there is another problem. Your loop statement says year, but inside the loop you say x. x is never defined.

        I think you want something more like this:

        Code:
        use file_year_1977  
        forval x = 1978/2012 {
            append using file_year_`x' 
        }
        save file_all_years.dta, replace
        In fact, in recent Statas append does not require a loop, but I stop there.
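        A sketch of what a single append call can look like, assuming Stata 11 or newer (where append accepts several files after using). To keep it self-contained, three tiny yearly datasets are saved as tempfiles f1977 to f1979, standing in for the hypothetical file_year_1977.dta ... file_year_2012.dta; the loop here only builds a list of names, it does not append.

```stata
* Demo data: one observation per year, saved as tempfiles f1977-f1979
forvalues x = 1977/1979 {
    clear
    quietly set obs 1
    generate int year = `x'
    tempfile f`x'
    quietly save "`f`x''"
}

* Build a list of every file after the first, each in double quotes;
* the compound quotes `" "' keep the embedded quotes intact
local rest
forvalues x = 1978/1979 {
    local rest `"`rest' "`f`x''""'
}

* Start from the first year, then append all remaining years in one call
use "`f1977'", clear
append using `rest'
list, noobs
```

        With real files, "`f`x''" would become file_year_`x'.dta and the list loop would run over 1978/2012.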

        (What version of Stata are you using? If it's not 14, you should be telling us.)
        Last edited by Nick Cox; 13 Oct 2015, 04:15. Reason: EDIT: In the first posting the last code block was written too quickly and carelessly. I stole Daniel's code.

        • #5
          I do not see how this strategy could ever work. At best you would end up with the data from one file only, as you seem to append every file to itself inside your loop and then save the result. Actually, your code should result in the error message

          Code:
          file file_year_.dta not found
          r(601);
          the first time through the loop, because you refer to `x' but never define it. What you probably want is

          Code:
          use file_year_1977
          forvalues year = 1978/2012 {
              append using file_year_`year'
          }
          save file_all_years, replace
          Best
          Daniel

          • #6
            The bad thing, for me, is that that was the actual code… And yes, I figured out that I was just appending files to themselves, leaving a file called file_all_years that only included 2012, the last year. That year had an observation ending in 2012 that started in 1995, which led me to believe that 1995 was the problem. But actually it was the loop itself.

            I have a bit of a problem understanding your latter loop.

            The files I want to append to each other are called temp_1977 to temp_2012.
            They are identical regarding variable names and format (string vs numeric).

            I want to start with the 1977 file and then add on each year, saving it into a total file with all years.

            lars

            • #7
              Note here:

              1. Statalist working well. Three experienced users replying quickly and giving similar advice.

              2. Statalist working badly. None of us really believes in the code as posted. Not posting the exact code you used really stretches patience and wastes the time and effort of willing volunteers.

              • #8
                The bad thing, for me, is that that was the actual code…
                I doubt that, as you would have received an error message, as stated above, but let us move on.

                I have a bit of a problem understanding your latter loop.

                The files I want to append to each other are called temp_1977 to temp_2012.
                They are identical regarding variable names and format (string vs numeric).

                I want to start with the 1977 file and then add on each year, saving it into a total file with all years.
                Well, this is what the loop does. Change file_year_`year' to temp_`year' and you are done. If you have further problems regarding the code, I suggest you review the documentation on how append works.

                Best
                Daniel

                [Edit]
                In fact, in recent Statas append does not require a loop, but I stop there.
                I was thinking in this direction, too. However, in my view this feature would profit very much from allowing wildcards in filenames. Otherwise you still need to get a list of all the files you want to append, which will lead you to the extended macro function dir. Maybe someone should take a few minutes and write up a wrapper ...
                [/Edit]
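                A sketch of the extended macro function dir route, without a wrapper. For a self-contained demo it writes tiny files named like temp_1977.dta into Stata's temporary directory (c(tmpdir)) and assumes no other file there matches temp_????.dta; the working directory is restored at the end.

```stata
* Demo: tiny yearly files named like the thread's temp_1977.dta ...
local home `"`c(pwd)'"'
quietly cd "`c(tmpdir)'"
forvalues x = 1977/1979 {
    clear
    quietly set obs 1
    generate int year = `x'
    quietly save temp_`x'.dta, replace
}

* -dir- returns the matching names in double quotes, in no guaranteed
* order, so sort the list before appending
local files : dir . files "temp_????.dta"
local files : list sort files

* Append every matching file in one call, starting from an empty dataset
clear
append using `files'
list, noobs

quietly cd `"`home'"'
```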


                [Edit]
                Ok, here is a draft

                Code:
                *! version 1.0.0 13oct2015 daniel klein
                
                pr appendfiles
                    vers 11.2
                    
                    syntax using/ [ , * ]
                    
                    m : appendfiles()
                end
                
                vers 11.2
                
                m :
                
                void appendfiles()
                {
                    string scalar Using, Dir, Fn
                    string colvector Filenames
                    real scalar rc, i
                    
                    // filename pattern typed after -using- (quotes removed by syntax)
                    Using = st_local("using")
                    
                    // default to the .dta extension
                    if (pathsuffix(Using) == "") {
                        Using = Using + ".dta"
                    }
                    
                    // split into directory and file pattern, then list matching files
                    pathsplit(Using, Dir = "", Fn = "")
                    Filenames = dir(Dir, "files", Fn, 1)
                    
                    if (!rows(Filenames)) {
                        errprintf("no files found\n")
                        exit(601)
                    }
                    
                    // collect all names, each in double quotes, in Filenames[1]
                    Filenames[1] = char(34) + Filenames[1] + char(34)
                    
                    for (i = 2; i <= rows(Filenames); ++i) {
                        Filenames[1] = Filenames[1] + ///
                            char((32, 34)) + Filenames[i] + char(34)
                    }
                    
                    // one call to -append-, passing through any options
                    rc = _stata("append using " ///
                        + Filenames[1] + ", " + st_local("options"))
                    
                    if (rc) {
                        exit(rc)
                    }
                }
                
                end
                The problem discussed above should be solved by typing

                Code:
                clear
                appendfiles using temp_????.dta
                [/Edit]
                Last edited by daniel klein; 13 Oct 2015, 05:38.

                • #9
                  Sorry, you're right.
                  Where it said

                  Code:
                  forvalues year =1977(1)2012 {
                  use file_year_`x'.dta
                  it should have been
                  Code:
                  forvalues x =1977(1)2012 {
                  I am sorry that I stretched your patience and wasted your time, but I must admit I had not seen Nick Cox's or Daniel Klein's posts when I wrote my #6.
                  I have run your code and it works perfectly. It would have taken me forever to figure that out on my own, thank you.

                  The fact that I don't need a loop to append is a different matter altogether.

                  Code:
                  use file_year_1977
                  append "\file_year_1978" "\file_year_2012"
                  does the same thing; I figured this out after looking through example 6 in the append documentation.

                  Thank you for your help.

                  lars

                  • #10
                    Lars: Thanks for the closure here, detailed and helpful.

                    • #11
                      The fact that I don't need a loop to append is a different matter altogether.

                      Code:
                      use file_year_1977
                      append "\file_year_1978" "\file_year_2012"
                      does the same thing; I figured this out after looking through example 6 in the append documentation.
                      Sorry, but this is not true and, thus, not helpful but misleading. The command above will append exactly two datasets (file_year_1978 and file_year_2012) to the dataset in memory (file_year_1977). By no means will it append the files file_year_1979, file_year_1980, ..., file_year_2011.

                      Best
                      Daniel

                      • #12
                        Daniel,
                        Thank you for clearing that up. That explains why I couldn't find all the cases in my total_years file.
                        I will end this append atrocity (which I created) and follow your earlier advice.
                        Lars
