  • Append two datasets using a loop

    Hi there -

    I am trying to append pairs of datasets from consecutive years, 1998 to 2019. I need a total of 21 files (1998-1999, 1999-2000, 2000-2001, 2001-2002, . . ., 2018-2019). The current names are 'sic1998', 'sic1999', 'sic2000', and so on through 'sic2019'.


    As the dataset is huge, I would like to use the loop below, but I wonder if I can use `i+1' for the next year's file.

    This is an example for one year:
    use "G:\sic1998.dta" , clear
    append using "G:\sic1999.dta"
    save "G:\m1999.dta"


    This is what I want to do:
    forvalues i = 1998(1)2019 {
    use "G:\sic`i'.dta" , clear
    append using "G:\sic`i+1'.dta"
    save "G:\m`i+1'.dta"
    }


  • #2
    There are many ways of doing what you want, and the easiest to understand (I think) would be
    Code:
    forvalues i = 1998(1)2018 {   // stop at 2018: the last pair is 2018-2019, and there is no sic2020
        use "G:\sic`i'.dta" , clear
        local j = `i'+1
        append using "G:\sic`j'.dta"
        save "G:\m`j'.dta"
    }
    Crossed with post #3 which makes the important point I overlooked.
    Last edited by William Lisowski; 17 Feb 2020, 17:20.



    • #3
      Code:
      clear
      forvalues j = 1998/2018 {   // stop at 2018: the last pair is 2018-2019, and there is no sic2020
         local jp1 = `j'+1
         use "G:/sic`j'.dta", clear
         append using "G:/sic`jp1'.dta"
         save "G:/m`jp1'.dta", replace
      }
      There is another syntax that does not require creating a separate local macro jp1 to store the value of `j'+1 and instead calculates it on the fly. But since you are using it twice inside the loop, it makes sense to only calculate it once and store the value rather than calculating it twice.
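
      The on-the-fly syntax mentioned above is presumably Stata's inline expression expansion, `=exp', which evaluates an expression at the point where the macro is expanded. A minimal sketch of the same loop written that way, assuming the file names from the question and an upper bound of 2018 so that the last pair is 2018-2019:

      ```stata
      forvalues j = 1998/2018 {
          use "G:/sic`j'.dta", clear
          // `=`j'+1' is evaluated in place, so no separate local macro is needed
          append using "G:/sic`=`j'+1'.dta"
          save "G:/m`=`j'+1'.dta", replace
      }
      ```

      The expression is evaluated twice per iteration here, which is the trade-off described above; storing it once in a local macro is arguably easier to read.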

      Before you actually try to run this, have you checked all of these files to verify that they can be properly appended? Are you certain that the same variables have exactly the same names in all of the files, i.e. no variation in capitalization or spelling? Are you certain that each variable is either numeric in all of the files or string in all of the files? If any of the variables are value-labeled integer variables, have you verified that the value labels are identical in all of the files? If missing values have been numerically encoded, has the same encoding been used in all of the files? If not, the code will either break with an error message notifying you of the incompatibility, or, worse, the incompatibility will go undetected and Stata will rapidly churn out a bunch of garbage files which you may or may not notice are wrong until some maximally inconvenient time in the future when the consequences of the errors show up in some way you cannot avoid noticing. It is not enough to say that the data sets all come from the same source, a source that is known to be reliable. Maintaining that level of consistency across 20+ files seldom happens in real life.
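
      The first of these checks, variable names, can be partly automated before running the loop. A rough sketch, assuming the files sit in G:/ under the names given in the question; the macro names base, vars, missing, and extra are purely illustrative. It compares names only (case-sensitively) against sic1998 and says nothing about storage types, value labels, or missing-value codes:

      ```stata
      // read the variable names of the first file without loading the data
      quietly describe using "G:/sic1998.dta", varlist
      local base `r(varlist)'

      forvalues j = 1999/2019 {
          quietly describe using "G:/sic`j'.dta", varlist
          local vars `r(varlist)'
          // names in sic1998 but not in this file, and the reverse
          local missing : list base - vars
          local extra : list vars - base
          if "`missing'`extra'" != "" {
              display as error "sic`j': not in this file: `missing' | not in sic1998: `extra'"
          }
      }
      ```

      A string/numeric mismatch does not need a separate check to be noticed: append stops with an error on one unless the force option is specified. Differences in value labels and missing-value codes are not flagged this way and still need to be inspected by hand.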

      Edit: Crossed with #2 which provides the same solution using different names for the local macros and spares you my rant about file consistency.



      • #4
        That was so simple. It works perfectly! Thank you!
