Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Help with Merging CSV data sets and adding Variables to each data set

    Hi,
    I am using Stata 13.1.

    I have a bunch of data sets (all located in the same folder) saved as .csv. Each data set represents data from a different year (but there is no variable in each data set identifying the year it is from). I want to merge the data sets so I can study the trends from year to year. I need to add a variable "year" to each data set so I can correctly identify the data once the files are merged.

    The tedious way of doing this is to add the variable to each data set, save it to a .dta and then merge the files.

    Is there a faster way to do this?

    Thanks for the help.

  • #2
    Welcome to Statalist.

    The tedious way of doing this is to add the variable to each data set, save it to a .dta and then merge the files.
    That is the fundamental approach, but "tedious" or not is up for debate. Someone may have to do that with 200 lines of code while another person may be able to do the same with a few lines. In order for us to discern, it'd be necessary to know the csv files were named. If they are neatly name in sequential years, it'd be easy to code.

    Also, I think you meant "append" (which adds cases) and not "merge" (which adds variables). To find out the differences, check out help append and help merge.

    Comment


    • #3
      Thank you for pointing out my errors. You are correct on both points.

      File name: MUP_PHY_R19_P04_V10_D13_Geo
      each subsequent year is name D14, D15 etc.

      I can change the file names if that would make the code easier.

      Comment


      • #4
        So, it would be something like:

        Code:
        * Set your working directory
        cd "WHEREVER THE FILES WERE SAVED"
        
        * Loop thorough them, assuming your file ranges from 13-22:
        forvalues x = 13/22{
            import delimited "MUP_PHY_R19_P04_V10_D`x'_Geo.csv", clear
            generate year = `x'
            save MUP_PHY_R19_P04_V10_D`x'_Geo.dta, replace
        }
        Now check if they are done, then you can start building a long file:

        Code:
        use MUP_PHY_R19_P04_V10_D13_Geo, clear
        forvalues x = 14/22{
            append using MUP_PHY_R19_P04_V10_D`x'_Geo
        }
        And that should do it. Let us know if that works.

        Please also be aware that append sometimes can have trouble if a variable of the same name is formatted differently in different files (e.g. clashing between string and number, etc.) So, check the output log carefully.
        Last edited by Ken Chui; 30 Aug 2022, 11:55.

        Comment


        • #5
          Thank you very much. It worked.

          Comment

          Working...
          X