Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Using loop to import and merge excel files

    Hi,

    Please, I have 5 excel files: "File1.xls" to "File5.xls". Each file contains only 1 sheet: "Sheet1"
    I want to import and merge them using variable id.
    variable id uniquely identifies each observation in the excel files.

    Usually, I import each file separately save them .dta, then merge. or use xls2dta to convert to. dta before merging. But I was looking for doing it with loop (foreach or forval), is that possible ? also if I have these files with completely different names (not indexed by numbers) for example instead of "File1.xls" I have "name.xls" and "File2.xls" "age.xls". is that possible?

    I'm not also interested to keep the separate .dta files created before the merger

    I would much appreciate your assistance on the subject.

    Thank you in advance

  • #2
    Because you have already mentioned xls2dta (SSC, I suppose), I will point out that the command supports merging xls-files "on-the-fly". Assuming all your xls-files are in your working directory,

    Code:
    xls2dta , clear : merge 1:1 id using .
    will merge all xls (and xlsx)-files in the working directory and load the complete file into memory.

    Someone else might go on and re-create the loop that is implemented in xls2dta.

    Comment


    • #3
      Thank you Daniel

      Actually, that code doesn't work for some reason. To be more accurate my id variable is firm_id, does the underscore matter in the coding? I opened all the excel files to make sure that the variable is there, also it is not the variable in the first column in the excel files would that matter?

      Here is an example
      COMPANY_NAME firm_id DEBT_TO_ASSETS2019 DEBT_TO_ASSETS2018
      Acadia Realty Trust 103085 42.76 40.96
      ACRE Realty Investors Inc. 103589 NA NA
      Adaptive Real Estate Income Trust, Inc. 4169623 NA NA
      Aegis Realty, Inc. 113549 NA NA
      Agree Realty Corporation 103154 33.35 35.52
      Alexander & Baldwin, Inc. 4008292 34.84 34.97
      Alexander's, Inc. 107230 77.11 75.13
      Alexandria Real Estate Equities, Inc. 111908 38.33 37.87
      Alpine Income Property Trust, Inc. 19353248 0.00 NA
      ALST Casino Holdco, LLC 4686027 NA NA

      xls2dta , clear : merge 1:1 firm_id using .
      (110 vars, 813 obs)
      (92 vars, 813 obs)
      (8 vars, 813 obs)
      (200 vars, 813 obs)
      (74 vars, 813 obs)
      variable firm_id not found

      Thank you again

      Comment


      • #4
        Originally posted by daniel klein View Post
        Because you have already mentioned xls2dta (SSC, I suppose), I will point out that the command supports merging xls-files "on-the-fly". Assuming all your xls-files are in your working directory,

        Code:
        xls2dta , clear : merge 1:1 id using .
        will merge all xls (and xlsx)-files in the working directory and load the complete file into memory.

        Someone else might go on and re-create the loop that is implemented in xls2dta.
        It appears that STATA cannot read the merging variable "id" from excel files !! does it have to be in a specific format (in the excel files)?
        Thank you

        Comment


        • #5
          Try adding the option verbose, as in

          Code:
          xls2dta , clear verbose : merge 1:1 firm_id using .
          to see where things are going wrong.

          I am pretty sure that the problem is with one of the Excel files. Perhaps the variable name is spelled differently (e.g., uppercase) in one of the files.

          Comment


          • #6
            Thank you Daniel for your reply and assistance. I run the code verbose as you suggested and here is the result I also included a sample from the 5 files. any further suggestion, as usual, would be much appreciated

            . xls2dta , clear verbose : merge 1:1 firm_id using .
            5 files found
            processing file .\us_firm_data_2002_20191.xls
            processing sheet 1/1
            (110 vars, 813 obs)
            processing file .\us_firm_data_2002_20193.xls
            processing sheet 1/1
            (74 vars, 813 obs)
            processing file .\us_firm_data_2002_20195.xls
            processing sheet 1/1
            (8 vars, 813 obs)
            processing file .\us_firm_data_2002_20192.xls
            processing sheet 1/1
            (200 vars, 813 obs)
            processing file .\us_firm_data_2002_20194.xls
            processing sheet 1/1
            (92 vars, 813 obs)
            combining file 2/5
            variable firm_id not found
            r(111);

            end of do-file

            Here is a sample from each file
            COMPANY_NAME firm_id CUSIP
            Acadia Realty Trust 103085 004239109
            ACRE Realty Investors Inc. 103589 00489F106
            COMPANY_NAME firm_id NUM_PPTY_HEALTHCARE2019
            Acadia Realty Trust 103085 0
            ACRE Realty Investors Inc. 103589 NA
            COMPANY_NAME firm_id NUM_PPTY2019
            Acadia Realty Trust 103085 180
            ACRE Realty Investors Inc. 103589 NA
            COMPANY_NAME firm_id NUM_PPTY_MULTIFAM2019
            Acadia Realty Trust 103085 0
            ACRE Realty Investors Inc. 103589 NA
            COMPANY_NAME firm_id DEBT_TO_ASSETS2019
            Acadia Realty Trust 103085 42.76
            ACRE Realty Investors Inc. 103589 NA

            Comment


            • #7
              Are those variable names stored in the first row in the xls-files? If so, you need

              Code:
              xls2dta , clear import(firstrow) : merge 1:1 firm_id using .
              Perhaps you should start by using import excel on one of the files for troubleshooting and making sure you are getting what you want in Stata? Then come back to xls2dta for the mass import.

              Comment


              • #8
                Can't thank you enough!! that solved the issue.

                Comment

                Working...
                X