Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Merging from macro filelist

    I have two loops, with the goal of merging one or two xlsx input files to a central dta. The first foreach loop was composed to save each possible input file to dta, and the second to merge each dta file in the filelist macro to the larger central table. On the merge step, I'm receiving an invalid file specification error, but have not been able to identify any errors in the file path or macros. Any insight, into what may be causing this error is appreciated.

    Code:
    dir *.xlsx
    
    local filelist: dir "." files "Testing Metrics*.xlsx"
    di `"`filelist'"'
    
    foreach file of local filelist {
        import excel "`file'", firstrow case(lower) clear
        local outfile: subinstr local file ".xlsx" "", all
        sort date
        save "`outfile'", replace
    }
    local filelist_dta: dir "." files "testing metrics*.dta"
    di `"`filelist_dta'"'
    
    *** 2b. Merging monthly dta files using Master Lab Testing Table
    ///        RATIONALE:     Using Update replace option for the merge, so that newer using file over-writes
    ///                    previous results available in master lab testing file for that same day.
    
    foreach file of local filelist_dta {
            use "`lab_dta'", clear
            sort date
            merge 1:1 date using "`filelist_dta'", update replace
            save "`lab_dta'", replace
            clear
    }
    Secondarily, it is possible that the two input files may over-write each other for certain dates. Is it possible to specify that, on conflict, the using file should update and replace the existing data if variable interim=="Non-interim"?

    Kind thanks on any insight available.

  • #2
    You don't tell where the invalid file specification error is coming up. I notice that local macro lab_data is not defined, at least not within the code you show. It is possible that the command to -use- it is where the code is foundering. To see which command is throwing the error, put the commands
    Code:
    set tracedepth 1
    set trace on
    at the top of the code and re-run. Now each command will be echoed as it is executed, even inside the loops, and you will be able to see which command is breaking, and you will also see how it expanded any macros it contains. That may give you a better insight into what is going on.

    Other possible sources of difficulty include:
    1. Perhaps `lab_data' (assuming it is actually defined and you just didn't show us that) is in a different directory from the current working directory. If so, you need to give its full path to -use- and -save- it.
    2. Are you running all of the code without interruption from beginning to end? If you are doing it in chunks, say, for example, the code before the comment and then running the code after it later, then local macro filelist_dta will go out of scope when the code is interrupted, and this will cause it to be treated as an empty string in the merging loop--leading to the error message you are getting.
    If none of this helps you find the problem, post back showing the complete output of the code run with trace as suggested in my first paragraph.

    Is it possible to specify that, on conflict, the using file should update and replace the existing data if variable interim=="Non-interim"?
    Not with -merge-. -merge- will either always replace if -update replace- is specified, or retain non-missing originals but update original missings with non-missing values from the using data set if only -update- is specified, or retain the original values (missing or not) if nothing is specified. You cannot condition it on a variable in the data.

    To accomplish the kind of conditional updating you want I would instead use frames. Something like this:
    Code:
    use "`lab_dta'", clear
    sort date
    
    frame create for_import
    foreach file of local filelist_dta {
        ds date, not
        local lab_dta_vars `r(varlist)'
        frame for_import {
            use "`filelist_dta'", clear
            ds date, not
            local import_vars `r(varlist)'
            local overlap: list import_vars & lab_dta_vars
            local new_vars: list import_vars - lab_dta_vars
        }
        frlink 1:1 date, frame(for_import)
        foreach v of local overlap {
            replace `v' = frval(for_import, `v') if interim == "Non-interim"
        }
        if `"`new_vars'"' != "" {
            frget `new_vars', from(for_import)
        }
    }
    Note: The above code is not tested. Beware of typos or other errors.

    Notice, by the way, that even in your original code, -use-ing and -save-ing `lab_dta' each time through the loop is just needless disk thrashing. You can just start with `lab_dta' in memory and then iteratively join in successive files from filelist_dta. Then, at the end, you can save it.

    Explanation of changes: The code as originally written would have broken if on some iteration of the loop there were no variables to update or no variables that were new in the using file. The revisions (in bold face) now tolerate those possibilities.
    Last edited by Clyde Schechter; 11 Jul 2024, 13:06. Reason: Correct logic errors in code suggested.

    Comment


    • #3
      I've included the output below, including the trace info. The `lab_dta' macro is in a different directory than the working directory, so I included the full file path for the merge step. I also removed the -use and -save of the lab_dta outside the foreach loop, as directed. I am running all of the code together.

      Code:
      
      
      . do "I:\EandE\5. Pipeline\Stata Scripts\ID-Resp_2024-Master Lab Testing\Creating Master Lab Testing.do"
      
      . /***************************************************************************************************
      >                                                                 CREATING MASTER LAB TESTING TABLE
      > ****************************************************************************************************
      > PURPOSE
      >         Generate the Master lab Testing DATA ASSET, which is:
      >                 - a dataset of lab testing data
      >                 - created by merging exports with the existing master lab data
      >
      >                 -Update replace used for merge, so new input files replaces older testing values.
      >
      >         
      > ****************************************************************************************************
      > REVISION HISTORY
      >   v1    2024-Jul-08     SMeadows        Original version.
      >   V2    2024-Jul-11     SMeadows        Updated merging procedure. Creating views in separate do file.
      >
      > ***************************************************************************************************/
      .
      . ****************************************************************************************************
      . *** PART 0: Precode Set-up
      . ****************************************************************************************************
      . clear all       // Starting each file with a fresh slate, to reduce dependencies
      
      . set tracedepth 1
      
      . set trace on
      
      .
      . *** 0.A  Initializing local macros for filepaths and files
      . ********************************************
      .
      . *** 0.A.1 Folder Navigation
      . local fldr_pipeline             "I:\EandE\5. Pipeline"
      
      . local fldr_refresh              "I:\EandE\4. Refreshes"
      
      . local fldr_SAS_VA_HAPI  "`fldr_refresh'\Human API (HAPI) Files\ID_SAS_VA"
      
      . local fldr_archive              "`fldr_SAS_VA_HAPI'\Archive"
      
      .
      . *** 0.A.2 Output Lab Testing (Labs)
      .
      . local fldr_lab  "`fldr_pipeline'\Data\ID_Resp_2024_Master Lab Testing"
      
      . local lab_dta   "`fldr_lab'\MasterLabTesting.dta"
      
      .                                                                                                                                                                                             
      >     
      . *** 0.A.2 Input Reference Data Assets (Ref)
      . *local fldr_date                "`fldr_pipeline'\Data\zS_Ref_2024_Date Calendar"
      . *local Date_ref         "`fldr_date'\DateCalendar.xlsx"
      .
      .
      . ***************************************************************************************************
      . *** PART 1:     Merge Input files
      . ****************************************************************************************************
      .
      . ***1A. Change working directory
      . cd "`fldr_SAS_VA_HAPI'"
      I:\EandE\4. Refreshes\Human API (HAPI) Files\ID_SAS_VA
      
      .
      . ***1B. Load HAPI Labs files (Inputs most recent month [interim], and past month)
      .
      . dir *.xlsx
        12.3k   7/09/24 10:02  COVID-19 Testing Metrics (1).xlsx
         6.4k   7/09/24 10:02  COVID-19 Testing Metrics.xlsx
      
      .
      . local filelist: dir "." files "COVID-19 Testing Metrics*.xlsx"
      
      . di `"`filelist'"'
      "covid-19 testing metrics (1).xlsx" "covid-19 testing metrics.xlsx"
      
      .
      . foreach file of local filelist {
        2.         import excel "`file'", firstrow case(lower) clear
        3.         local outfile: subinstr local file ".xlsx" "", all
        4.         sort date
        5.     save "`outfile'", replace
        6. }
      - foreach file of local filelist {
      - import excel "`file'", firstrow case(lower) clear
      = import excel "covid-19 testing metrics (1).xlsx", firstrow case(lower) clear
        --------------------------------------------------------------------------------------------------------------------------------------------------------------------------- begin import ---
        - version 12
        - local version : di "version " string(_caller()) ":"
        - gettoken subcmd 0 : 0, parse(" ,")
        - if `"`subcmd'"' == "dbase" {
        = if `"excel"' == "dbase" {
          `version' ImpDbase `macval(0)'
          }
        - else if `"`subcmd'"' == "delim" {
        = else if `"excel"' == "delim" {
          `version' ImpDelim `macval(0)'
          }
        - else if `"`subcmd'"' == "delimi" {
        = else if `"excel"' == "delimi" {
          `version' ImpDelim `macval(0)'
          }
        - else if `"`subcmd'"' == "delimit" {
        = else if `"excel"' == "delimit" {
          `version' ImpDelim `macval(0)'
          }
        - else if `"`subcmd'"' == "delimite" {
        = else if `"excel"' == "delimite" {
          `version' ImpDelim `macval(0)'
          }
        - else if `"`subcmd'"' == "delimited" {
        = else if `"excel"' == "delimited" {
          `version' ImpDelim `macval(0)'
          }
        - else if `"`subcmd'"' == "exc" {
        = else if `"excel"' == "exc" {
          ImpExcel `macval(0)'
          }
        - else if `"`subcmd'"' == "exce" {
        = else if `"excel"' == "exce" {
          ImpExcel `macval(0)'
          }
        - else if `"`subcmd'"' == "excel" {
        = else if `"excel"' == "excel" {
        - ImpExcel `macval(0)'
        = ImpExcel  "covid-19 testing metrics (1).xlsx", firstrow case(lower) clear
      (31 vars, 30 obs)
        - }
        - else if `"`subcmd'"' == "fred" {
        = else if `"excel"' == "fred" {
          ImpFred `macval(0)'
          }
        - else if `"`subcmd'"' == "hav" {
        = else if `"excel"' == "hav" {
          ImpHaver `macval(0)'
          }
        - else if `"`subcmd'"' == "have" {
        = else if `"excel"' == "have" {
          ImpHaver `macval(0)'
          }
        - else if `"`subcmd'"' == "haver" {
        = else if `"excel"' == "haver" {
          ImpHaver `macval(0)'
          }
        - else if `"`subcmd'"' == "sas" {
        = else if `"excel"' == "sas" {
          ImpSas `macval(0)'
          }
        - else if `"`subcmd'"' == "sasxport5" {
        = else if `"excel"' == "sasxport5" {
          ImpSasxport5 `macval(0)'
          }
        - else if `"`subcmd'"' == "sasxport8" {
        = else if `"excel"' == "sasxport8" {
          ImpSasxport8 `macval(0)'
          }
        - else if `"`subcmd'"' == "sasxport" {
        = else if `"excel"' == "sasxport" {
          if (_caller() < 16) {
          ImpSasxport5 `macval(0)'
          }
          else {
          di as error "invalid syntax"
          di as error "   specify either {cmd:import sasxport5} or {cmd:import sasxport8}"
          exit 198
          }
          }
        - else if `"`subcmd'"' == "shp" {
        = else if `"excel"' == "shp" {
          ImpShape `macval(0)'
          }
        - else if `"`subcmd'"' == "spss" {
        = else if `"excel"' == "spss" {
          ImpSpss `macval(0)'
          }
        - else {
          display as error `"import: unknown subcommand "`subcmd'""'
          exit 198
          }
        ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------- end import ---
      - local outfile: subinstr local file ".xlsx" "", all
      - sort date
      - save "`outfile'", replace
      = save "covid-19 testing metrics (1)", replace
      file covid-19 testing metrics (1).dta saved
      - }
      - import excel "`file'", firstrow case(lower) clear
      = import excel "covid-19 testing metrics.xlsx", firstrow case(lower) clear
        --------------------------------------------------------------------------------------------------------------------------------------------------------------------------- begin import ---
        - version 12
        - local version : di "version " string(_caller()) ":"
        - gettoken subcmd 0 : 0, parse(" ,")
        - if `"`subcmd'"' == "dbase" {
        = if `"excel"' == "dbase" {
          `version' ImpDbase `macval(0)'
          }
        - else if `"`subcmd'"' == "delim" {
        = else if `"excel"' == "delim" {
          `version' ImpDelim `macval(0)'
          }
        - else if `"`subcmd'"' == "delimi" {
        = else if `"excel"' == "delimi" {
          `version' ImpDelim `macval(0)'
          }
        - else if `"`subcmd'"' == "delimit" {
        = else if `"excel"' == "delimit" {
          `version' ImpDelim `macval(0)'
          }
        - else if `"`subcmd'"' == "delimite" {
        = else if `"excel"' == "delimite" {
          `version' ImpDelim `macval(0)'
          }
        - else if `"`subcmd'"' == "delimited" {
        = else if `"excel"' == "delimited" {
          `version' ImpDelim `macval(0)'
          }
        - else if `"`subcmd'"' == "exc" {
        = else if `"excel"' == "exc" {
          ImpExcel `macval(0)'
          }
        - else if `"`subcmd'"' == "exce" {
        = else if `"excel"' == "exce" {
          ImpExcel `macval(0)'
          }
        - else if `"`subcmd'"' == "excel" {
        = else if `"excel"' == "excel" {
        - ImpExcel `macval(0)'
        = ImpExcel  "covid-19 testing metrics.xlsx", firstrow case(lower) clear
      (31 vars, 7 obs)
        - }
        - else if `"`subcmd'"' == "fred" {
        = else if `"excel"' == "fred" {
          ImpFred `macval(0)'
          }
        - else if `"`subcmd'"' == "hav" {
        = else if `"excel"' == "hav" {
          ImpHaver `macval(0)'
          }
        - else if `"`subcmd'"' == "have" {
        = else if `"excel"' == "have" {
          ImpHaver `macval(0)'
          }
        - else if `"`subcmd'"' == "haver" {
        = else if `"excel"' == "haver" {
          ImpHaver `macval(0)'
          }
        - else if `"`subcmd'"' == "sas" {
        = else if `"excel"' == "sas" {
          ImpSas `macval(0)'
          }
        - else if `"`subcmd'"' == "sasxport5" {
        = else if `"excel"' == "sasxport5" {
          ImpSasxport5 `macval(0)'
          }
        - else if `"`subcmd'"' == "sasxport8" {
        = else if `"excel"' == "sasxport8" {
          ImpSasxport8 `macval(0)'
          }
        - else if `"`subcmd'"' == "sasxport" {
        = else if `"excel"' == "sasxport" {
          if (_caller() < 16) {
          ImpSasxport5 `macval(0)'
          }
          else {
          di as error "invalid syntax"
          di as error "   specify either {cmd:import sasxport5} or {cmd:import sasxport8}"
          exit 198
          }
          }
        - else if `"`subcmd'"' == "shp" {
        = else if `"excel"' == "shp" {
          ImpShape `macval(0)'
          }
        - else if `"`subcmd'"' == "spss" {
        = else if `"excel"' == "spss" {
          ImpSpss `macval(0)'
          }
        - else {
          display as error `"import: unknown subcommand "`subcmd'""'
          exit 198
          }
        ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------- end import ---
      - local outfile: subinstr local file ".xlsx" "", all
      - sort date
      - save "`outfile'", replace
      = save "covid-19 testing metrics", replace
      file covid-19 testing metrics.dta saved
      - }
      
      . local filelist_dta: dir "." files "covid-19 testing metrics*.dta"
      
      . di `"`filelist_dta'"'
      "covid-19 testing metrics (1).dta" "covid-19 testing metrics.dta"
      
      .
      . *** 2b. Merging monthly dta files using Master Lab Testing Table
      . ///             RATIONALE:      Using Update replace option for the merge, so that newer using file over-writes
      > ///                                     previous results available in master lab testing file for that same day.
      >
      . use "I:\EandE\5. Pipeline\Data\ID_Resp_2024_Master Lab Testing\MasterLabTesting.dta", clear
      
      . sort date
      
      .
      . foreach file of local filelist_dta {
        2.                 merge 1:1 date using "`filelist_dta'", update replace
        3. }
      - foreach file of local filelist_dta {
      - merge 1:1 date using "`filelist_dta'", update replace
      = merge 1:1 date using ""covid-19 testing metrics (1).dta" "covid-19 testing metrics.dta"", update replace
        =void resultlist(string scalar macname)
        =        string rowvector    src
        =        string colvector    res
        =        real scalar    i
        =string scalar result(string scalar result)
        =        real scalar    l
        =void resulttext(string scalar macroname, real scalar i)
        =        string scalar    toret
        =void map_mtype(string scalar macname)
        =        string scalar    input, toret
        =void fullfilename(string scalar forstatamac, string scalar fullnamemac, string scalar macname)
        =        string scalar    fullname, path, filename
        =        real scalar    l
        ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------- begin merge ---
        - version 11
        - gettoken mtype 0 : 0, parse(" ,")
        - if (!strpos("`mtype'", ":") & "`mtype'"!="") {
        = if (!strpos("1:1", ":") & "1:1"!="") {
          if (_caller()>=11) {
          di as smcl as txt "{p}"
          di as smcl "(note: you are using old"
          di as smcl "{bf:merge} syntax; see"
          di as smcl "{bf:{help merge:[D] merge}} for new syntax)"
          di as smcl "{p_end}"
          }
          merge_10 `mtype' `0'
          exit
          }
        - local origmtype `"`mtype'"'
        = local origmtype `"1:1"'
        - mata: map_mtype(`"mtype"')
        - gettoken token : 0, parse(" ,")
        - if ("`token'"=="_n") {
        = if ("date"=="_n") {
          if ("`mtype'"!="1:1") {
          error_seq_not11 "`mtype'" "`origmtype'"
          }
          gettoken token 0 : 0, parse(" ,")
          local mtype "_n"
          }
        - syntax [varlist(default=none)] using/ [, ASSERT(string) DEBUG GENerate(name) FORCE KEEP(string) KEEPUSing(string) noLabels NOGENerate noNOTEs REPLACE noREPort SORTED UPDATE ]
      invalid file specification
        ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ end merge ---
        }
      r(198);
      
      end of do-file

      Comment


      • #4
        Here is the updated code, without the trace, for additional clarity.

        Code:
        ****************************************************************************************************
        *** PART 0: Precode Set-up
        ****************************************************************************************************
        clear all     // Starting each file with a fresh slate, to reduce dependencies
        set tracedepth 1
        set trace on
        
        *** 0.A  Initializing local macros for filepaths and files
        ********************************************
        
        *** 0.A.1 Folder Navigation
        local fldr_pipeline        "I:\EandE\5. Pipeline"
        local fldr_refresh        "I:\EandE\4. Refreshes"
        local fldr_SAS_VA_HAPI     "`fldr_refresh'\Human API (HAPI) Files\ID_SAS_VA"
        local fldr_archive        "`fldr_SAS_VA_HAPI'\Archive"
        
        *** 0.A.2 Output Lab Testing (Labs)
        
        local fldr_lab    "`fldr_pipeline'\Data\ID_Resp_2024_Master Lab Testing"
        local lab_dta    "`fldr_lab'\MasterLabTesting.dta"
                                                                                                        
        *** 0.A.2 Input Reference Data Assets (Ref)
        *local fldr_date        "`fldr_pipeline'\Data\zS_Ref_2024_Date Calendar"
        *local Date_ref        "`fldr_date'\DateCalendar.xlsx"
        
        
        ***************************************************************************************************
        *** PART 1:    Merge Input files
        ****************************************************************************************************
        
        ***1A. Change working directory
        cd "`fldr_SAS_VA_HAPI'"
        
        ***1B. Load HAPI Labs files (Inputs most recent month [interim], and past month)
        
        dir *.xlsx
        
        local filelist: dir "." files "COVID-19 Testing Metrics*.xlsx"
        di `"`filelist'"'
        
        foreach file of local filelist {
            import excel "`file'", firstrow case(lower) clear
            local outfile: subinstr local file ".xlsx" "", all
            sort date
            save "`outfile'", replace
        }
        local filelist_dta: dir "." files "covid-19 testing metrics*.dta"
        di `"`filelist_dta'"'
        
        *** 2b. Merging monthly dta files using Master Lab Testing Table
        ///        RATIONALE:     Using Update replace option for the merge, so that newer using file over-writes
        ///                    previous results available in master lab testing file for that same day.
        
        use "I:\EandE\5. Pipeline\Data\ID_Resp_2024_Master Lab Testing\MasterLabTesting.dta", clear
        sort date
        
        foreach file of local filelist_dta {
                merge 1:1 date using "`filelist_dta'", update replace
        }
        
        save "I:\EandE\5. Pipeline\Data\ID_Resp_2024_Master Lab Testing\MasterLabTesting.dta", replace
        clear

        Comment


        • #5
          Thank you. Now it's very clear what's going wrong:
          Code:
          foreach file of local filelist_dta {
                  merge 1:1 date using "`filelist_dta'", update replace
          }
          should be
          Code:
          foreach file of local filelist_dta {
                  merge 1:1 date using "`file'", update replace
          }

          Comment

          Working...
          X