
  • Using regex to extract new varnames from filenames?

    I am trying to figure out the best way to create new variable names from filenames. The files share a naming pattern, but the segments within it vary in length, which is where I am hitting a snag.
    As an example, the filenames are
    1. prev_qrtr_allnatlb_census2001.dta
    2. prev_qrtr_metro_census2001.dta
    3. prev_qrtr_all_census2001.dta
    and so on.
    Each file has four main variables: credit, deposit, office, and cflag2001
    Since I want to ultimately merge all the files, I am trying to create a loop that would allow me to change the varnames to
    1. credit_allnatlb, deposit_allnatlb, office_allnatlb
    2. credit_metro, deposit_metro, office_metro
    3. credit_all, deposit_all, office_all

    I have displayed the do-file below. I am trying to use regex commands but obviously, there is something wrong with the syntax. Any advice would be much appreciated.

    local filelist : dir "C:/Research/RBI data/Statement 4A/" files "*.dta"
    foreach f of local filelist {
    use "`f'"
    destring credit* deposit*, replace ignore(`","')
    reshape long office deposit credit, i( statedist) j(_quarter) string
    gen quarter = quarterly(_quarter, "YQ")
    format quarter %tq
    drop _quarter
    *filename pattern:
    local j= regexs(3) if regexm("`f'","([a-zA-Z]+)[_]([a-zA-Z]+)[_]([a-zA-Z]+)[_]([a-zA-Z]+)([0-9]+)[.]([a-zA-Z]+)")
    rename credit credit_"`j'"
    rename deposit deposit_"`j'"
    rename office office_"`j'"
    rename cflag2001 cflag2001_"`j'"
    save "C:/Research/data/inter/`f'", replace
    }


    Thanks.

  • #2
    Why "obviously"? What is the snag you are hitting?

    At a glance, the most evident problem is the use of " " in the rename commands, but you've not said where the code is failing and how, and we have nothing to test this on.
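
    To illustrate the rename point: a macro reference goes straight into the new name, with no double quotes around it. A minimal sketch, assuming Stata's bundled auto dataset and a made-up suffix (metro) stand in for your files and the extracted filename segment:

    ```stata
    * auto.dta and the suffix "metro" are placeholders for illustration only
    sysuse auto, clear
    local j "metro"
    * no quotes around the macro: the new name becomes price_metro
    rename price price_`j'
    describe price_`j'
    ```

    Variable names cannot contain double quotes, which is why the quoted form in the original rename lines is rejected.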

    I haven't tried to decipher your call to the regexm() function (not a command).

    I'd turn this around. You know in advance that there are three kinds of file, so you can use that information. You don't have to extract what you already know.

    This is a sketch of a different approach. I can't tell e.g. whether you were working in the same directory as the data files, so I've made that explicit.


    Code:
    local where "C:/Research/RBI data/Statement 4A/"
    cd "`where'"
    foreach x in allnatlb metro all {
        * the underscore after `x' keeps "all" from also matching the "allnatlb" files
        local filelist : dir . files "prev_qrtr_`x'_*.dta"
        foreach f of local filelist {
            use "`f'"
            destring credit* deposit*, replace ignore(",")
            reshape long office deposit credit, i(statedist) j(_quarter) string
            gen quarter = quarterly(_quarter, "YQ")
            format quarter %tq
            drop _quarter
            rename credit credit_`x'
            rename deposit deposit_`x'
            rename office office_`x'
            rename cflag2001 cflag2001_`x'
            save "C:/Research/data/inter/`f'", replace
        }
    }
    Last edited by Nick Cox; 18 Oct 2017, 11:31.



    • #3
      Thanks. That worked beautifully. I was hitting a snag at the line

      local j=....
      and Stata was returning the error
      if not allowed
      r(101);

      Thanks for pointing out the renaming error too.



      • #4
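        Yes; stupid of me not to have seen that. The local command does not take an if qualifier, so the test has to be an if command placed in front of it.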

        Code:
        local j= regexs(3) if regexm("`f'","([a-zA-Z]+)[_]([a-zA-Z]+)[_]([a-zA-Z]+)[_]([a-zA-Z]+)([0-9]+)[.]([a-zA-Z]+)")
        should have been at a minimum


        Code:
        if regexm("`f'","([a-zA-Z]+)[_]([a-zA-Z]+)[_]([a-zA-Z]+)[_]([a-zA-Z]+)([0-9]+)[.]([a-zA-Z]+)")  local j= regexs(3)
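
        A quick way to check this form outside the loop is to run it on a single filename typed in by hand. This is only a sketch: the filename is copied from post #1, and the shortened pattern (anchored at the start and capturing just the third segment) is an assumption for illustration, not the pattern from the original do-file.

        ```stata
        * filename taken from post #1; the pattern below is a simplified assumption
        local f "prev_qrtr_allnatlb_census2001.dta"
        * clear j first so a failed match cannot reuse a value from an earlier file
        local j ""
        * if is a command here, so it comes before local
        if regexm("`f'", "^[a-zA-Z]+_[a-zA-Z]+_([a-zA-Z]+)_") local j = regexs(1)
        display "`j'"
        ```

        When the pattern matches, j holds the third segment of the filename (allnatlb for this one) and can be used directly in the rename commands, without quotes.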
