
  • -strpos- failing to respond as expected

    I've got a bundle of about 1200 Excel files that should have uniform column headers (and thus variable names upon import), but don't. That's too many for me to manually review and fix, so I'd like to try to semi-automate this process.

    My pseudocode is something like this:

    1) Read in first Excel file, having verified that the variables are as I want them to be.
    2) Save as my destination .dta file
    3) Create a local macro ("approved_vars") containing the varlist (*)
    4) Read next Excel file into memory
    5) Tokenize the varlist (of Excel file #2) and loop through the tokens, checking each against the approved varlist
    6) Where a variable from Excel file 2...n is missing from the approved varlist, pause and report back both the erroneous variable and the source file
    7) I can then investigate the erroneous variable and source file and make modifications either to my subsequent cleaning code or to the file itself
    8) Resume


    The problem is, I'm trying to use -strpos- to verify the presence or absence of the token within the approved_vars local macro, and it's not behaving as expected. My code is below. (I have elided the reporting of the filename because right now I can't figure out why the strpos is failing.)

    Code:
    local approved_vars "num_att_reg name_full num_phone_topay name_topay name_num_lookup name_num_match_conf sex zone_name county_name school_name attendee_title presence_day1am presence_day1pm presence_day2am presence_day2pm reimbursement_per_sess reimbursement_travel reimbursement_tot paylater source"
    local current_vars "num_att_reg name_full num_phone motorcar houseboat" // These are nonsense variables to test this loop
    local token_ct = 5
    tokenize "`current_vars'"
    foreach x of num 1/`token_ct'{
    di as error "``x''"
    if strpos("``x''","`approved_vars'")==0 {
        di as input "Searching for [``x''] in [`approved_vars'] was unsuccessful."
        pause "The variable being tested is [``x'']: How will you rename them?"
        }
        else if strpos("``x''","`approved_vars'")!=0 di as result "[``x''] was found in [`approved_vars']."
    }
    It just occurred to me that the local macro approved_vars might be too long (it's 279 characters), but I just ran it again with a truncated version of only 200 characters, and it still failed in the same way. Screenshot of the output is below:

    [Image: Screenshot of strpos problem.png]


  • #2
    There could be other problems but I guess that you have your arguments the wrong way round. Compare

    Code:
    . di strpos("frog", "frog toad newt")
    0
    
    . di strpos("frog toad newt", "frog")
    1
    strpos() returns the position at which the second argument first occurs within the first, or 0 if it does not occur there.
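
To make the argument order concrete, here is a minimal sketch of the original loop with the arguments swapped. The shortened approved list and the space padding are additions for illustration, not part of the original code: padding both strings with spaces keeps a token such as num_phone from matching inside num_phone_topay, which a plain substring search would report as found.

```stata
* Shortened lists, for illustration only
local approved_vars "num_att_reg name_full num_phone_topay name_topay"
local current_vars "num_att_reg name_full num_phone motorcar houseboat"

foreach v of local current_vars {
    * strpos(haystack, needle): the string being searched comes first
    if strpos(" `approved_vars' ", " `v' ") == 0 {
        display as error "[`v'] was not found in the approved list."
    }
    else {
        display as result "[`v'] was found in the approved list."
    }
}
```

With these lists, num_att_reg and name_full are reported as found, and num_phone, motorcar and houseboat are not.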



    • #3
      Note that the loop is not really needed here. Use macro lists instead.

      Code:
      local ok a b c d
      local test a f c d
      
      local notok : list test - ok
      display "`notok'"
      Best
      Daniel
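
The same idea can be stretched over the whole batch of files. This is only a rough sketch under stated assumptions: the Excel files are assumed to sit in the working directory with an .xlsx extension and their headers in the first row, and the approved list is shortened for illustration.

```stata
* Assumption: the Excel files are in the working directory and
* carry their column headers in the first row
* Shortened approved list, for illustration only
local approved_vars "num_att_reg name_full num_phone_topay name_topay"

local files : dir "." files "*.xlsx"
foreach f of local files {
    import excel using "`f'", firstrow clear

    * Collect the variable names of the file just imported
    quietly ds
    local current_vars `r(varlist)'

    * Names in this file that are not in the approved list
    local notok : list current_vars - approved_vars
    if "`notok'" != "" {
        display as error "`f': unexpected variable(s): `notok'"
    }
}
```

Each file with unexpected names is reported together with those names, so they can be investigated before appending.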



      • #4
        Nick: Thank you so much. I'm a total yutz. I've probably used -strpos- a million times, and just couldn't get enough distance from this particular set of code to see that rookie mistake. Yikes.



        • #5
          I don't document most of my silly errors. Only yesterday I found } where ) should have been in a program fortunately not yet made public.



          • #6
            daniel klein - thanks, btw, for introducing me to macro lists. I hadn't even realized they existed.

            A question, though - when you're trying to run a list function, do the same character limits apply to the dyad of lists as to each individual one? So, for instance, I'm trying to do
            Code:
            local comparison1: list `current_vars' - `approved_vars1'
            .

            I already split the `approved_vars' macro from my original formulation into two smaller ones (*1 and *2) so I wouldn't butt up against the 270(ish?) character limit for an individual local macro...but when I run
            Code:
            local comparison1 : list `current_vars' - `approved_vars1'
            I'm now getting a syntax error (and the below output, per -trace-):
            Code:
            = local comparison1 : list num_att_reg name_full num_phone_topay name_topay name_num_lookup name_num_match_conf sex zone_name c
            > ounty_name school_name attendee_title presence_day1am presence_day1pm presence_day2am presence_day2pm reimbursement_per_sess
            > reimbursement_travel reimbursement_tot paylater source - num_att_reg name_full num_phone_topay name_topay name_num_lookup nam
            > e_num_match_conf sex zone_name county_name school_name
            invalid syntax



            • #7
              You don't need the single quotes; the list extended function expects macro names, not their contents. So:

              local comparison1 : list current_vars - approved_vars1



              For example:
              Code:
              loc current_vars num_att_reg name_full num_phone_topay name_topay name_num_lookup name_num_match_conf sex zone_name county_name school_name attendee_title presence_day1am presence_day1pm presence_day2am presence_day2pm reimbursement_per_sess reimbursement_travel reimbursement_tot paylater source 
              
              loc  approved_vars1 num_att_reg name_full num_phone_topay name_topay name_num_lookup name_num_match_conf sex zone_name county_name school_name
              
              local comparison1 : list current_vars - approved_vars1
              di `"`comparison1'"'
              Last edited by eric_a_booth; 28 Feb 2017, 12:27.
              Eric A. Booth | Senior Director of Research | Far Harbor | Austin TX



              • #8
                Eric's statement can be strengthened. You not only don't need the single quotes, local macro references are out of order there unless the macros referenced in turn contain local macro names (which would be intricate programming; I don't recall seeing it).
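
A small sketch of the distinction, using made-up macro names: the list functions are given the names of the macros, and Stata looks up their contents itself.

```stata
local ok   a b c d
local test a f c d

* Macro names, not macro contents, follow "list"
* Elements of test that are not in ok
local extra : list test - ok
* Elements that appear in both
local common : list test & ok
* 1 if every element of test is in ok, 0 otherwise
local allin : list test in ok

display "extra: `extra'"
display "common: `common'"
display "all in: `allin'"
```

Here extra holds f, common holds a c d, and allin is 0.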





                • #9
                  I guess the question has been answered already.

                  I do not fully follow the limits discussion. The limit for characters in a local macro is at least 165,200 (for Stata IC). Given that a variable name can have up to 32 characters and adding spaces your list could have as many as 165,200/33 = 5006 variable names.

                  By the way, limits are documented in

                  Code:
                  help limits
                  Best
                  Daniel
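
To see how far a given macro is from the limit, something like the following should do. The macro contents are shortened for illustration; c(macrolen) is the system value for the maximum macro length in the running Stata.

```stata
* Shortened list, for illustration only
local approved_vars "num_att_reg name_full num_phone_topay name_topay"

* Number of characters currently stored in the macro
local len : length local approved_vars
display "approved_vars holds `len' characters"

* Maximum macro length allowed by this flavor of Stata
display "maximum macro length here: " c(macrolen)
```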



                  • #10
                    daniel klein - w.r.t. limits, it looks like a classic case of my half-understanding a couple of different things and then conflating them. eric_a_booth and Nick Cox - likewise, thanks for the feedback on the local macro syntax/style/usage. Thanks, all, for setting me straight.
