
  • Identify variables with identical prefixes

    Dear Statalisters,

    say I have the following variable names:
    Code:
    child_id_kp child_id_ka child_id_ng month_gh month_ka month_ng
    My broader goal is to combine information from those variables with the same prefix. More specifically, in the case of the example above, I want to create two variables
    Code:
    gen child_id = .
    gen month = .
    and afterwards iterate over all variables with the corresponding prefix to fill up the missings.
    For this to work, I need Stata to
    1. identify prefixes that occur multiple times
    2. store all variables with the corresponding prefix in a local
    I don't have any idea about 1. Regarding 2., I want to end up with
    Code:
    local child_id child_id_kp child_id_ka child_id_ng
    local month month_gh month_ka month_ng
    I'm unable to identify these prefixes manually given that the number of variables is large and the procedure has to be done for several datasets.

    Any hint is greatly appreciated!


    Last edited by Florian Renosch; 27 Apr 2020, 02:12.

  • #2
    It may be possible to identify prefixes if they are delimited by numbers or underscores. However, if the prefix is delimited by words or letters, then there is no straightforward way to distinguish between, for example, Stata16 and station27.


    • #3
      Dear Andrew,

      thank you for your help! Do I understand correctly that you don't see any possibility to get the desired result?

      Best wishes
      Florian


      • #4
        I don't see the precise difficulty here. Does this help?

        Code:
        unab child : child* 
        
        gen child_id = . 
        
        foreach v of local child { 
              replace child_id = `v' if missing(child_id) 
        }

        The trick there is to form the list in local macro child before child_id exists.
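
        If the variables are numeric, the same fill could probably be done in one step with egen's rowfirst() function, which returns the first nonmissing value across a variable list. A minimal sketch, assuming the pattern child_id_* matches exactly the variables to combine; double is requested only as a precaution against losing precision in large ID values:

        Code:
        * Assumption: all child_id_* variables are numeric
        egen double child_id = rowfirst(child_id_*)

        The pattern child_id_* cannot match child_id itself, so the ordering issue noted above does not arise here.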


        • #5
          Nick: Thanks for your answer!
          I should have been more precise in my initial post. Because the number of variables is large and the procedure has to be done for many datasets, I need it to be automated. I.e., I need Stata to automatically identify variables with the same prefix. There are simply too many to do it manually.


          • #6
            You have to define "prefix" precisely. Perhaps you are regarding it as obvious that the prefix ends with an underscore. This is what Andrew Musau asked in #2 and I don't see a clear reply to his post.

            This works

            Code:
            local varlist child_id_kp child_id_ka child_id_ng month_gh month_ka month_ng  
            
            local prefixes  
            
            foreach v of local varlist {      
                 local this = substr("`v'", 1, strpos("`v'", "_") - 1)    
                 local prefixes `prefixes' `this'
            }  
            
            local prefixes : list uniq prefixes  
            
            di "`prefixes'"
            If that helps then you would start with

            Code:
            unab varlist : *
            Last edited by Nick Cox; 27 Apr 2020, 06:55.


            • #7
              Andrew posed the crucial question: How exactly do you define a prefix?

              Do frog and foo have the same prefix? Do fox and foo?

              Best
              Daniel


              • #8
                This is similar to Nick's solution in #6, assuming that prefixes are delimited by numbers or underscores. I also use Daniel's code from this link to list the names of all local macros, since you asked for each prefix to be stored together with its variables.

                Code:
                * Example generated by -dataex-. To install: ssc install dataex
                clear
                input float(child_id_kp sp1 sp2 child_id_ka child_id_ng month_gh month_ka month_ng sp99 k12 k13 kapp1)
                . . . . . . . . . . . .
                end
                
                foreach var of varlist * {
                   if ustrregexm("`var'", "([a-zA-Z]+)[0-9\_]") {
                      local prefixes "`prefixes' `=ustrregexs(1)'"
                   }
                }
                local uniqueprefixes: list uniq prefixes
                foreach prefix of local uniqueprefixes {
                   foreach var of varlist * {
                      if ustrregexm("`var'", "^`prefix'") {
                         local `prefix' "``prefix'' `var'"
                      }
                   }
                }
                mata : st_local("all_locals", invtokens(st_dir("local", "macro", "*")'))
                display "`all_locals'"
                Result:

                Code:
                . display "`all_locals'"
                kapp k month sp child uniqueprefixes prefixes
                
                . di "`month'"
                 month_gh month_ka month_ng
                
                . di "`child'"
                 child_id_kp child_id_ka child_id_ng
                Problems will always arise if a prefix is not unique enough when identifying the related variables, e.g.,

                Code:
                . di "`k'"
                 k12 k13 kapp1
                A workaround is to treat prefixes delimited by numbers and by underscores separately, so that when you identify the variables you can be specific, i.e., prefix + number or prefix + underscore. Finally, you should drop macros with a single element, which I did not bother doing above.
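
                One way the workaround might look, as a sketch rather than tested code: it continues from the code in this post (so uniqueprefixes is assumed to exist already), requires a digit or underscore directly after the prefix, and clears any macro that ends up with fewer than two variables.

                Code:
                foreach prefix of local uniqueprefixes {
                   * reset the macro filled by the earlier loop
                   local `prefix'
                   foreach var of varlist * {
                      * the prefix must be followed directly by a digit or an underscore
                      if ustrregexm("`var'", "^`prefix'[0-9_]") {
                         local `prefix' "``prefix'' `var'"
                      }
                   }
                   * drop macros holding a single variable (or none)
                   if `: word count ``prefix''' < 2 {
                      local `prefix'
                   }
                }

                With the example data, k should then hold only k12 and k13, and kapp should be cleared because kapp1 is its only member.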
                Last edited by Andrew Musau; 27 Apr 2020, 07:44.


                • #9
                  Thanks a lot, Nick! The answer does the trick for me if I replace strpos by strrpos.
                  Indeed, it is unclear what prefix refers to in my previous posts. For the sake of completeness, the intended prefix of child_id_kp is child_id.
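
                  For reference, the whole procedure could be sketched as follows. This is only an illustration under three assumptions: a prefix is everything before the last underscore, the variables are numeric, and no variable named like a prefix exists yet. The macro names allvars, shared and members are made up for this example.

                  Code:
                  unab allvars : *

                  * collect one prefix per variable name that contains an underscore
                  local prefixes
                  foreach v of local allvars {
                     local pos = strrpos("`v'", "_")
                     if `pos' > 1 {
                        local this = substr("`v'", 1, `pos' - 1)
                        local prefixes `prefixes' `this'
                     }
                  }

                  * keep only prefixes that occur more than once
                  local shared : list dups prefixes
                  local shared : list uniq shared

                  foreach p of local shared {
                     * gather the variables carrying this prefix
                     local members
                     foreach v of local allvars {
                        local pos = strrpos("`v'", "_")
                        if `pos' > 1 {
                           if substr("`v'", 1, `pos' - 1) == "`p'" {
                              local members `members' `v'
                           }
                        }
                     }
                     di "`p': `members'"

                     * fill the combined variable from the first nonmissing member
                     gen `p' = .
                     foreach v of local members {
                        replace `p' = `v' if missing(`p')
                     }
                  }

                  Rebuilding members inside the loop, instead of storing one local per prefix, avoids clashes between a prefix and the helper macro names used here.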


                  • #10
                    The code in #6 identifies prefixes as requested. If you are using strrpos() I guess that is for the suffix.
