Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Destring and generate hundreds of variables with an underscore in the name

    Hi

    I have a question on how to destring and generate approx.300 variables that have a similar name. I would prefer to run as little code as possible to reduce duplication so any help is appreciated.

    I need to destring a variable and save it as new variable with three additional letters in original variable name. The three letters are included after the first two letters but before the variable number. The number is either a single number or in some cases a number with an _letter/number.

    In the following example I have a list of variables (aa*) that I need to destring and save as a new variable (aaide*). The foreach command runs all variables that do not have an underscore (accounting for 3 and 13 which do not exist). A program is the only way I can figure out how to run this process on the variables with an underscore after the number. This seems really inefficient and not ideal.


    foreach i of num 1/18{
    if inlist(`i',3,13) continue
    clonevar aaide`i'=aa`i'
    destring aaide`i', replace
    }


    capture program drop num
    program define num
    destring `1', generate (aaide`2')
    end

    num aa5_t 5_t
    num aa11_1 11_1
    num aa11_2 11_2
    num aa11_3 11_3
    num aa12_1 12_1
    .
    .
    .

    num aa18_11 18_11
    num aa18_t 18_t

    .
    .
    .
    etc



    I am currently using version 15.1 (upgrading soon to version 16).

    Any assistance is greatly appreciated!

    Thanks in advance for your time
    Last edited by Jen Walker; 04 Jul 2019, 00:46.

  • #2
    Here's one way:

    Code:
    * some example data
    clear
    set obs 5
    foreach v in aa5_t aa11_1 aa11_2 aa11_3 aa12_1 aa18_11 aa18_t {
    gen `v'= int(runiform()*100)
    tostring `v', replace
    }
    * add some other variables that do not start with aa
    gen bb= int(runiform()*100)
    
    * what you want
    * put stringvars in a tempfile
    gen n= _n
    tempfile stringvars
    save `stringvars'
    * rename and destring
    destring, replace
    ren aa* aaide*
    * put stringavrs back in
    merge 1:1 n using `stringvars'
    drop _merge
    Or actually, now I think about it, yo were already really close with your own loop. That would be shorter.
    Code:
    foreach i of varlist aa*{
    clonevar aaide`i'=`i'
    destring aaide`i', replace
    ren aaideaa* aaide*
    }
    Last edited by Jorrit Gosens; 04 Jul 2019, 01:25.

    Comment


    • #3
      #2 Jorrit Gosens I guess that you want the last rename command to be outside your loop.

      Comment


      • #4
        Yes. It works, but it is not necessary to run with every iteration of the loop.

        Comment


        • #5
          Thanks heaps for your response! It works perfect.

          Comment


          • #6
            I’m not sure if this requires a new post but, for ease, my above question was a reduced version of the problem. I actually have 14 different study domains each domain with a different 3 letter identifier (ide, lan, med etc.). I used Jorrit’s code where there were no missing variables in the varlist, but in some cases I had a varlist with a number of missing variables. To get around this I used the below code to exclude the variables from the varlist.

            Code:
            unab destri:[list]
            unab no_destri: [vars]
            
            foreach i of varlist `:list destri-no_destri{
                            clone x`i’=`i’
                            destring x`i’, replace
            }
            
            rename ……
            So at this point I have 4 foreach commands suggested by Jorrit and 10 foreach commands similar to above. I would like to embedded these into a program to reduce the duplication of code (code below).

            Code:
            capture program drop destr
            program define destr
            
            unab destri: `1'
            unab no_destri: `2'
            
            foreach i of varlist `:list destri-no_destri'{
                            clonevar `3'`i'=`i'
                            destring `3'`i', replace
            }
            rename `4'* `5'*
            
            
            end
            
            destr aa1-aa18 "aa2_t-aa4_t aa7_t aa13_t" ide ideaa aaide
            destr aa19-aa24 aa21_t1-aa21_t7 lan lanaa aalan
            .
            .etc
            However, 4 of the foreach commands have a missing position `2’? Can I include these in the program by indicating to Stata that `2’ is missing so the varlist = `1’ (`1’-missing=`1’)? I have tried a few different options such as leaving the position blank, using “ “ and each time I received an error message ‘too few variables specified’.

            Thanks in advance

            Comment


            • #7
              This is a little abstract. An example of your data, with actual variable names would be useful. Include 2 or 3 'domain names' so we can understand the logic here.

              the way I understand it now, you could run e.g.,
              Code:
              * some example data
              clear
              set obs 5
              foreach v in aa5_t aa11_1 aa11_2 aa11_3 aa12_1 aa18_11 aa18_t bb5_t bb11_1 bb11_2 cc11_3 cc12_1{
              gen `v'= int(runiform()*100)
              tostring `v', replace
              }
              Code:
              foreach item in aa bb cc{
                  foreach i of varlist `item'*{
                  clonevar `item'NEW`i'=`i'
                  destring `item'NEW`i', replace
                  }
              }
              ren aaNEW* aaide*
              ren bbNEW* bblan*
              ren ccNEW* ccmed*
              The creation of a varlist with `item'* would ensure no 'missing' variables are included.
              Alternatively, your own code could be helped by including capture before some of the lines. I am no so sure where though, without a data example to run your code on.
              Edit: and when you do include a data example, please use dataex to create one. Exclude observations or variables to make a useful, self-contained example. Create some random data in case of privacy concerns.
              Last edited by Jorrit Gosens; 05 Jul 2019, 02:24.

              Comment


              • #8
                Hi Jorrit

                Thanks for your response. I appreciate there are many different ways to code but as I provided zero background information I appreciate I have confused the issue. Dismissing my code my question really is: is there a way I can tell Stata to treat a missing positional parameter in a program as a null. In SAS positional parameters are separated by a comma so you can leave a space between commas and the position is treated as a null ("If at invocation you do not supply a value for a positional parameter, the macro facility assigns a null value to that parameter.”) and I was wondering if Stata has something similar. Moving to Stata from SAS this issue is not a one off for me.

                As a work around in Stata I create a placeholder variable and drop it at the end of the program. So I created a blank placeholder string variable (PH) and I insert this in the position where I don't have a parameter but surely this is not the most efficient way of running things?


                Thanks again for your time and willingness to assist. It might be best if I create a positional parameter post with a more broad question to find out how I can tell Stata that if there is nothing in that position so like SAS it treats it like a null.

                Comment


                • #9
                  Still quite abstract, but I think I have an idea what you'd want.
                  I don't think there is such a thing in Stata, however. Not 100% sure.

                  The problem for Stata here is only that the line
                  Code:
                  foreach i of varlist `:list destri-no_destri'{
                  would be evaluated as
                  Code:
                  foreach i of varlist something-{
                  and Stata will complain about the line ending in a -.
                  A solution would then be to include a line of code that only includes the - in the line foreach if and when there are further variables in your `no_destri'
                  I hope that is enough of a hint, as I can't really help make this less abstract without an example.

                  I would still think, however, that all this could be prevented most usually with either:
                  - a wildcard, *, to tell Stata to just take all vars that assume a certain pattern.
                  - a capture before the line of code, so that Stata continues a piece of code regardless if it encounters an error (for example missing variable).
                  - a line of code that first figures out what variables exist and therefore what the last of your list of variables should be

                  Comment


                  • #10
                    Do you have 14 separate domain files with identical naming of variables?

                    The following example assume each domain (ide lan med) variables are in separate files. Each domain file is loaded, and string variables matching a regex pattern is converted to new numeric variables with new names including "domain-name".

                    need to destring a variable and save it as new variable with three additional letters in original variable name. The three letters are included after the first two letters but before the variable number. The number is either a single number or in some cases a number with an _letter/number.
                    In the example below the following regex is used '"^\p{L}{2}[0-9_]" this can be made more precise if necessary. The code below make temporary files for domains and all code should be run simultaneously:
                    Code:
                    ********************************************************************************
                    * make example data
                    ********************************************************************************
                    version 14
                    
                    local domains ide lan med
                    
                    quietly foreach domain of local domains {
                    
                        clear
                        set obs 5
                        tempfile `domain'
                        
                        foreach v in aa99 aa5_t ab11_1 ac11_2 AB11_3 CD12_1 not_of_interest {
                    
                            gen `v'= int(runiform()*100)
                            tostring `v', replace
                        }
                        
                        save "`domain'" , replace
                    } 
                    
                    ********************************************************************************
                    * read domain files - generate numeric vars with new name
                    ********************************************************************************
                    
                    quietly foreach domain of local domains {
                    
                        use "`domain'" , clear
                    
                        foreach v of varlist * {
                    
                            capture confirm string var `v'
                            
                            if ( _rc == 0 ) {
                            
                                local vnamepattern "^\p{L}{2}[0-9_]"
                            
                                if ( ustrregexm("`v'", "`vnamepattern'") == 1 ) {    
                                
                                    local newname = substr("`v'", 1, 2) + "`domain'" + substr("`v'", 3, .) 
                                    generate double `newname' = real(`v')
                                } 
                            }
                        }
                        
                        compress
                        noisily summarize *`domain'*  /* keep *`domain'* save etc.*/
                    }
                    
                    exit
                    Code:
                    . quietly foreach domain of local domains {
                    
                        Variable |        Obs        Mean    Std. Dev.       Min        Max
                    -------------+---------------------------------------------------------
                         aaide99 |          5        55.4    28.97067         11         91
                        aaide5_t |          5        46.2    28.65659          8         76
                       abide11_1 |          5        48.8     27.4809         15         79
                       acide11_2 |          5          60    26.99074         27         98
                       ABide11_3 |          5          58    28.26659         33         97
                    -------------+---------------------------------------------------------
                       CDide12_1 |          5          74    20.98809         40         91
                    
                        Variable |        Obs        Mean    Std. Dev.       Min        Max
                    -------------+---------------------------------------------------------
                         aalan99 |          5        53.2    26.40455         15         88
                        aalan5_t |          5        30.4    18.20165          3         53
                       ablan11_1 |          5          48    18.11077         29         75
                       aclan11_2 |          5          62    36.11786          8         93
                       ABlan11_3 |          5        35.2    22.86263         20         75
                    -------------+---------------------------------------------------------
                       CDlan12_1 |          5        52.8    22.12917         19         80
                    
                        Variable |        Obs        Mean    Std. Dev.       Min        Max
                    -------------+---------------------------------------------------------
                         aamed99 |          5        66.8    28.75239         20         96
                        aamed5_t |          5        35.2     24.4479          4         69
                       abmed11_1 |          5          93    5.787918         84         99
                       acmed11_2 |          5        43.6    32.79177         12         82
                       ABmed11_3 |          5        48.6      28.369          8         82
                    -------------+---------------------------------------------------------
                       CDmed12_1 |          5        54.6    29.09983          6         78

                    Comment


                    • #11
                      Thanks Bjarte for the suggestion.

                      Comment

                      Working...
                      X