  • loop varlist and strpos

    Good morning,

    I'm working with a data file that includes the variables icd_nd1 - icd_nd89. I want to write a loop that generates a new variable laestring = 1 if any variable in the varlist icd_nd1 - icd_nd89 contains "I260" or "I269", no matter which letters/numbers follow. Somehow my code does not work. Does somebody have an idea? Thank you very much for your help!


    foreach var of varlist icd_nd1-icd_nd89 {
    gen laestring`var' = 1 if strpos(`var', "I260") | strpos(`var', "I269")
    }
    foreach var of varlist laestringicd_nd* {
    recode `var' . = 0
    }
    gen laestring = 0
    replace laestring = 1 if laestringicd_nd1-laestringicd_nd89 != 0
    drop laestringicd_nd1-laestringicd_nd89
    codebook laestring
    label variable laestring "0 = no LAE 1 = LAE"


    Example of the Data file

    icd_nd1 icd_nd2 icd_nd3 icd_nd4 icd_nd5 icd_nd6

    I1000 E780 G8203 D62 K638
    C499 H360 J4412
    J3800 E782 B962
    C679 R11 Z922
    E871 R11 I700

  • #2
    Welcome to Statalist.

    Here is code that seems to do what you want.
    Code:
    clear
    input str8 (icd_nd1 icd_nd2 icd_nd3 icd_nd4 icd_nd5 icd_nd6)
    I1000 E780 G8203 D62 K638
    C499 H360 J4412 
    J3800 I260 B962
    C679 I2699 Z922
    E871 R11 I700 
    end
    generate laestring = 0
    foreach var of varlist icd_nd1-icd_nd6 {
        replace laestring = 1 if inlist(substr(`var',1,4), "I260", "I269")
    }
    list, clean noobs
    Code:
    . list, clean noobs
    
        icd_nd1   icd_nd2   icd_nd3   icd_nd4   icd_nd5   icd_nd6   laestr~g  
          I1000      E780     G8203       D62      K638                    0  
           C499      H360     J4412                                        0  
          J3800      I260      B962                                        1  
           C679     I2699      Z922                                        1  
           E871       R11      I700                                        0
    With that said, some advice to improve future posts.

    Please review the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post. Note especially sections 9-12 on how to best pose your question.

    Section 12.1 is particularly pertinent

    12.1 What to say about your commands and your problem

    Say exactly what you typed and exactly what Stata typed (or did) in response. N.B. exactly!
    ...
    Never say just that something "doesn't work" or "didn't work", but explain precisely in what sense you didn't get what you wanted.
    as is Section 12.3 on the use of code delimiters [CODE] and [/CODE] to present code and output copied and pasted from the Do-file Editor or the Results window.

    The more you help others understand your problem, the more likely others are to be able to help you solve your problem.
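
    A note on why the original code gives the wrong answer: in an expression, the hyphen in laestringicd_nd1-laestringicd_nd89 is read as subtraction, so the condition compares the difference of those two variables with 0. The range shorthand is only expanded where Stata expects a varlist. If you wanted to keep the per-variable indicators, a minimal sketch (assuming they already exist, coded 1/missing or 1/0, and that laestring has not been created yet) could combine them with egen's rowmax(), which does take a varlist:

    ```stata
    * Sketch only: assumes laestringicd_nd1-laestringicd_nd89 already exist
    * and that no variable named laestring exists yet.
    * rowmax() is 1 if any indicator is 1, and missing if all are missing.
    egen byte laestring = rowmax(laestringicd_nd1-laestringicd_nd89)
    replace laestring = 0 if missing(laestring)
    ```

    The loop shown above is still shorter, since it never creates the 89 intermediate variables.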


    • #3
      Thank you very much for your quick help and the advice on writing my posts! One question about your code.

      Code:
      clear 
      input str8 (icd_nd1 icd_nd2 icd_nd3 icd_nd4 icd_nd5 icd_nd6)
      I1000 E780 G8203 D62 K638
      C499 H360 J4412
      J3800 I260 B962
      C679 I2699 Z922
      E871 R11 I700
      end

      My data set contains about 5000 variables, not only the few I showed above. How can I automatically search all variables without listing all of them in my do-file?


      • #4
        Hi Frederik, I'm not sure if this will work for 5000 variables, but the commands

        Code:
        (whatever) icd_nd1 icd_nd2 icd_nd3 icd_nd4 icd_nd5 icd_nd6
        (whatever) icd_nd1 - icd_nd6
        give the same result. You might be able to use this trick.


        • #5
          Extra info on the topic:

          https://www.stata.com/support/faqs/d...ple-variables/


          • #6
            I'll share the advice I gave earlier today on someone else's topic: See the output of help varlist to see other ways of constructing variable lists in Stata. One of them should work for you.


            • #7
              Ok after some tries I found a solution:

              Code:
              ***LAE*****
              foreach var of varlist o_icd_nd1-o_icd_nd89 {
                  gen lae`var' = 1 if strpos(`var', "I26")
              }
              foreach var of varlist laeo_icd_nd* {
                  recode `var' . = 0
              }
              gen lae = 0
              foreach var of varlist laeo_icd_nd* {
                  replace lae = 1 if `var' != 0
              }
              drop laeo_icd_nd1-laeo_icd_nd89
              codebook lae
              label variable lae "0 = keine LAE 1 = LAE"
              This code works, but it looks very complicated.
              If any variable in the varlist o_icd_nd1-o_icd_nd89 contains I26, I tell Stata to set a new variable "lae" to 1. Do you think that my solution is the easiest way, or would there be shorter/easier code?


              • #8
                It seems to me the following code would do what you want.
                Code:
                generate lae = 0
                foreach var of varlist o_icd_nd1-o_icd_nd89 {
                    replace lae = 1 if strpos(`var',"I26")
                }
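
                If you would rather avoid the loop altogether, one possible alternative is to join all the diagnosis fields into a single string and search it once. This is only a sketch: allcodes is a hypothetical scratch variable name, and it assumes the joined codes fit in one string variable. Note that it behaves slightly differently from the loop above: strpos(`var',"I26") matches "I26" anywhere in a code, whereas the version below matches only codes that begin with "I26".

                ```stata
                * Sketch only: "allcodes" is a hypothetical scratch variable.
                * Join all diagnosis fields into one string, separated by blanks.
                egen allcodes = concat(o_icd_nd1-o_icd_nd89), punct(" ")
                * Prepending a blank means " I26" can match only at the start of a code.
                generate byte lae = strpos(" " + allcodes, " I26") > 0
                drop allcodes
                ```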


                • #9
                  Hello. I have a wide dataset with 40 vars that store phone numbers. But in some observations the letter o was entered instead of 0, so those vars are stored as str# while the others are numeric. Thus I am unable to reshape without correcting the ones which are str.

                  The example of data is with 4 str phone number vars (that I need to fix, in order to reshape to long) and 1 numeric phone number var (all correct, so could be easily reshaped long)

                  Code:
                  * Example generated by -dataex-. To install: ssc install dataex
                  clear
                  input str13 hh_phno_sms_19 str11(hh_phno_sms_21 hh_phno_sms_23 hh_phno_sms_24) double hh_phno_sms_6
                  "3464613163"    "3439683811"  ""            ""            3464609919
                  ""              ""            ""            ""                     .
                  "3415081651"    "3469312012"  ""            ""            3449905426
                  "3488450181"    "3485333967"  "3441581976"  "3488796674"  3431953835
                  "3459103337"    "3470890745"  "3429534622"  "3460523583"  3441895041
                  "3444991637"    ""            ""            ""            3444991637
                  "3459097018"    ""            ""            ""            3473839506
                  "3450937502"    "3429060382"  "3469855619"  ""            3466644565
                  "3499433221"    "3459277018"  "3439602591"  "3470898136"  3419661800
                  ""              "3441201542"  "3476026486"  "3463137810"  3420901575
                  "3447445646"    "3464618760"  ""            ""             439384373
                  "347893771"     "3453255504"  "3038049804"  ""            3468157006
                  ""              ""            ""            ""            3449890641
                  "3456720776"    "3479530544"  "3449761600"  "3418350817"  3421651361
                  "3499021498"    ""            ""            ""            3419823058
                  "3471952650"    "3469411085"  "3454263944"  ""            3405757046
                  ""              ""            ""            ""                     .
                  ""              ""            ""            ""                     .
                  "3439393083"    "3453436503"  ""            ""            3442258113
                  "33581809085"   "3455674886"  ""            ""             349123519
                  "3469459304"    "31596599000" "3459192090"  "3469464331"  3495850478
                  "03454741784,1" "3470974485"  ""            ""            3429581960
                  "343436608"     "3444141330"  "3456432511"  "3442250919"  3479093894
                  "34694659740"   "3068858840"  "3459421892"  ""            3479123433
                  "3366563733"    "3439627800"  "3018043686"  "3469459298"  3429666941
                  "O3445o63742"   "O341569o593" "O3462541812" "O3462541812" 3475130375
                  "3449685913"    ""            "3413110814"  "3480614595"  3456433842
                  ""              ""            ""            ""                     .
                  "3420440751"    "3159708577"  "3483943447"  ""            3449643150
                  "3434664530"    "3009343730"  "3449687612"  ""            3365184850
                  "3475350672"    "3406879688"  "3456128492"  "3369534990"  3469088996
                  "3439393101"    "348934894"   ""            ""            3095870730
                  "3429606035"    "3429606035"  "3439590041"  ""            3466578575
                  ""              "3463577231"  "3469408426"  "3469458620"  3408967390
                  "3456084476"    "3438986410"  "3439389514"  "3424485298"  3459277017
                  "3474435295"    ""            ""            ""            3456789515
                  "3462477669"    "3459452313"  "3419270601"  "3419510523"  3439425783
                  ""              ""            ""            ""            3449889277
                  ""              ""            ""            ""                     .
                  ""              ""            ""            ""            3414867635
                  end


                  So, I did the following.

                  1. First find the vars which are str vars

                  Code:
                  ds hh_phno_sms_* , has(type str#)
                  local strvars "`r(varlist)'"
                  list `strvars'


                  2. Now within this list of vars, I wish to find ones with o or O and then use that dummy and subinstr function to replace them with 0.

                  Code:
                  gen corrected = 0
                  foreach var of varlist `strvars' {
                  replace corrected=1 if strpos(`strvars' , "o")
                  replace corrected=1 if strpos(`strvars' , "O")
                  }
                  But I got the error

                  Code:
                  hh_phno_sms_19hh_phno_sms_21hh_phno_sms_23hh_phno_sms_24 invalid name
                  r(198);

                  Any help is useful!
                  Last edited by Priyoma Mustafi; 06 Jan 2023, 16:20.


                  • #10
                    The source of your error message is using -strpos(`strvars' , "o")-. It should be -strpos(`var' , "o")-, because you want to refer to the specific variable here, not the entire list of variables.

                    That said, you are generating a variable called corrected, but you are not actually correcting anything. Your variable corrected just points out the observations where something therein needs correction. For your overall purpose, you don't need such a variable. Just do the correction immediately:
                    Code:
                    ds hh_phno_sms_*,has(type string)
                    local strvars `r(varlist)'
                    
                    foreach v of varlist `strvars' {
                        replace `v' = subinstr(lower(`v'), "o", "0", .)
                    }
                    Note that by using -lower(`v')- instead of -`v'- as the first argument of -subinstr()-, you take care of both "o" and "O" in a single command.

                    Note also that you still will not be able to convert all of these phone numbers to numeric because in observation 22, hh_phno_sms_19 contains a comma.

                    The example of data is with 4 str phone number vars (that I need to reshape wide) and 1 numeric phone number var (which could be easily reshaped long)
                    I don't understand what you are talking about. The bold faced portions are confusing me. The data set is currently wide. You cannot reshape a single variable to long. -reshape- applies to a series of variables and converts them all into a single variable that "stacks" the values of the individual variables.

                    All of that said, I would not convert these phone numbers to numeric. It is not valid to perform any kind of arithmetic on phone numbers. And some phone numbers legitimately contain non-numeric characters such as + or #. Instead, I would convert the numeric ones to string. Then after that, I would reshape them all.

                    Added: The only benefit you would derive from converting the phone numbers to numeric variables (assuming you could somehow get around the issue of # and +) would be that you would save a little bit of memory. But it would only be a little bit, and not worth it unless your data set is so large that you are already pushing up against memory limits.
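
                    For what it's worth, the "convert the numeric ones to string, then reshape" route might look roughly like the sketch below. The names id and slot are hypothetical, and it assumes the data have one row per household with no identifier variable yet, and that the phone numbers have at most 12 digits.

                    ```stata
                    * Sketch only: "id" and "slot" are hypothetical names.
                    generate long id = _n

                    * Convert the numeric phone-number variables to string.
                    ds hh_phno_sms_*, has(type numeric)
                    local numvars `r(varlist)'
                    foreach v of local numvars {
                        tostring `v', replace format(%12.0f)
                        * tostring writes numeric missing as ".", so blank those out
                        replace `v' = "" if `v' == "."
                    }

                    * All hh_phno_sms_* variables are now string and can be stacked.
                    reshape long hh_phno_sms_, i(id) j(slot)
                    ```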
                    Last edited by Clyde Schechter; 06 Jan 2023, 16:35.


                    • #11
                      Thank you, I see the mistake with `var' too, and the trick with -lower(`v')- instead of -`v'-!

                      I don't plan on doing arithmetic operations with phone numbers, but I wish to (1) identify who made such mistakes, for comprehension checks, and (2) correct them, which is important too, since these phone numbers will populate a database I am creating in which I need valid phone numbers. + and # or country codes etc. have already been dealt with using substr().

                      Regarding reshape, if I do -tostring- before and then reshaped, it could also work. But I would still have to do (1) and (2) either way.


                      • #12
                        Regarding reshape, if I do -tostring- before and then reshaped, it could also work. But I would still have to do (1) and (2) either way.
                        Yes, indeed!
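
                        To cover both (1) and (2), one option is to record the flag before making the correction, since afterwards the letters are gone. A minimal sketch, where had_letter_o is a hypothetical variable name and the correction step is the one from #10:

                        ```stata
                        * Sketch only: "had_letter_o" is a hypothetical variable name.
                        ds hh_phno_sms_*, has(type string)
                        local strvars `r(varlist)'

                        * (1) Flag rows containing a letter o, before anything is changed.
                        generate byte had_letter_o = 0
                        foreach v of local strvars {
                            replace had_letter_o = 1 if strpos(lower(`v'), "o") > 0
                        }

                        * (2) Then apply the correction from #10.
                        foreach v of local strvars {
                            replace `v' = subinstr(lower(`v'), "o", "0", .)
                        }
                        ```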
