  • ICD 10 numlist function

    Hi there, I am going over some code given to me by my colleagues in order to learn. I am new to macros and loops.

    Could you kindly explain this code to me?

    gen mst=0
    foreach num of numlist 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 {
    replace mst=1 if dg_`num'>="C77" & dg_`num'<="C79"
    }
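To make the loop concrete, here is a minimal sketch on hypothetical toy data with only three diagnosis variables, dg_1 to dg_3. The number list is written with the numlist range shorthand 1/3 instead of spelling out every number; 1/20 would cover the original twenty variables.

```stata
* Hypothetical toy data: three string diagnosis variables instead of twenty
clear
input str5 dg_1 str5 dg_2 str5 dg_3
"A01" "C78" ""
"C80" "B20" "Z14"
"C77" ""    ""
end

generate mst = 0
* 1/3 is numlist shorthand for 1 2 3; the original code would use 1/20
foreach num of numlist 1/3 {
    * `num' is replaced by 1, then 2, then 3, so this line checks dg_1, dg_2, dg_3 in turn
    replace mst = 1 if dg_`num' >= "C77" & dg_`num' <= "C79"
}
list
```

Observations 1 and 3 are flagged (C78 and C77); observation 2 is not, because C80 sorts after C79.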



    Line 1: generates a new variable called mst, set to 0.
    Line 2: creates a local macro called num.
    Line 2: numlist refers to the numbers 1 through 20.

    1. Do the variables in numlist have to be numeric variables?
    2. What happens if they are stored as strings, e.g., Z14, Z15? Will it still work? Does numlist take into consideration a letter stored before the numbers, e.g., Z14?
    3. What if the values include the value 300? Will the macro (in bold) still work?

    Line 3: replaces mst with 1 if the macro num consists of the values C77 to C79.

    Apologies, I am just trying to practice with macros and loops. The Stata help for numlist isn't very helpful.

  • #2
    1. Do the variables in numlist have to be numeric variables?
    The things in numlist are not variables. They are numbers, and numbers are the only possibilities in this context. In this case, the numbers will be used to refer to variables with names dg_1 through dg_20, but that is irrelevant. The contents of the numlist itself must be numbers, and only numbers.
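One way to see exactly what a number list expands to is the -numlist- command, which stores the expanded list in r(numlist). A minimal sketch; the element Z14 is included only to show that something that is not a number makes numlist fail.

```stata
* numlist expands a number list and stores the result in r(numlist)
numlist "1/5 10(5)20"
local expanded "`r(numlist)'"
display "`expanded'"

* Elements must be numbers: a code such as Z14 makes numlist fail with an error
capture numlist "1 Z14"
scalar rc = _rc
display "return code: " scalar(rc)
```

The first display shows 1 2 3 4 5 10 15 20; the second shows a nonzero return code.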

    2. What happens if they are stored as strings, e.g., Z14, Z15? Will it still work? Does numlist take into consideration a letter stored before the numbers, e.g., Z14?
    In fact, the variables dg_1 through dg_20 must be string variables. This has nothing whatsoever to do with the use of -foreach num of numlist...-; it's because of the code inside the loop. The construction -...if dg_`num'>="C77" & dg_`num'<="C79"- requires that the dg_`num' variable be a string because you are comparing it to strings like "C77" and "C79". If you try this code and one of the dg_* variables is actually numeric, you will get a "type mismatch" error message.
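A minimal sketch of the type mismatch, on hypothetical data where dg_1 is a string and dg_2 is numeric: the comparison with "C77" works for dg_1 but fails for dg_2. -capture- is used here only so that the sketch keeps running and the return code can be inspected.

```stata
clear
input str5 dg_1 dg_2
"C78" 300
end

* dg_1 is a string, so comparing it with "C77" and "C79" works
generate mst = dg_1 >= "C77" & dg_1 <= "C79"

* dg_2 is numeric; comparing it with a string fails with "type mismatch"
capture replace mst = 1 if dg_2 >= "C77" & dg_2 <= "C79"
scalar rc = _rc
display "return code: " scalar(rc)
```

The return code is 109, Stata's code for a type mismatch.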

    3. What if the values include the value 300? Will the macro (in bold) still work?
    I don't know what you are asking here. What if the values of what consist of the value 300? You can have the value 300 in the numlist, and the -foreach...- part will be perfectly happy with that. It may cause a problem later, because if there is no variable called dg_300, you will get an error message saying variable dg_300 not found. If you are asking whether 300 can actually be a value of one of the dg_* variables, the answer is no. Because you are comparing the dg_* variables to strings, the contents of those variables must be strings. Now, you can have a value of one of those variables be "300" (note the quotes around 300), but not the number 300.
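If a number list might name variables that do not exist, such as dg_300, one option is to check each name with -confirm variable- and skip the missing ones. A sketch on hypothetical toy data that has only dg_1 and dg_2:

```stata
clear
input str5 dg_1 str5 dg_2
"C79" "A01"
"B20" "Z14"
end

generate mst = 0
* 300 is a valid numlist element, but there is no variable dg_300 in this data
foreach num of numlist 1 2 300 {
    * confirm variable sets a nonzero return code when the variable does not exist
    capture confirm variable dg_`num'
    if _rc {
        display "dg_`num' not found; skipped"
        continue
    }
    replace mst = 1 if dg_`num' >= "C77" & dg_`num' <= "C79"
}
list
```

Without the -capture confirm variable- check, the loop would stop at dg_300 with the "variable not found" error.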

    Added: By the way, although the code you show in #1 will work, it can be streamlined a bit. Assuming that the variables dg_1 through dg_20 are consecutive variables in the data set, and that the codes are stored as three-character categories (inlist() matches whole values only), the same thing can be accomplished with:
    Code:
    gen mst = 0
    foreach v of varlist dg_1-dg_20 {
        replace mst = 1 if inlist(`v', "C77", "C78", "C79")
    }
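This sketch, on hypothetical toy data with three diagnosis variables, checks that the range comparison from #1 and an inlist() version that lists all three codes, C77, C78 and C79, flag the same observations. Because inlist() matches whole values only, the two versions agree only when the codes are stored as three-character categories.

```stata
clear
input str5 dg_1 str5 dg_2 str5 dg_3
"C77" "A01" "B20"
"Z14" "C78" ""
"C80" "B20" "C79"
"A01" "Z14" "C76"
end

* Version from #1: string range comparison
generate mst_range = 0
foreach v of varlist dg_1-dg_3 {
    replace mst_range = 1 if `v' >= "C77" & `v' <= "C79"
}

* Streamlined version: inlist() with every target code listed explicitly
generate mst_inlist = 0
foreach v of varlist dg_1-dg_3 {
    replace mst_inlist = 1 if inlist(`v', "C77", "C78", "C79")
}
list
```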
    Last edited by Clyde Schechter; 30 Aug 2022, 09:50.



    • #3
      Very useful, thanks for explaining.

      So with dg_`num', the loop will cycle through the variables dg_1, dg_2, dg_3, and so forth up to dg_20, looking for a string value between C77 and C79; that is, it replaces mst with 1 if the value is C77, C78, or C79.

      Will the > and < operators work, as these are string variables?

      The problem with inlist is that it can only take 10 or 15 arguments...

      Last edited by Denise Vella; 30 Aug 2022, 10:30.



      • #4
        Will the > and < operators work, as these are string variables?
        Yes, with a caveat. The < and > operators express the dictionary sort order of strings, with the understanding that the order is based on the numerical values that represent the characters internally. In Stata, those numerical values may be based on ASCII or Unicode. ASCII encoding sorts upper case letters before lower case, so "A34" < "a34". If what you are working with are ICD-10 codes, you probably don't want that, so you want to make sure you convert everything to upper case (or everything to lower case) to avoid problems like that. Better still, use the -icd10 clean- command to make sure that everything is in a uniform, standardized format. See -help icd10- for more details.

        Unicode extends ASCII to cover characters other than the letters of the Roman alphabet, digits, and punctuation (plus certain "printer control characters"). Unicode is a more complicated system, but it, too, may sort in ways that do not correspond to conventional alphabetical order. If you are only working with ICD-10 codes, you don't have to think about Unicode and its issues.
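The case issue, and a related one with longer codes, can be seen on a few hypothetical values. A lower-case "c78" falls outside the "C77" to "C79" range until it is converted with strupper(). A code with a subcode, such as "C791", also falls outside, because a string sorts after any shorter string it begins with. Comparing only the first three characters is one way around that, assuming subcodes should count; this is a sketch, not a substitute for -icd10 clean-.

```stata
clear
input str5 code
"c78"
"C78"
"C791"
end

* Raw comparison: lower-case "c78" sorts after "C79", so it falls outside the range
generate byte raw_range = code >= "C77" & code <= "C79"

* Convert to upper case (and trim stray blanks) before comparing
replace code = strupper(strtrim(code))
generate byte clean_range = code >= "C77" & code <= "C79"

* "C791" still sorts after "C79", so compare only the first three characters instead
generate byte clean_prefix = inlist(substr(code, 1, 3), "C77", "C78", "C79")
list
```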

        The problem with inlist is that it can only take 10 or 15 arguments….
        That is true when, as here, we are talking about string variables. For numeric variables, the limit on the number of arguments is much larger. If you are dealing with a large enough set of target codes that you want to mark off in the variable mst that it is not possible to use -inlist()-, you can string several -inlist()-s together with | to accomplish the same thing. It's a lot less typing than writing out that many == expressions. If the list of codes that you are trying to mark off in the variable mst is so long that even that is impractical, there are still other approaches.
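A sketch of stringing several -inlist()- calls together with |, on a hypothetical list of target codes. Each call is kept well under the argument limit for strings, and /// continues the expression onto the next line inside a do-file.

```stata
clear
input str5 code
"C77"
"C85"
"D01"
"Z14"
end

* Two inlist() calls joined by | flag a value that matches either list
generate byte target = inlist(code, "C77", "C78", "C79", "C80", "C81", "C85") ///
    | inlist(code, "D00", "D01", "D02")
list
```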



        • #5
          Thanks, Clyde. I'm only practicing.

          With regard to using inlist(), the code I thought of using is:

          forvalues p = 4/5 {
          generate dgx`p' = 0
          replace dgx`p' = 1 if inlist(substr(`p',1,2) == "C7"

          This would be:
          Line 1: loop over p = 4 and 5, i.e., the variables dg4 and dg5.
          Line 2: new variables dgx4 and dgx5 will be generated, set to 0.
          Line 3: for each new variable, dgx4 and dgx5, Stata will screen the data in dg4 and dg5 and, if a value starts with C7 as its first two characters, replace the respective dgx variable with 1.

          I have tried this code with the above intention, but Stata responds with:

          too few ')' or ']'
          r(132);



          • #6
            Look at the help of the -inlist()- function to see what the syntax is. You need to specify arguments and the closing parenthesis.

            Code:
            help inlist
            P.S. You appear to be using multiple aliases: https://www.statalist.org/forums/for...679888-numlist
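For comparison, here is a sketch of the loop from #5 in working form, on hypothetical string variables dg4 and dg5. substr() is applied to the variable dg`p' rather than to the number `p', -inlist()- gets its closing parenthesis, and the loop gets its closing brace. With a single target value, inlist(x, "C7") is the same as x == "C7". A lower-case "c77" is not flagged.

```stata
clear
input str5 dg4 str5 dg5
"C78" "A01"
"B20" "C71"
"c77" ""
end

forvalues p = 4/5 {
    generate dgx`p' = 0
    * substr() takes the variable dg`p'; inlist() is closed before the condition ends
    replace dgx`p' = 1 if inlist(substr(dg`p', 1, 2), "C7")
}

* With one target value, inlist(x, "C7") is equivalent to x == "C7"
generate check4 = substr(dg4, 1, 2) == "C7"
list
```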



            • #7
              Thanks for your help; I've already checked it, many thanks.
              I did include the ) when I ran it; leaving it out of #5 is my fault, as I am currently using two computers, one of which doesn't have Stata.


              forvalues p = 4/5 {
              generate dgx`p' = 0
              replace dgx`p' = 1 if inlist(substr(Diagnosis`p',1,2), "C7")
              }


              invalid replace
              r(198);

              My explanation of the above code would be:
              Line 1: loop over p = 4 and 5, i.e., the variables Diagnosis4 and Diagnosis5.
              Line 2: new variables dgx4 and dgx5 will be generated, set to 0.
              Line 3: for each new variable, dgx4 and dgx5, Stata will screen the data in Diagnosis4 and Diagnosis5 and, if a value starts with C7 as its first two characters, replace the respective dgx variable with 1.
              Last edited by Denise Vella; 31 Aug 2022, 05:37.



              • #8
                Of note, using inlist() on its own, i.e., not inside a loop with a macro as in #7, works:

                gen mst = 0
                replace mst = 1 if inlist(substr(dg4,1,2), "C7")


                So why is the replace command in #7 not working for me? All my syntax seems correct.

                I can't find an answer anywhere.
                Last edited by Denise Vella; 31 Aug 2022, 05:30.



                • #9
                  I just wanted to say that I rewrote everything again, and the code works.
                  Yes, I wrote it in a do-file, and I always highlighted everything from start to finish.
                  I don't understand what I did differently between #7 and this post. Perhaps someone can enlighten me so that I can avoid making the 'invalid replace' mistake again.

                  forvalues p = 4/5 {
                      generate Dgx`p' = 0
                      replace Dgx`p' = 1 if inlist(substr(Diagnosis`p',1,2),"C7")
                  }

                  (2 real changes made)
                  (1 real change made)






                  • #10
                    You can add

                    Code:
                    set trace on

                    just before the loop that results in the error. Then you will get more information about the error. Apart from the capitalization, the two versions of the code look the same.
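A sketch of using -set trace on- around the loop from #9, on hypothetical toy data. Each command is echoed as it executes, including the form with `p' already substituted, and -set trace off- restores normal output afterwards.

```stata
clear
input str10 Diagnosis4 str10 Diagnosis5
"C78" "A01"
"B20" "C71"
end

* Echo every command inside the loop as Stata executes it
set trace on
forvalues p = 4/5 {
    generate Dgx`p' = 0
    replace Dgx`p' = 1 if inlist(substr(Diagnosis`p', 1, 2), "C7")
}
set trace off
```

If one of the commands fails, the last traced line shows exactly which command Stata was trying to run.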
                    Last edited by Andrew Musau; 31 Aug 2022, 08:59.



                    • #11
                      I'm glad you got your code working in #9. If the earlier version was copy/pasted from here, or from other online sources, it may be that it was contaminated with invisible "control characters" that often are used in internet pages. They don't show up in listings, but Stata (or other software) "sees" them internally. So if there was such a character immediately preceding, or following, or embedded within -replace-, then Stata would fail to recognize it as actually being -replace-. This comes up from time to time, and the solution, as you discovered, is to delete the command and manually retype it.

                      One aside on the code in #9. One often sees code constructed like this (yours in #9 is an example):
                      Code:
                      gen new_variable = 0
                      replace new_variable = 1 if whatever_condition
                      While this is perfectly workable, it can be streamlined to:
                      Code:
                      gen new_variable = whatever_condition
                      The shorter construction is less typing, fewer opportunities to introduce a mistake, and easier to read and understand. The only time you can't use it is if the -replace- part is inside a loop that would result in trying to -generate- the same variable on each iteration. Your code in #9 doesn't have this problem because each time through the loop creates a different variable. So you can use the one-line syntax.
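Applied to the loop in #9, the one-line form might look like this, sketched on hypothetical toy data. The condition evaluates to 1 when true and 0 when false, so it can be assigned directly, and each pass through the loop creates a different variable.

```stata
clear
input str10 Diagnosis4 str10 Diagnosis5
"C78" "A01"
"B20" "C71"
""    "Z14"
end

forvalues p = 4/5 {
    * The condition is 1 (true) or 0 (false), so no separate replace is needed
    generate Dgx`p' = inlist(substr(Diagnosis`p', 1, 2), "C7")
}
list
```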

