  • Replacing all values of "-99" with "" or .n

    Based on information from a previous thread, I wrote the following code to replace all -99s with .n in numeric variables and with "" in string variables. The code works, but I wanted to know if there's a more elegant way to do this, given that it takes ten lines of code.

    Code:
    *puts the variables into r(varlist)
    ds, has(type byte int long)
    local varlist "`r(varlist)'"
    foreach v of local varlist {
        recode `v' (-99 = .n)
    }
    ds, has(type string)
    local varlist "`r(varlist)'"
    foreach v of local varlist {
        replace `v'="" if `v'=="-99"
    }

  • #2
    You can simplify this down to five lines:

    Code:
    ds, has(type numeric)
    mvdecode `r(varlist)', mv(-99 = .n)
    
    ds, has(type string)
    foreach v of varlist `r(varlist)' {
        replace `v' = "" if `v' == "-99"
    }
    Note: I replaced -has(type byte int long)- with -has(type numeric)-. These are not exactly the same thing: my code will also process anything stored as a float or a double. It isn't clear to me why you want to exclude those types from your treatment of missing values, so I assumed you either don't have any float or double variables in your data set or didn't think about the issue. But if you do have such variables and don't want their -99s converted to .n, then keep the original -has(type byte int long)-.

    • #3
      Tiny suggestion for #2: mvdecode simply ignores string variables, so there is no real need to identify the subset of numeric variables beforehand.

      At any rate, here is some marginally shorter code:

      Code:
      foreach v of varlist _all {
          if substr("`:type `v''", 1, 3) != "str" mvdecode `v', mv(-99 = .n)
          else replace `v' = "" if `v' == "-99"
      }
      Last edited by Hemanshu Kumar; 22 Apr 2025, 11:02.
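Building on the point in #3 that mvdecode ignores string variables, the #2 approach can drop its first ds call and pass _all straight to mvdecode, leaving a loop only for the string variables. This is a minimal sketch: the toy data set is made up for illustration, and it assumes the data contain at least one string variable, because an empty r(varlist) would make the foreach loop exit with an error.

```stata
* Toy data, invented for illustration: one numeric and one string variable
clear
input double x str5 s
  1  "a"
-99  "-99"
  3  "b"
end

* mvdecode ignores string variables in its varlist, so _all is safe here
mvdecode _all, mv(-99 = .n)

* String variables still need their own pass
ds, has(type string)
foreach v of varlist `r(varlist)' {
    replace `v' = "" if `v' == "-99"
}
```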

      • #4
        At first glance #3 is only marginally shorter, but ds carries considerable overhead, so in terms of
        code actually executed, #3 is likely a couple of hundred lines shorter.

        You can be even more direct and clear:
        Code:
        foreach v of varlist _all {
            if substr("`:type `v''", 1, 3) != "str" replace `v' = .n if `v' == -99
            else replace `v' = "" if `v' == "-99"
        }
        This avoids the additional overhead of mvdecode.
        Last edited by daniel klein; 23 Apr 2025, 00:35.

        • #5
          We're now talking style, not speed, but

          Code:
          if substr("`: type `v''", 1, 3) == "str" replace `v' = "" if `v' == "-99"
          else replace `v' = .n if `v' == -99
          is surely more direct, following the logic

          if the variable is string do the string replacement

          else do the numeric replacement

          rather than

          if the variable is not string do the numeric replacement

          else do the string replacement

          • #6
            Regarding this bit:
            Code:
            if substr("`: type `v''", 1, 3)
            It would be more elegant if Stata 21 offered a shorter way to check whether a variable is a string. I don't know whether that's feasible, but I thought I'd suggest it. Thank you all for your help here!

            • #7

              Code:
              capture confirm str variable `v'
              if _rc == 0 replace `v' = "" if `v' == "-99"
              else replace `v' = .n if `v' == -99

              is another longstanding way to do it.
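The fragment in #7 assumes it sits inside a loop over variables. A complete, runnable sketch of that approach follows; the toy data are invented for illustration, and the keyword is spelled out as string here rather than abbreviated.

```stata
* Toy data, invented for illustration
clear
input int x str4 s
  5  "-99"
-99  "ok"
end

foreach v of varlist _all {
    * _rc is 0 only if `v' is a string variable
    capture confirm string variable `v'
    if _rc == 0 replace `v' = "" if `v' == "-99"
    else replace `v' = .n if `v' == -99
}
```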

              • #8
                Still, a Stata counterpart to Mata's st_isstrvar() would be a reasonable request, given how common the task is; implementing it should be rather trivial.

                By the way, watch out for alias variables. They may masquerade as string or numeric variables, but you cannot replace their values.
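To illustrate what such a Stata-level check might look like, here is a sketch of a small wrapper around Mata's st_isstrvar(). The program name isstrvar and its returned r(isstr) are hypothetical, not part of official Stata. The sketch does not try to detect alias variables, so the caveat above still applies.

```stata
* Hypothetical wrapper, not an official Stata command:
* returns r(isstr) = 1 if the variable is string, 0 otherwise
capture program drop isstrvar
program define isstrvar, rclass
    syntax varname
    * st_isstrvar() takes a variable name and returns 1 or 0
    mata: st_local("isstr", strofreal(st_isstrvar("`varlist'")))
    return scalar isstr = `isstr'
end

* Usage on a toy data set, invented for illustration
clear
input byte x str3 s
-99 "-99"
  4 "abc"
end

foreach v of varlist _all {
    isstrvar `v'
    if r(isstr) replace `v' = "" if `v' == "-99"
    else replace `v' = .n if `v' == -99
}
```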
