Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Replace missing values that are indicated with an asterisk (*)

    I have a problem regarding an Excel file that has missing values that are indicated with an asterisk (*).
    If I use search and replace in Excel, Excel thinks that there are asterisks (*) in all the cells. So, this is not the way to get rid of the asterisks (*). When I import the Excel file in Stata, Stata thinks that this variable is string. But it is not possible to destring the variable, because of the asterisk (*). I tried to encode the variable, and then the data becomes 'blue'. The asterisks (*) remain in the data. Stata knows the values (like mean, etc.), but cannot recognize the missing values.
    Can somebody help me with the missing values that are indicated with an asterisk (*) ? How can I get rid of these asterisks (*) such that Stata knows that these asterisks (*) are missing values?

    Thanks in advance.

  • #2
    Something like

    Code:
    replace var = " " if var=="*"
    and then try to destring.

    If this does not work, provide a data sample with -dataex-.

    Comment


    • #3
      destring has options, and replace ignore("(*)") should fit your case.


      encode only seems to work here, as e.g. a string value like (*) will be encoded to some number and (*) will be visible as a value label, but results of summarize and most other commands will be garbage.

      To see this let's look at a minimal example.

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input str3 problem
      "42" 
      "666"
      "(*)"
      end
      
      encode problem, gen(solution)
      
      . l
      
           +--------------------+
           | problem   solution |
           |--------------------|
        1. |      42         42 |
        2. |     666        666 |
        3. |     (*)        (*) |
           +--------------------+
      
      . list, nola
      
           +--------------------+
           | problem   solution |
           |--------------------|
        1. |      42          2 |
        2. |     666          3 |
        3. |     (*)          1 |
           +--------------------+
      
      . summarize solution
      
          Variable |        Obs        Mean    Std. Dev.       Min        Max
      -------------+---------------------------------------------------------
          solution |          3           2           1          1          3
      Code:
      
      


      The way
      encode works by default is to sort strings into dictionary order and then map distinct values to integers 1 up. The result is only ever acceptable if your strings denote categories such as "female" "male" (and even then you are likely to want something else).

      As the help for
      encode signals

      Do not use
      encode if varname contains numbers that merely happen to be stored as strings; instead, use generate newvar = real(varname) or destring; see
      real() or [D] destring.




      Comment


      • #4
        Joro Kolev and Nick Cox, thank you so much. Your quick response is greatly appreciated. And it worked!!

        Comment

        Working...
        X